WO2018094704A1 - Data verification method and storage system

Data verification method and storage system

Info

Publication number
WO2018094704A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
node
write data
data request
metadata
Application number
PCT/CN2016/107355
Other languages
English (en)
French (fr)
Inventor
魏明昌
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to SG11201707304RA
Priority to EP16897464.0A (EP3352071B1)
Priority to JP2017552874A (JP6526235B2)
Priority to CN201680003337.5A (CN109074227B)
Priority to PCT/CN2016/107355
Priority to CA2978927 (CA2978927C)
Priority to AU2016397189 (AU2016397189B2)
Priority to BR112017020736 (BR112017020736B8)
Publication of WO2018094704A1
Priority to US16/110,504 (US10303374B2)

Classifications

    • G06F 3/0611: Improving I/O performance in relation to response time
    • G06F 11/1076: Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F 11/14: Error detection or correction of the data by redundancy in operation
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/0223: User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F 12/0253: Garbage collection, i.e. reclamation of unreferenced memory
    • G06F 13/38: Information transfer, e.g. on bus
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0619: Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F 3/064: Management of blocks
    • G06F 3/0659: Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F 3/0688: Non-volatile semiconductor memory arrays
    • G06F 3/0689: Disk arrays, e.g. RAID, JBOD
    • G06F 2212/1024: Latency reduction
    • G06F 2212/1032: Reliability improvement, data loss prevention, degraded operation etc.
    • G06F 2212/154: Networked environment
    • G06F 2212/254: Distributed memory
    • G06F 2213/3854: Control is performed at the peripheral side

Description

  • Embodiments of the present invention relate to the field of storage technologies, and in particular, to a data verification method and a storage system.
  • In existing storage systems, the host sends a plurality of generated write data requests to a plurality of storage nodes, and each storage node stores the write data requests it receives.
  • Each write data request includes data and the logical address to which the data is to be written on the storage node (hereinafter referred to simply as a logical address).
  • To ensure reliability, the host calculates check data for the generated write data requests and sends the check data to one or more storage nodes for storage.
  • Specifically, when the accumulated write data requests reach a set size, the host divides them according to a predetermined size into a plurality of data units and then calculates a check unit for these data units. The data units and the check unit form a stripe. Finally, the host sends each data unit or check unit to a storage node for storage. Because the write data requests are divided into data units at random, the data units are sent to different storage nodes for storage. Therefore, when reading data, it is often necessary to gather the data carried by multiple write data requests from different storage nodes, and such cross-node read operations reduce the efficiency of reading data.
  • a first aspect of the present application provides a storage system.
  • the storage system includes a host, a check node, and a plurality of data nodes. Each data node has a unique identity.
  • the host is configured to divide the plurality of write data requests into a plurality of write data request sets according to the identifier of the data node included in the write data request.
  • Each set of write data requests includes one or more write data requests having the identity of the same data node.
  • Each write data request includes data, a logical address to which the data is to be written, and an identification of the data node to which the data is to be written.
  • The data node to which the data is to be written is selected by the host according to the data or the logical address; alternatively, when the user triggers multiple write data requests in the host, the user sends an instruction to the host requesting that these write data requests be written to the same data node.
  • In that case, the host can select a data node for these write data requests according to the user's needs and carry the identifier of this data node in each write data request.
  • The host calculates check data of a set number of write data request sets.
  • The set number of write data request sets is a subset of the plurality of write data request sets. The host sends each write data request set in the set number of write data request sets to the data node indicated by the identifier of the data node included in that write data request set, and sends the check data to the check node.
  • the host divides the plurality of write data requests into a plurality of write data request sets according to the identifier of the data node included in the write data request.
  • The host calculates the check data of the set number of write data request sets and sends the check data to the check node for storage, thereby ensuring the reliability of the data.
  • Each write data request set includes a plurality of write data requests to be written to the same data node, and the data node to which each write data request is to be written is selected by the host based on the data in the write data request or the logical address to which the data is to be written.
  • Therefore, each write data request set holds write data requests of the same category.
  • Each write data request set is sent to the data node indicated by the identifier contained in the write data request set, so write data requests of the same category are stored in the same data node. Since the data in write data requests of the same category is more likely to be read at the same time, the data can be read within one data node when it is read, without performing read operations across nodes, thereby improving the efficiency of reading data.
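As a minimal sketch of the grouping just described (the dictionary-based representation, the 16 KB set size, and the function names are assumptions for illustration, not the claimed implementation):

```python
from collections import defaultdict

SET_SIZE = 16 * 1024   # assumed preset data amount for one write data request set


def group_by_data_node(write_requests):
    """Divide write data requests into write data request sets keyed by the data
    node identifier carried in each request (a request is modelled here as a dict
    with 'node_id', 'logical_address' and 'data' fields)."""
    sets = defaultdict(list)
    for req in write_requests:
        sets[req["node_id"]].append(req)
    return sets


def full_sets(sets):
    """Return the identifiers of the data nodes whose write data request set has
    reached the preset data amount and can take part in check-data computation."""
    return [node_id for node_id, reqs in sets.items()
            if sum(len(r["data"]) for r in reqs) >= SET_SIZE]
```

Once a set number of such full sets exists (for example five of them in a 5+1 check mode), the host would compute their check data and dispatch each set to the node named by its key.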
  • The host is further configured to allocate an identifier to each write data request set in the set number of write data request sets, and to send the identifier of each write data request set to the data node indicated by the identifier of the data node included in that write data request set.
  • The data node is configured to receive a write data request set and the identifier of the write data request set.
  • the data node is also used to create and save metadata.
  • The metadata includes a correspondence between the identifier of the write data request set and the logical address to be written in each write data request, and a correspondence between the logical address to be written in each write data request and an internal offset within the write data request set.
  • Since write data requests of the same category are stored as one write data request set in one data node, the metadata associated with the write data request set is also stored locally in that data node. If a garbage collection operation is performed on the write data request set, the metadata can be modified locally.
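A compact sketch of such a metadata entry, using illustrative Python field names that are not taken from the patent:

```python
from dataclasses import dataclass


@dataclass
class MetadataEntry:
    """Metadata kept by a data node for one write data request: it ties the
    request's logical address to the write data request set that holds the data
    and to the offset of the data inside that set."""
    logical_address: str   # e.g. "LUN1:LBA0x2000:len4096"
    set_id: int            # identifier of the write data request set
    internal_offset: int   # byte offset of the data inside the set


# Example: the data of one request lives at offset 8192 of write data request set 42.
entry = MetadataEntry("LUN1:LBA0x2000:len4096", set_id=42, internal_offset=8192)
```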
  • the storage system further includes a metadata check compute node and a metadata check node.
  • the data node is further configured to send the metadata set and the identifier of the data node to the metadata check computing node when determining that the accumulated metadata reaches the preset data amount.
  • the metadata set includes the accumulated metadata that reaches the preset amount of data.
  • the metadata check computing node is configured to receive a metadata set sent by each of the plurality of data nodes and an identifier of the data node, and save a correspondence between each metadata set and an identifier of the data node. And, a set number of metadata sets are selected from the received plurality of metadata sets according to the correspondence.
  • The selected set number of metadata sets correspond to identifiers of different data nodes.
  • The metadata check computing node is further configured to calculate check data of the selected set number of metadata sets. Then, the metadata check computing node sends the check data of the set number of metadata sets to the metadata check node.
  • The metadata check node is different from the data nodes in which the metadata sets of the set number of metadata sets are stored.
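The metadata path mirrors the data path. A minimal sketch, assuming a set number of five and simple dictionary bookkeeping (the names are illustrative):

```python
from collections import deque

SET_NUMBER = 5    # assumed check mode: 5 metadata sets per check computation
pending = {}      # data node identifier -> queue of metadata sets from that node


def receive_metadata_set(node_id, metadata_set):
    """Metadata check computing node sketch: record which data node each metadata
    set came from, then, once metadata sets from SET_NUMBER different data nodes
    are available, hand back one set per node for check-data computation."""
    pending.setdefault(node_id, deque()).append(metadata_set)
    ready_nodes = [nid for nid, q in pending.items() if q]
    if len(ready_nodes) < SET_NUMBER:
        return None
    return [(nid, pending[nid].popleft()) for nid in ready_nodes[:SET_NUMBER]]
```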
  • the storage system further includes a garbage collection node.
  • The garbage collection node is used to perform system garbage collection operations. Specifically, the garbage collection node selects, from a plurality of stripes, the stripe containing the most invalid data, according to the correspondence, sent by the host, between a stripe identifier and the identifier of each write data request set included in that stripe, and according to the bitmaps of the write data request sets sent by the data nodes.
  • A stripe in this application includes a set number of write data request sets and the check data calculated from those write data request sets.
  • A bitmap in this application is used to indicate the amount of invalid data contained in a write data request set.
  • The garbage collection node takes the stripe containing the most invalid data as the stripe to be reclaimed, and sends a garbage collection notification message to each of the data nodes where the write data request sets included in the stripe to be reclaimed are located.
  • Each garbage collection notification message is used to notify the data node to perform system garbage collection, and each garbage collection notification message includes the identifier of a write data request set. Since the garbage collection node selects the stripe containing the most invalid data when performing the system garbage collection operation, the efficiency of system garbage collection is improved.
  • After receiving the garbage collection notification message, the data node performs a system garbage collection operation on the write data request set according to the identifier of the write data request set and the saved bitmap of the write data request set. Specifically, the data node determines, according to the identifier, the write data request set to be reclaimed; then, according to the bitmap of that write data request set, the data node determines the logical addresses to be written by the valid data contained in the set and deletes the correspondence between those logical addresses and the write data request set.
  • In addition, the data node sends the hard disk logical address occupied by the write data request set to the solid state disk, and the solid state disk marks the blocks corresponding to that hard disk logical address as invalid.
  • When the solid state disk subsequently performs internal garbage collection, these blocks can be erased directly, eliminating the need to copy valid data again and reducing write amplification inside the solid state disk.
  • In the system garbage collection provided by this application, each data node splices the valid data into other write data request sets that it stores locally, so each data node completes system garbage collection independently and does not need to exchange data with other data nodes, thereby saving bandwidth between data nodes.
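A sketch of both sides of this garbage collection flow, with the bitmaps summarized as invalid-byte counts and a trim callback standing in for the command that marks blocks invalid on the SSD (all names and the 4 KB granularity are assumptions):

```python
def pick_stripe_to_reclaim(stripe_to_sets, invalid_bytes):
    """Garbage collection node: choose the stripe whose write data request sets
    contain the most invalid data.

    stripe_to_sets: {stripe_id: [set_id, ...]}  correspondence reported by the host
    invalid_bytes:  {set_id: bytes_of_invalid_data}  derived from the data nodes' bitmaps
    """
    return max(stripe_to_sets,
               key=lambda sid: sum(invalid_bytes.get(s, 0) for s in stripe_to_sets[sid]))


def reclaim_set_on_data_node(set_id, bitmap, metadata, trim, page_size=4096):
    """Data node: for the set being reclaimed, keep only the metadata entries whose
    page is still valid according to the bitmap (valid data would then be respliced
    into other local sets), and ask the SSD via `trim` to invalidate the set's hard
    disk logical address. Metadata entries are dicts with 'set_id' and
    'internal_offset' keys."""
    survivors = []
    for entry in metadata:
        if entry["set_id"] != set_id:
            survivors.append(entry)                             # other sets untouched
        elif bitmap[entry["internal_offset"] // page_size]:     # page still valid
            survivors.append(entry)
    trim(set_id)
    return survivors
```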
  • A second aspect of the present application provides a data verification method, which is applied to the storage system provided in the first aspect or any implementation of the first aspect.
  • A third aspect of the present application provides a host whose functions are consistent with those of the host in the storage system provided in the first aspect.
  • a fourth aspect of the present application provides a storage system.
  • the storage system includes a check node, a check compute node, and a plurality of data nodes.
  • Each data node is configured to send a set of write data requests to the check computing node.
  • the set of write data requests includes one or more write data requests, each write data request including data and an identification of the data node to which the data is to be written.
  • the size of the write data request set is equal to a preset data amount.
  • the check computing node is configured to receive a plurality of write data request sets, and select a set number of write data request sets from the plurality of write data request sets.
  • the set number of write data request sets is a subset of the plurality of write data request sets, and the set number of write data request sets includes identifiers of different data nodes. Then, the check calculation node calculates check data of the set number of write data request sets, and sends the check data to the check node. The check node is different from the data node where each write data request set in the set number of write data request sets is located.
  • each data node sends the write data request as a write data request set to the check computing node when the accumulated size of all the write data requests reaches a preset data amount.
  • the verification computing node selects a set number of write data request sets from the received plurality of write data request sets, calculates check data of the set number of write data request sets, and sends the check data to the check node storage. This ensures the reliability of the data.
  • Each of the write data requests further includes a logical address to be written, and the data node to which the data is to be written is selected by the host in the storage system according to the data or the logical address to which the data is to be written. Alternatively, the data node to which the data is to be written may not be selected by the host, but may be directly specified by the user when the write data request is triggered in the host. Each data node is further configured to receive a plurality of write data requests sent by the host.
  • Each write data request set includes a plurality of write data requests having the same data node identifier, and the data node to which each write data request is to be written is selected by the host based on the data in the write data request or the logical address to which the data is to be written. Therefore, each write data request set holds write data requests of the same category. Since the data in write data requests of the same category is more likely to be read at the same time, the data can be read within one data node when it is read, without performing read operations across nodes, thereby improving the efficiency of reading data.
  • The data node is further configured to allocate an identifier to the saved write data request set and to send the identifier of the write data request set to the check computing node.
  • The data node then creates and saves the metadata.
  • The metadata includes a correspondence between the identifier of the write data request set and the logical address to be written in each write data request, and a correspondence between the logical address to be written in each write data request and an internal offset within the write data request set.
  • Since write data requests of the same category are stored as one write data request set in one data node, the metadata associated with the write data request set is also stored locally in that data node. If a garbage collection operation is performed on the write data request set, the metadata can be modified locally.
  • the storage system further includes a metadata check calculation node and a metadata check node.
  • the data node is further configured to: when determining that the accumulated metadata reaches the preset data amount, send the metadata set and the identifier of the data node to the metadata check computing node.
  • the metadata set includes the accumulated metadata that reaches the preset amount of data.
  • the metadata check calculation node is configured to receive a metadata set sent by each data node and an identifier of the data node, and save a correspondence between each metadata set and an identifier of the data node. And, the metadata check calculation node selects the set number of metadata sets from the received plurality of metadata sets according to the correspondence relationship.
  • The selected set number of metadata sets correspond to identifiers of different data nodes.
  • The metadata check computing node calculates check data of the set number of metadata sets, and sends the check data of the set number of metadata sets to the metadata check node.
  • The metadata check node is different from the data nodes that store the metadata sets of the set number of metadata sets.
  • the storage system further includes a garbage collection node.
  • The check computing node is further configured to allocate an identifier to the stripe. The stripe includes the set number of write data request sets and the check data calculated from the set number of write data request sets. Then, the check computing node sends the correspondence between the stripe identifier and the identifier of each write data request set included in the stripe to the garbage collection node.
  • Each data node is further configured to send a bitmap of the saved set of write data requests to the garbage collection node, the bitmap being used to indicate a data amount of invalid data of the write data request set.
  • The garbage collection node is configured to select, from a plurality of stripes, the stripe containing the most invalid data according to the correspondence between the stripe identifier and the identifier of each write data request set included in the stripe, and according to the bitmap of each write data request set, and to send a garbage collection notification message to each of the data nodes in which the write data request sets included in that stripe are located.
  • Each garbage collection notification message includes the identifier of a write data request set. Since the garbage collection node selects the stripe containing the most invalid data when performing the system garbage collection operation, the efficiency of system garbage collection is improved.
  • After receiving the garbage collection notification message, the data node performs a system garbage collection operation on the write data request set according to the identifier of the write data request set and the saved bitmap of the write data request set. Specifically, the data node determines, according to the identifier, the write data request set to be reclaimed; then, according to the bitmap of that write data request set, the data node determines the logical addresses to be written by the valid data contained in the set and deletes the correspondence between those logical addresses and the write data request set.
  • In addition, the data node sends the hard disk logical address occupied by the write data request set to the solid state disk, and the solid state disk marks the blocks corresponding to that hard disk logical address as invalid.
  • When the solid state disk subsequently performs internal garbage collection, these blocks can be erased directly, eliminating the need to copy valid data again and reducing write amplification inside the solid state disk.
  • In the system garbage collection provided by this application, each data node splices the valid data into other write data request sets that it stores locally, so each data node completes system garbage collection independently and does not need to exchange data with other data nodes, thereby saving bandwidth between data nodes.
  • A fifth aspect of the present application provides a data verification method, which is applied to the storage system provided in the fourth aspect or any implementation of the fourth aspect.
  • a sixth aspect of the present application provides a storage system.
  • the storage system includes a host, a check computing node, and a plurality of data nodes. Each data node has a unique identity.
  • the host sends the generated multiple write data requests to the check computing node.
  • Each set of write data requests includes one or more write data requests having the identity of the same data node.
  • Each write data request includes data, a logical address to which the data is to be written, and an identification of the data node to which the data is to be written.
  • the data node to which the data is to be written is selected by the host according to the data or the logical address.
  • the data node to which the data is to be written may also not be selected by the host, but is directly specified by the user when the write data request is triggered in the host.
  • the check computing node is configured to divide the plurality of write data requests into a plurality of write data request sets according to the identifier of the data node included in the write data request.
  • The check computing node calculates check data of a set number of write data request sets.
  • the set number of write data request sets is a subset of the plurality of write data request sets.
  • the check computing node sends each set of write data requests in the set number of write data request sets to the data node indicated by the identifier of the data node included in the write data request set. The check computing node saves the check data.
  • the verification computing node divides the plurality of write data requests into a plurality of write data request sets according to the identifier of the data node included in the write data request.
  • The check computing node calculates the check data of the set number of write data request sets and saves the check data, thereby ensuring the reliability of the data.
  • Each write data request set includes a plurality of write data requests to be written to the same data node, and the data node to which each write data request is to be written is selected by the host based on the data in the write data request or the logical address to which the data is to be written. Therefore, each write data request set holds write data requests of the same category. After the check computing node calculates the check data, each write data request set is sent to the data node indicated by the identifier contained in that write data request set, so write data requests of the same category are stored in the same data node.
  • The check computing node is further configured to allocate an identifier to each write data request set in the set number of write data request sets, and to send the identifier of each write data request set to the data node indicated by the identifier of the data node included in that write data request set.
  • the data node is configured to receive a write data request set and an identifier of the write data request set.
  • the data node is also used to create and save metadata.
  • The metadata includes a correspondence between the identifier of the write data request set and the logical address to be written in each write data request, and a correspondence between the logical address to be written in each write data request and an internal offset within the write data request set.
  • Since write data requests of the same category are stored as one write data request set in one data node, the metadata associated with the write data request set is also stored locally in that data node. If a garbage collection operation is performed on the write data request set, the metadata can be modified locally.
  • the storage system further includes a metadata check calculation node and a metadata check node.
  • the data node is further configured to send the metadata set and the identifier of the data node to the metadata check computing node when determining that the accumulated metadata reaches the preset data amount.
  • the metadata set includes the accumulated metadata that reaches the preset amount of data.
  • the metadata check computing node is configured to receive a metadata set sent by each of the plurality of data nodes and an identifier of the data node, and save a correspondence between each metadata set and an identifier of the data node. And, a set number of metadata sets are selected from the received plurality of metadata sets according to the correspondence.
  • The selected set number of metadata sets correspond to identifiers of different data nodes.
  • The metadata check computing node is further configured to calculate check data of the selected set number of metadata sets. Then, the metadata check computing node sends the check data of the set number of metadata sets to the metadata check node.
  • The metadata check node is different from the data nodes in which the metadata sets of the set number of metadata sets are stored.
  • the storage system further includes a garbage collection node.
  • The garbage collection node is used to perform system garbage collection operations. Specifically, the garbage collection node selects, from a plurality of stripes, the stripe containing the most invalid data, according to the correspondence, sent by the check computing node, between the stripe identifier and the identifier of each write data request set included in the stripe, and according to the bitmaps of the write data request sets sent by the data nodes.
  • the stripe in the present application includes a set number of write data request sets and check data calculated based on the write data request sets.
  • the bitmap in this application is used to indicate the amount of data of invalid data contained in the write data request set.
  • The garbage collection node takes the stripe containing the most invalid data as the stripe to be reclaimed, and sends a garbage collection notification message to each of the data nodes where the write data request sets included in the stripe to be reclaimed are located.
  • Each garbage collection notification message is used to notify the data node to perform system garbage collection, and each garbage collection notification message includes the identifier of a write data request set. Since the garbage collection node selects the stripe containing the most invalid data when performing the system garbage collection operation, the efficiency of system garbage collection is improved.
  • After receiving the garbage collection notification message, the data node performs a system garbage collection operation on the write data request set according to the identifier of the write data request set and the saved bitmap of the write data request set. Specifically, the data node determines, according to the identifier, the write data request set to be reclaimed; then, according to the bitmap of that write data request set, the data node determines the logical addresses to be written by the valid data contained in the set and deletes the correspondence between those logical addresses and the write data request set.
  • In addition, the data node sends the hard disk logical address occupied by the write data request set to the solid state disk, and the solid state disk marks the blocks corresponding to that hard disk logical address as invalid.
  • When the solid state disk subsequently performs internal garbage collection, these blocks can be erased directly, eliminating the need to copy valid data again and reducing write amplification inside the solid state disk.
  • In the system garbage collection provided by this application, each data node splices the valid data into other write data request sets that it stores locally, so each data node completes system garbage collection independently and does not need to exchange data with other data nodes, thereby saving bandwidth between data nodes.
  • A seventh aspect of the present application provides a data verification method, which is applied to the storage system provided in the sixth aspect or any implementation of the sixth aspect.
  • FIG. 1 is a structural diagram of a storage system according to an embodiment of the present invention.
  • FIG. 2 is a structural diagram of a host provided by an embodiment of the present invention.
  • FIG. 3 is a structural diagram of a flash memory array according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a data verification method according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of another data verification method according to an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of still another data verification method according to an embodiment of the present invention.
  • FIG. 7 is a schematic flowchart of a system garbage collection method according to an embodiment of the present invention.
  • The embodiments of the invention provide a data verification method and a storage system, which can store data of the same category in the same storage node while ensuring the reliability of the data, so that the data only needs to be read within a single storage node, improving the efficiency of reading data.
  • FIG. 1 depicts a composition diagram of a storage system 10 provided by an embodiment of the present invention.
  • the storage system 10 illustrated in FIG. 1 includes a host 11 and a plurality of storage nodes 22.
  • FIG. 1 is only an example and does not limit the networking manner to a specific topology, such as a cascaded tree network or a ring network, as long as the host 11 and the storage nodes 22 can communicate with each other.
  • Host 11 can include any computing device such as a server, desktop computer, and the like.
  • the user can trigger a read/write instruction by the host 11, and send a write data request or a read data request to the storage node 22.
  • the host 11 can communicate with any one of the storage nodes 22, and any two storage nodes 22 can also communicate with each other.
  • The host 11 mainly includes a processor 118, a cache 120, a memory 122, a communication bus 126, and a communication interface 128.
  • the processor 118, the cache 120, the memory 122, and the communication interface 128 complete communication with one another via the communication bus 126.
  • Communication interface 128 is for communicating with storage node 22.
  • the memory 122 is used to store the program 124.
  • The memory 122 may include a high-speed RAM memory, and may also include a non-volatile memory, for example, at least one disk memory. It can be understood that the memory 122 can be a random-access memory (RAM), a magnetic disk, a hard disk, a solid state disk (SSD), a non-volatile memory, or any other machine-readable medium capable of storing program code.
  • Program 124 can include program code.
  • The cache 120 is used to buffer data received from the application server or data read from the storage node 22.
  • the cache 120 may be a machine readable medium that can store data, such as a RAM, a ROM, a flash memory, or a solid state disk (SSD), which is not limited herein.
  • memory 122 and the cache 120 may be provided in combination or separately, which is not limited by the embodiment of the present invention.
  • the processor 118 may be a central processing unit (CPU) for generating and processing write data requests and the like.
  • Storage node 22 can be a disk array or a flash array or a storage server.
  • a disk array refers to a storage device that includes at least one controller and multiple disks.
  • a flash array refers to a storage device including at least one controller and a plurality of solid state devices (SSDs).
  • A solid state disk uses flash memory chips as its storage medium and is also known as a solid state drive (SSD).
  • The storage node 22 can also be a storage server that includes at least one solid state disk.
  • the storage node 22 includes a controller 221 and one or more solid state disks 222.
  • the controller 221 includes at least an interface 223, an interface 224, a processor 225, and a memory 226.
  • the interface 223 is configured to communicate with the host 11.
  • the interface 224 is configured to communicate with the solid state hard disk 222.
  • the processor 225 may be a central processing unit (CPU).
  • the processor 225 is configured to receive a write data request or read data request from the host 11, process the write data request or read the data request, and the processor 225 may further send the data in the write data request to the solid state hard disk 222.
  • the memory 226 is configured to store a program. In addition, the memory 226 is also used to temporarily store write data requests received from the host 11 or data read from the solid state hard disk 222. When the controller 221 receives a plurality of write data requests sent by the host, the plurality of write data requests may be temporarily saved in the memory 226. When the size of the plurality of write data requests reaches the preset data amount, the plurality of write data requests stored in the memory 226 are sent to the check computing node.
  • the memory 226 includes a random access memory (English: random-access memory, RAM).
  • The memory 226 further includes a non-volatile memory, for example, at least one magnetic memory. It can be understood that the memory 226 can be a random-access memory (RAM), a magnetic disk, a hard disk, a solid state disk (SSD), a non-volatile memory, or any other machine-readable medium capable of storing program code.
  • In the embodiments of the present invention, a storage node that stores write data requests sent by the host 11 is referred to as a data node; a storage node that calculates check data of a plurality of write data requests is referred to as a check computing node; a storage node that stores the check data of a plurality of write data requests is referred to as a check node; a storage node that calculates check data of metadata is referred to as a metadata check computing node; and a node that performs system garbage collection is referred to as a garbage collection node.
  • However, this division is not absolute. For example, a check node that stores check data can also store write data requests and thus act as a data node.
  • the application provides at least two application scenarios.
  • In the first scenario, the host 11 has the function of calculating check data of multiple write data requests. For example, the host 11 divides the plurality of write data requests into a plurality of write data request sets according to the identifier of the data node to which each write data request is to be written, each set including the write data requests to be written to the same data node. When the size of all the write data requests in some or all of the sets reaches a preset data amount, the host 11 calculates the check data of these write data request sets.
  • In the second scenario, the host 11 does not have the function of calculating check data for multiple write data requests, and the operation of calculating the check data is performed by a check computing node among the storage nodes.
  • the host 11 sends each write data request to the data node according to the identifier of the data node to be written by the data carried by each write data request.
  • Each data node receives a plurality of write data requests sent by the host 11, and when the size of a write data request set (the write data request set includes a plurality of write data requests) reaches a preset data amount, the data node sends the write data request set to the check computing node, and the check computing node calculates the check data of a plurality of write data request sets.
  • step S101 - step S104 may be performed by the processor 118 of the host 11.
  • Step S101 The host 11 generates a plurality of write data requests, each write data request including data, an identifier of the data node to which the data is to be written, and a logical address to be written by the data.
  • the logical address to be written by the data includes the identifier of the logical unit (English: Logic Unit), the logical block address (English: Logical Block Address, LBA), and the length (English: length).
  • the identifier of the logical unit is used to indicate the logical unit to which the data is to be written
  • the logical block address is used to indicate the location of the data in the logical unit
  • the length represents the size of the data.
  • the identifier of the data node to which the data is to be written is used to indicate the data node to which the data is to be written.
  • the data node to which the data is to be written is selected by the host 11 from a plurality of data nodes in the storage system 10 according to the data or a logical address to be written by the data.
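As a concrete illustration of such a request, a minimal sketch with field names chosen for readability (they are not defined in the patent):

```python
from dataclasses import dataclass


@dataclass
class WriteDataRequest:
    """One write data request as described in step S101: the data, the identifier
    of the data node the data is to be written to, and the logical address, which
    consists of a logical unit identifier, a logical block address and a length."""
    data: bytes
    node_id: str   # e.g. "A" .. "E" in the five-node example
    lun_id: int    # identifier of the logical unit to which the data is written
    lba: int       # logical block address: location of the data in the logical unit
    length: int    # size of the data


req = WriteDataRequest(data=b"\x00" * 4096, node_id="A", lun_id=1, lba=0x2000, length=4096)
```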
  • the host 11 collects information of the data nodes in the storage system 10 in advance.
  • the information of the data node includes the number of data nodes included in the storage system 10, and the identity of each data node.
  • the identity of the data node is used to uniquely identify the data node.
  • the storage system 10 includes five data nodes, each of which is identified by A, B, C, D, and E.
  • The host 11 may send a query request to each data node to obtain its identifier, or each data node may actively report its identifier to the host 11; alternatively, a master node may be designated in the storage system 10, and after the master node collects the identifiers of all the data nodes, it reports the identifier of each data node to the host 11.
  • the host 11 determines the data node to which the data is to be written.
  • One implementation is that the host 11 determines the data node to which the data is to be written according to the logical address to be written by the data. For example, the host 11 takes the logical address to be written by each write data request as input and obtains a hash value by using a preset hash algorithm, and the hash value uniquely corresponds to the identifier of one data node.
  • Besides a hash algorithm, the host 11 can also determine the identifier of the data node in other ways, for example by taking a remainder. This embodiment does not limit the algorithm for determining the identifier of the data node, as long as the identifier of one data node can be uniquely determined according to the logical address.
  • In another implementation, the host 11 determines the data node to which the data is to be written based on the data itself. For example, the host 11 takes the data as input and obtains a hash value by using a preset hash algorithm, and the hash value uniquely corresponds to the identifier of one data node. Similarly, besides a hash algorithm, the host 11 can also determine the identifier of the data node in other ways, for example by taking a remainder. In yet another implementation, the host 11 classifies the plurality of write data requests according to the user that triggered them and selects the same data node for the write data requests triggered by the same user. Alternatively, the host 11 may classify the write data requests according to the type of data and select the same data node for data of the same type.
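A minimal sketch of the hash-based selection described above, assuming SHA-1 followed by a modulo over the node list (the patent does not prescribe a particular hash; any mapping that yields exactly one node per logical address works):

```python
import hashlib

DATA_NODES = ["A", "B", "C", "D", "E"]   # identifiers collected by the host in advance


def pick_data_node(lun_id: int, lba: int) -> str:
    """Map a logical address (logical unit identifier plus logical block address)
    to the identifier of exactly one data node."""
    digest = hashlib.sha1(f"{lun_id}:{lba}".encode()).digest()
    return DATA_NODES[int.from_bytes(digest[:4], "big") % len(DATA_NODES)]


# Write data requests whose logical addresses hash to the same node identifier are
# later grouped into the same write data request set.
print(pick_data_node(lun_id=1, lba=0x2000))
```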
  • After the host 11 selects the data node for the data, the host 11 writes the identifier of the data node into the write data request carrying the data.
  • The identifier of the data node may be located in the header of the write data request, or may be part of the payload of the write data request.
  • Alternatively, when the user triggers multiple write data requests in the host 11, the user sends an instruction to the host 11 requesting that these write data requests be written to the same data node.
  • In that case, the host 11 can select a data node for these write data requests according to the needs of the user and carry the identifier of this data node in each write data request.
  • Write data requests carrying the same identifier belong to the same category of write data requests, and the data carried by these write data requests is more likely to be read at the same time.
  • Step S102: The host divides the multiple write data requests into multiple write data request sets according to the identifier of the data node included in each write data request, and each set includes multiple write data requests having the same data node identifier.
  • In other words, the host classifies the write data requests by data node identifier. For example, all the write data requests in one write data request set include the identifier of data node A, all the write data requests in another write data request set include the identifier of data node B, and all the write data requests in yet another write data request set include the identifier of data node C.
  • Step S103: When the size of all the write data requests in each write data request set of a set number of write data request sets reaches a preset data amount, the host 11 calculates the check data of the set number of write data request sets.
  • the storage system 10 includes a number of data nodes, and accordingly, there are also a number of write data request sets in the host 11. As the number of generated write data requests continues to increase, the size of each write data request set continues to accumulate.
  • the host 11 needs to select a set number of write data request sets from a set of write data requests whose size reaches a preset data amount, and calculate check data of the set number of write data request sets.
  • the preset data amount is preset, for example, 16 KB.
  • the set number is determined by a verification mode preset by the storage system 10.
  • the preset calibration mode includes 5+1 mode, or 6+2 mode, and the like.
  • The 5+1 mode means that one piece of check data is generated from 5 write data request sets;
  • the 6+2 mode means that 2 pieces of check data are generated from 6 write data request sets.
  • Taking the 5+1 mode as an example, the host 11 selects five write data request sets, each of which has reached the preset data amount, from the plurality of write data request sets, and then calculates the check data of the five write data request sets.
  • the five write data request sets and the check data form a stripe (English: stripe).
  • a set of write data requests whose size reaches a preset amount of data is a data unit of the stripe.
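A sketch of assembling such a stripe in the 5+1 mode, using byte-wise XOR as a familiar stand-in for the check computation (the patent does not prescribe a particular erasure code, and the 16 KB unit size comes from the example above):

```python
SET_SIZE = 16 * 1024   # preset data amount: each full write data request set is one data unit


def build_stripe(data_units):
    """Take five data units (full write data request sets of SET_SIZE bytes each)
    and derive one check unit as their byte-wise XOR; the six units form a stripe."""
    assert len(data_units) == 5 and all(len(u) == SET_SIZE for u in data_units)
    check_unit = bytes(a ^ b ^ c ^ d ^ e for a, b, c, d, e in zip(*data_units))
    return data_units + [check_unit]


stripe = build_stripe([bytes([i]) * SET_SIZE for i in range(5)])
assert len(stripe) == 6 and len(stripe[5]) == SET_SIZE
```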
  • Step S104 The host 11 sends each set of write data requests in the set number of write data request sets to the data node indicated by the identifier included in the write data request set.
  • Since all the write data requests in a write data request set contain the identifier of the same data node, the host can send each write data request set, according to that identifier, to the data node indicated by the identifier.
  • Step S105: The host 11 selects, from the storage system 10, a storage node for storing the check data and sends the check data to the selected storage node, where the storage node for storing the check data is different from the data nodes where the write data request sets that make up this stripe are located.
  • a storage node that stores verification data is referred to as a check node.
  • the host 11 may select one or more of the remaining storage nodes (excluding the data nodes for holding the set of write data requests) for saving the check data.
  • the write data request set and the check data cannot be saved in the same storage node. This is to prevent the write data request set and the check data from being lost at the same time when a storage node fails.
  • The storage node selected for saving the check data must not be a storage node that holds a write data request set of this stripe, but it may be a storage node that holds write data request sets of other stripes.
  • This embodiment does not restrict a storage node to storing only write data request sets or only check data.
  • In addition, the host 11 assigns a stripe identifier to each stripe, and different stripes have different stripe identifiers. The host also allocates an identifier to each write data request set and to the check data included in each stripe, and sends the identifier of each write data request set to the data node where that write data request set is located. For each stripe, the host 11 records the correspondence between the stripe identifier and the identifier of each write data request set included in the stripe, and the correspondence between the stripe identifier and the identifier of the check data.
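A small sketch of this bookkeeping on the host, with an illustrative in-memory table (the structure and names are assumptions):

```python
stripe_table = {}   # stripe identifier -> identifiers of its write data request sets and check data


def record_stripe(stripe_id, set_ids_by_node, check_data_id):
    """Remember which write data request sets (one per data node) and which check
    data make up a stripe, so the correspondence can later be handed to the
    garbage collection node."""
    stripe_table[stripe_id] = {"set_ids": dict(set_ids_by_node), "check_data_id": check_data_id}


record_stripe(stripe_id=7,
              set_ids_by_node={"A": 41, "B": 42, "C": 43, "D": 44, "E": 45},
              check_data_id=46)
```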
  • the host 11 divides the plurality of write data requests into a plurality of write data request sets in accordance with the identifier of the data node included in the write data request.
  • The host 11 calculates the check data of the set number of write data request sets and sends the check data to the check node for storage, thereby ensuring the reliability of the data.
  • Each write data request set includes a plurality of write data requests to be written to the same data node, and the data node to which each write data request is to be written is selected by the host based on the data in the write data request or the logical address to which the data is to be written. Therefore, each write data request set holds write data requests of the same category.
  • Each write data request set is sent to the data node indicated by the identifier contained in the write data request set, so write data requests of the same category are stored in the same data node. Since the data in write data requests of the same category is more likely to be read at the same time, the data can be read within one data node when it is read, without performing read operations across nodes, thereby improving the efficiency of reading data.
  • In the data verification method shown in FIG. 5, the check computing node divides the received multiple write data requests into a plurality of write data request sets according to the identifier of the data node included in each write data request, and calculates the check data when each write data request set in a set number of write data request sets reaches a preset data amount.
  • Step S401 This step is similar to step S101 shown in FIG. 4, and details are not described herein again.
  • Step S402 The host 11 sends the generated write data request to the check computing node.
  • Step S403: The check computing node divides the plurality of write data requests into a plurality of write data request sets according to the identifier of the data node included in each write data request, each set including a plurality of write data requests having the same data node identifier.
  • the difference from FIG. 4 is that the execution body of the step is a check calculation node, and the rest is similar to step S102 shown in FIG. 4, and details are not described herein again.
  • Step S404 When the size of all the write data requests in each of the set of write data request sets in the set number of write data request sets reaches a preset data amount, the check calculation node calculates the set number of write data. Request verification data for the collection.
  • the difference from FIG. 4 is that the execution body of the step is a check computing node, and the rest is similar to step S103 shown in FIG. 4, and details are not described herein again.
  • Step S405 The check computing node sends each set of write data requests in the set number of write data request sets to the data node indicated by the identifier included in the write data request set.
  • the difference from FIG. 4 is that the execution body of the step is a check computing node, and the rest is similar to step S104 shown in FIG. 4, and details are not described herein again.
  • After the check computing node calculates the check data, the check data may be saved locally directly and need not be forwarded to other storage nodes for storage. If more than one piece of check data is calculated according to the preset check mode, the check computing node may select a storage node from the storage system 10 for storing the other check data. The selection manner is similar to step S105 shown in FIG. 4 and is not described again here.
  • FIG. 6 is a flowchart of a data verification method applied to the second scenario. As shown in FIG. 6, the method includes the following steps:
  • Step S201: The host 11 generates multiple write data requests. Each write data request includes data, the identifier of the data node to which the data is to be written, and the logical address to which the data is to be written.
  • The content of step S201 is similar to step S101 in the method shown in FIG. 4, and details are not described herein again.
  • Step S202: The host 11 sends each write data request to the data node indicated by the identifier, carried in the write data request, of the data node to which the data is to be written.
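  • As a minimal illustration of these two steps (assumed, not prescribed by the embodiment), the host can hash the logical address to pick a data node identifier when generating a request, and then dispatch each request to the node its identifier indicates; the hash choice and field names are hypothetical.

```python
import hashlib


def pick_data_node(logical_address: str, node_ids: list) -> str:
    """Map a logical address to one data node identifier (hash-based selection)."""
    digest = hashlib.md5(logical_address.encode()).digest()
    return node_ids[int.from_bytes(digest[:4], "big") % len(node_ids)]


def dispatch(write_requests, node_ids, send):
    """Send each write data request to the data node its identifier indicates."""
    for req in write_requests:
        req["node_id"] = pick_data_node(req["logical_address"], node_ids)
        send(req["node_id"], req)   # 'send' is the transport, assumed to exist
```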
  • Step S203: The data node receives the multiple write data requests sent by the host 11 and writes the write data requests into the memory 226. Specifically, this step may be performed by the processor 225 in the controller 221.
  • Step S204: When the size of all the write data requests held in the memory 226 of the data node reaches the preset data amount, the data node sends a write data request set to the check computing node. The write data request set includes multiple write data requests whose total size reaches the preset data amount.
  • Specifically, this step may be performed by the processor 225 in the controller 221. Whenever the data node receives a write data request sent by the host 11, it writes the write data request into the memory 226, so the memory 226 gradually accumulates write data requests. When the size of the accumulated write data requests reaches the preset data amount, the data node sends the accumulated write data requests to the check computing node as one set. The preset data amount is preset by the storage system 10, for example, 16 KB.
  • For any one data node, as long as the size of all the write data requests in its memory 226 reaches the preset data amount, the data node sends all the write data requests in the memory 226 to the check computing node, without considering whether the write data requests accumulated in the memory of any other data node have reached the preset data amount. In practical applications, the data node may also impose a time limit on accumulating write data requests; if the size of all the write data requests in the memory 226 has not reached the preset data amount when the preset time limit expires, the set may be padded with zeros or other specially marked data.
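  • The accumulation-and-flush behaviour can be sketched as follows; this is illustrative only, and the 16 KB threshold, the time limit, the zero padding, and the method names are assumptions rather than the embodiment's required interface.

```python
import time

PRESET_DATA_AMOUNT = 16 * 1024   # assumed 16 KB
TIME_LIMIT_S = 5.0               # assumed accumulation time limit


class DataNodeBuffer:
    """Accumulates write data requests in memory until a set is full or times out."""

    def __init__(self, send_to_check_node):
        self.requests = []
        self.size = 0
        self.started = time.monotonic()
        self.send_to_check_node = send_to_check_node

    def add(self, request):
        self.requests.append(request)
        self.size += len(request["data"])
        self._maybe_flush()

    def _maybe_flush(self):
        timed_out = time.monotonic() - self.started >= TIME_LIMIT_S
        if self.size >= PRESET_DATA_AMOUNT or (timed_out and self.requests):
            if self.size < PRESET_DATA_AMOUNT:
                # pad with zeros so the set reaches the preset data amount
                self.requests.append({"data": b"\x00" * (PRESET_DATA_AMOUNT - self.size)})
            self.send_to_check_node(self.requests)   # one write data request set
            self.requests, self.size = [], 0
            self.started = time.monotonic()
```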
  • The preceding description is for a single data node. For the multiple data nodes in the storage system 10, each data node 22 first fills up, locally, a write data request set that reaches the preset data amount, and then sends the set to the check computing node. The check computing node calculates check data based on the multiple received write data request sets, so as to form a stripe.
  • Step S205: The check computing node receives multiple write data request sets and selects a set quantity of write data request sets from them. The set quantity of write data request sets is a subset of the multiple write data request sets, and the selected sets carry identifiers of different data nodes. The check computing node then calculates the check data of the set quantity of write data request sets.
  • Using storage node C as the check computing node as an example, storage node C can communicate with any storage node 22 in the storage system 10. After receiving multiple write data request sets, storage node C selects a set quantity of write data request sets from them according to the preset check mode and calculates the check data of the selected sets. Taking the 5+1 mode as an example, when storage node C has selected, from the received sets, five write data request sets that come from different data nodes, it calculates the check data of those five sets. In some cases, a write data request set saved locally by storage node C may also participate in the check data calculation, in which case it only needs to receive the write data request sets sent by the remaining four data nodes.
  • When the check computing node receives a write data request set, it saves the set, assigns an identifier to the set, and then writes the identifier of the set into a linked list or another data structure. The identifier of a write data request set may be the same as, or different from, the identifier of the data node carried in the write data requests of that set. When the two identifiers are different, the check computing node saves the correspondence between the identifier of the write data request set and the identifier of the data node.
  • In one implementation, the identifier of the write data request set is allocated by the check computing node; after allocating the identifier, the check computing node sends it to the data node where the write data request set is located, and that data node saves the identifier. In another implementation, the identifier is allocated by the data node where the write data request set is located; after allocating the identifier, the data node sends it to the check computing node, and the check computing node saves the identifier.
  • To ensure data reliability, when the check computing node selects a certain number of write data request sets to form a stripe, it must ensure that the write data request sets forming the same stripe come from different data nodes. Taking the 5+1 mode as an example, the check computing node needs to select five write data request sets from the linked list. Because the check computing node may receive multiple write data request sets within a period of time, some of which may come from the same data node, it must also ensure that the five selected sets come from different data nodes.
  • Specifically, the check computing node may select the five write data request sets from the linked list according to the saved correspondence between the identifiers of the sets and the identifiers of the data nodes, or directly according to the identifiers of the sets. It should be noted that, when selecting the write data request sets that form a stripe, the check computing node does not need to consider the order in which the sets entered the linked list; it only needs to ensure that the sets forming the stripe come from different data nodes.
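  • A compact sketch of this selection rule follows; the XOR parity for the 5+1 mode, the simple pending list standing in for the linked list, and the field names are assumptions for illustration.

```python
from functools import reduce

STRIPE_WIDTH = 5   # 5+1 check mode: five data sets, one parity set


def select_stripe(pending_sets):
    """Pick STRIPE_WIDTH pending write data request sets from distinct data nodes."""
    chosen, seen_nodes = [], set()
    for s in pending_sets:                    # arrival order does not matter
        if s["node_id"] not in seen_nodes:
            chosen.append(s)
            seen_nodes.add(s["node_id"])
        if len(chosen) == STRIPE_WIDTH:
            return chosen
    return None                               # not enough distinct nodes yet


def xor_parity(sets):
    """Byte-wise XOR of the payloads of the selected sets (assumed check algorithm)."""
    payloads = [s["payload"] for s in sets]   # each payload is the preset data amount
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*payloads))
```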
  • Step S206: The check computing node sends the check data to the check node. The check node must not be the same as any data node that saves one of the selected write data request sets.
  • Specifically, the check computing node may select one or more of the remaining storage nodes (excluding the data nodes that already save the write data request sets of the stripe) to save the check data. A storage node used to save check data is referred to as a check node in this embodiment. The check node must not coincide with any data node where the set quantity of write data request sets is located; this prevents a write data request set and the check data of the same stripe from being lost together when a single storage node fails.
  • However, although the storage node selected to save the check data cannot be a data node that saves a write data request set of the current stripe, it may be a data node that saves write data request sets of other stripes. This embodiment does not dedicate any storage node exclusively to storing write data request sets or check data.
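  • The constraint on choosing the check node can be expressed as a one-screen sketch; the random choice is an assumption, and any policy that avoids the stripe's own data nodes satisfies the description.

```python
import random


def choose_check_node(all_storage_nodes, stripe_sets):
    """Pick a storage node for the check data that holds none of this stripe's sets."""
    used = {s["node_id"] for s in stripe_sets}
    candidates = [n for n in all_storage_nodes if n not in used]
    if not candidates:
        raise RuntimeError("no storage node available outside the stripe")
    return random.choice(candidates)
```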
  • In addition, the check computing node assigns a stripe identifier to each stripe, and different stripes have different stripe identifiers. The host assigns identifiers to the write data request sets and to the check data included in each stripe, and sends the identifier of each write data request set to the data node where that set is located. Alternatively, each data node assigns an identifier to the write data request set it saves and sends that identifier to the check computing node. For each stripe, the check computing node records the correspondence between the stripe identifier and the identifier of each write data request set included in the stripe, as well as the correspondence between the stripe identifier and the identifier of the check data.
  • According to the data check method shown in FIG. 6, each data node sends its accumulated write data requests to the check computing node as one write data request set when their total size reaches the preset data amount. The check computing node selects a set quantity of write data request sets from the multiple received sets, calculates the check data of the selected sets, and sends the check data to the check node for storage, thereby ensuring data reliability.
  • Moreover, each write data request set includes multiple write data requests carrying the same data node identifier, and the data node to which each write data request is to be written is selected by the host according to the data in the write data request or the logical address to which the data is to be written. Therefore, each write data request set holds write data requests of the same category. Because data in write data requests of the same category is likely to be read at the same time, the data can be read within a single data node, without performing read operations across nodes, which improves the efficiency of reading data.
  • The data check method shown in FIG. 4, FIG. 5, or FIG. 6 writes write data requests of the same category to the same data node. After a write data request is written to a data node, the data node also needs to create and save metadata.
  • The metadata includes the correspondence between the identifier of the write data request set and the logical address to which the data of each write data request is to be written.
  • The metadata also includes the location of each write data request within the write data request set; in this embodiment, this location is referred to as an internal offset.
  • For example, the correspondence between a write data request set and the logical addresses to which the data of each write data request in the set is to be written is shown in Table 1:

    Logical address | Identifier of the write data request set | Internal offset
    0x1+0x100+4096  | 1                                        | 0x1000
    0x2+0x400+4096  | 1                                        | 0x2000
    0x3+0x800+4096  | 1                                        | 0x3000

    Table 1
  • As shown in Table 1, the write data request set identified as "1" consists of three write data requests. Each write data request includes, in addition to the data, the logical address to which the data is to be written, and each write data request has its own location inside the write data request set. Taking the first write data request as an example, its logical address includes a volume ID, an LBA, and a length, where the volume ID is 0x1, the LBA is 0x100, and the length is 4096. The write data request belongs to the write data request set identified as "1", and its location in the set is 0x1000.
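  • The metadata described above is essentially a mapping kept by the data node from logical addresses to a set identifier and an internal offset; a minimal sketch, with illustrative structure and names rather than the embodiment's on-disk format, could look like this:

```python
class SetMetadata:
    """Per-data-node metadata for one write data request set (e.g. set "1")."""

    def __init__(self, set_id):
        self.set_id = set_id
        self.by_logical_address = {}   # logical address -> internal offset

    def record(self, volume_id, lba, length, internal_offset):
        key = (volume_id, lba, length)           # e.g. (0x1, 0x100, 4096)
        self.by_logical_address[key] = internal_offset

    def locate(self, volume_id, lba, length):
        """Return (set id, internal offset) for a logical address, as used on reads."""
        return self.set_id, self.by_logical_address[(volume_id, lba, length)]


# Example mirroring Table 1
meta = SetMetadata(set_id=1)
meta.record(0x1, 0x100, 4096, 0x1000)
meta.record(0x2, 0x400, 4096, 0x2000)
meta.record(0x3, 0x800, 4096, 0x3000)
```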
  • By contrast, if the host divided the write data requests into multiple data units, calculated the check data of those data units, and then sent the data units and the check data to multiple storage nodes for storage, the metadata (the correspondence between write data requests and data units) would be kept on the host. When a data unit is later garbage-collected inside a storage node, a message would have to be sent to the host to modify the metadata, which is complicated and consumes bandwidth.
  • In this embodiment, however, because write data requests of the same category are saved on one data node as a write data request set, the metadata associated with that set is also stored locally on the data node. If a garbage collection operation is subsequently performed on the write data request set, the metadata can be modified locally.
  • The metadata may be saved in the memory 226 of the data node. To ensure the reliability of the metadata, the check data of the metadata in each data node also needs to be calculated and saved. Specifically, as the number of write data requests received by the data node increases, the metadata stored in the memory 226 also increases. When the metadata accumulated in the memory 226 reaches a preset data amount, the data node sends a metadata set to the metadata check computing node.
  • A metadata set refers to metadata whose size reaches the preset data amount. The preset data amount may be preset by the storage system 10, for example, 16 KB.
  • The metadata check computing node is the storage node used to calculate the check data of metadata sets; in practice it may be the same node as the check computing node.
  • In addition, in this embodiment, besides sending the metadata set to the metadata check computing node, the data node 22 also needs to send the identifier of the data node to the metadata check computing node. The metadata set and the identifier may be encapsulated in one message or sent separately.
  • The metadata check computing node receives the metadata set and the data node identifier sent by each data node, assigns an identifier to each metadata set, and saves the correspondence between the identifier of the metadata set and the identifier of the data node. To ensure the reliability of the metadata, the metadata check computing node calculates the check data of the metadata sets. Specifically, according to the correspondence between the identifiers of the metadata sets and the identifiers of the data nodes, the metadata check computing node selects a set quantity of metadata sets from the multiple received metadata sets and calculates the check data of the selected sets. The selected metadata sets must correspond to the identifiers of different data nodes.
  • As in the earlier description, the set quantity is determined by the check mode preset by the storage system 10. It should be noted that the check mode of the metadata sets may be the same as or different from the check mode of the write data request sets.
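  • The metadata protection path mirrors the data path. The sketch below, with XOR parity assumed and illustrative names, shows a metadata check computing node keeping the set-to-node correspondence and computing parity only over metadata sets from distinct data nodes.

```python
from functools import reduce


class MetadataCheckNode:
    """Collects metadata sets and computes parity over sets from distinct data nodes."""

    def __init__(self, set_quantity):
        self.set_quantity = set_quantity     # determined by the preset check mode
        self.pending = []                    # list of (set_id, node_id, payload)
        self.set_to_node = {}                # metadata set id -> data node id
        self.next_set_id = 0

    def receive(self, node_id, payload):
        """Register one metadata set sent by a data node."""
        set_id = self.next_set_id
        self.next_set_id += 1
        self.set_to_node[set_id] = node_id   # saved correspondence
        self.pending.append((set_id, node_id, payload))
        return self._try_parity()

    def _try_parity(self):
        chosen, nodes = [], set()
        for set_id, node_id, payload in self.pending:
            if node_id not in nodes:
                chosen.append(payload)
                nodes.add(node_id)
            if len(chosen) == self.set_quantity:
                # byte-wise XOR of the chosen metadata sets (assumed check algorithm)
                return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chosen))
        return None
```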
  • The metadata check computing node may select a storage node in the storage system 10 as the metadata check node, and send the check data of the metadata sets to that metadata check node. The selected metadata check node must be different from the data nodes that store the metadata sets involved.
  • However, the selected metadata check node may be a data node that saves metadata sets of other stripes. This embodiment does not dedicate any data node exclusively to storing metadata sets or the check data of metadata sets.
  • For each data node, when the total amount of data held in the memory 226 reaches a certain threshold, the data node needs to write the data in the memory 226 into the solid state disk 222. At this point, the controller 221 in the data node may allocate, at the granularity of a write data request set, a logical address for writing into the solid state disk (referred to as a hard disk logical address in this embodiment) to each set, and save the correspondence between the identifier of the write data request set and the allocated hard disk logical address.
  • The solid state disk 222 receives the write data request set sent by the controller 221 together with the hard disk logical address allocated for the set, and writes the write data request set into one or more blocks. The solid state disk 222 saves the correspondence between the allocated hard disk logical address and the actual address. The actual address may be the physical address of the data in the solid state disk, or an address that is virtualized on the basis of the physical address and visible only to the solid state disk.
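  • A toy model of the two-level mapping (set identifier to hard disk logical address inside the controller, hard disk logical address to block inside the SSD) might read as follows; the block handling and structure names are assumptions for illustration only.

```python
class Ssd:
    """Solid state disk 222: maps hard disk logical addresses to blocks (actual addresses)."""

    def __init__(self):
        self.lba_to_block = {}            # hard disk logical address -> block index
        self.blocks = []

    def write(self, lba, payload):
        self.blocks.append(payload)       # write into a fresh block
        self.lba_to_block[lba] = len(self.blocks) - 1


class Controller:
    """Controller 221: maps each write data request set to a hard disk logical address."""

    def __init__(self, ssd):
        self.ssd = ssd
        self.set_to_hdd_lba = {}          # set id -> hard disk logical address
        self.next_lba = 0

    def flush_set(self, set_id, payload):
        lba = self.next_lba
        self.next_lba += len(payload)
        self.set_to_hdd_lba[set_id] = lba
        self.ssd.write(lba, payload)
```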
  • For each write data request set saved by a data node, part of the data in the set may become invalid data after a period of time.
  • The validity of data is determined by whether the data has been modified. If the data is written for the first time, it is recorded as valid (referred to as valid data); if the data is modified, the data before the modification is recorded as invalid (referred to as invalid data).
  • Taking Table 1 as an example, suppose the data node where the write data request set identified as 1 is located receives a fourth write data request, and the logical address to which the data in the fourth write data request is to be written is the same as the logical address of the data in the first write data request. The fourth write data request therefore overwrites the first one. Because of the write characteristics of the solid state disk, the data in the fourth write data request (the new data) does not directly overwrite the data of the first write data request (the old data); instead, a blank block is allocated on the solid state disk and the new data is written into it. As a result, the data in the first write data request becomes invalid data, which also means that part of the data in the write data request set identified as 1 becomes invalid.
  • The data node may record, by means of a bitmap, information about the invalid data included in each write data request set. The information includes the logical addresses to which the invalid data was to be written and the amount of invalid data. For example, each bit of the bitmap corresponds to the logical address of 1 KB of data: when the bit is 1, the data stored at that logical address is valid; when the bit is 0, the data stored at that logical address is invalid.
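  • As a minimal sketch of such a bitmap, with the 1 KB granularity assumed from the example above and a list-of-bits representation chosen purely for readability:

```python
CHUNK = 1024   # each bit covers 1 KB of data, as in the example above


class ValidityBitmap:
    """One bit per 1 KB chunk of a write data request set: 1 = valid, 0 = invalid."""

    def __init__(self, set_size):
        self.bits = [1] * (set_size // CHUNK)   # data starts out valid when written

    def invalidate(self, internal_offset, length):
        """Mark the chunks covered by an overwritten write data request as invalid."""
        for chunk in range(internal_offset // CHUNK, (internal_offset + length) // CHUNK):
            self.bits[chunk] = 0

    def invalid_amount(self):
        """Amount of invalid data in bytes, used when choosing a stripe to reclaim."""
        return self.bits.count(0) * CHUNK
```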
  • As invalid data increases, the data nodes need to perform a system garbage collection operation. System garbage collection is performed at the granularity of a stripe, and because the write data request sets of a stripe are distributed across data nodes, the collection is performed inside each data node at the granularity of a write data request set. The storage system 10 further includes a garbage collection node, which is the storage node 22 used to perform the system garbage collection operation. FIG. 7 is a schematic flowchart of the system garbage collection method; as shown in FIG. 7, the method may include the following steps:
  • S301: The garbage collection node selects a stripe to be reclaimed from the multiple stripes according to the bitmaps of the write data request sets included in each stripe.
  • In the first application scenario, the host 11 sends the correspondence between the stripe identifier and the identifier of each write data request set included in the stripe to the garbage collection node. In the second application scenario, the check computing node sends this correspondence to the garbage collection node. In addition, in either scenario, each data node needs to send the bitmaps of its write data request sets to the garbage collection node.
  • The garbage collection node needs to select one stripe to be reclaimed from the multiple stripes. To maximize the efficiency of the system garbage collection operation, the garbage collection node usually selects the stripe containing the most invalid data as the stripe to be reclaimed, so the amount of invalid data contained in each stripe needs to be counted. Specifically, the garbage collection node may determine, according to the correspondence between the stripe identifier and the identifier of each write data request set included in the stripe, all the write data request sets included in the stripe, and then determine, according to the bitmap of each write data request set, the amount of invalid data contained in each set, thereby obtaining the amount of invalid data contained in the stripe.
  • The garbage collection node may then use the stripe containing the most invalid data as the stripe to be reclaimed. Alternatively, the stripe to be reclaimed may be selected according to other conditions, for example, a stripe on which no system garbage collection operation has been performed for a long time, or each stripe may be garbage-collected in a preset order. A sketch of the default selection rule is given below.
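  • Illustrative only, with the dictionary shapes assumed, this shows the "most invalid data" policy built from the two inputs the garbage collection node receives:

```python
def pick_stripe_to_reclaim(stripe_to_sets, invalid_bytes_per_set):
    """stripe_to_sets: stripe id -> list of write data request set ids.
    invalid_bytes_per_set: set id -> amount of invalid data (derived from its bitmap)."""
    def invalid_in(stripe_id):
        return sum(invalid_bytes_per_set[s] for s in stripe_to_sets[stripe_id])
    return max(stripe_to_sets, key=invalid_in)   # stripe containing the most invalid data
```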
  • S302: The garbage collection node sends a garbage collection notification message to the data node where each write data request set of the selected stripe is located.
  • As described above, the garbage collection node may determine, according to the correspondence between the stripe identifier and the identifier of each write data request set included in the stripe, the identifiers of the write data request sets contained in the stripe. Then, according to the correspondence between the identifier of a write data request set and the identifier of a data node (or directly according to the identifier of the write data request set, if it is the same as the identifier of the data node), the garbage collection node determines the data node where each write data request set is located, and sends a garbage collection notification message to that data node. The message is used to notify the data node to perform the system garbage collection operation on the write data request set, and it includes the identifier of the write data request set.
  • S303: Each data node receives the garbage collection notification message and determines, according to the identifier of the write data request set carried in the message, the write data request set to be reclaimed. Then, each data node determines, according to the bitmap of the write data request set to be reclaimed, the valid data and the invalid data included in the set. For example, in Table 1, the data of the first write data request has become invalid data, while the data of the second and third write data requests is still valid.
  • S304: Each data node migrates the valid data in the write data request set to be reclaimed into a new write data request set, and reclaims the old set.
  • Migrating valid data into a new write data request set means that the write data requests in which the valid data resides are pieced together into a new write data request set. For example, the second and third write data requests in Table 1 are read from the solid state disk 222 into the memory 226 (this operation is unnecessary if they are still stored in the memory 226); when the size of all the write data requests stored in the memory 226 again reaches the preset data amount, a new write data request set is formed.
  • Specifically, the data node determines, according to the bitmap of the write data request set to be reclaimed, the logical addresses to which the valid data in the set was to be written (the logical address here refers to the logical address to which the data is to be written), deletes the correspondence between those logical addresses and the identifier of the old write data request set, and saves the correspondence between those logical addresses and the identifier of the new write data request set. Because this metadata is stored locally on the data node, the data node can modify it directly after performing system garbage collection, without any cross-node operation.
  • Then, the data node sends the hard disk logical address of the write data request set to be reclaimed to the solid state disk, and the solid state disk marks the blocks corresponding to that hard disk logical address as invalid. When the solid state disk subsequently performs its internal garbage collection, these blocks can be erased directly, without copying the valid data again, which reduces write amplification inside the solid state disk.
  • In addition, the data node also needs to delete the correspondence between the identifier of the reclaimed write data request set and the allocated hard disk logical address. When all the write data request sets included in a stripe have been reclaimed, the check data of that stripe no longer has any meaning, and the garbage collection node may notify the check node to delete the check data. The data-node side of this procedure is sketched below.
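  • A condensed sketch of steps S303-S304 as seen by one data node; the helper names are hypothetical, and "trim" stands for marking the old set's blocks invalid on the solid state disk.

```python
def reclaim_set(set_id, requests, bitmap, metadata, ssd, new_set):
    """Reclaim one write data request set inside a data node.

    requests: internal offset -> write data request of the old set.
    bitmap:   internal offset -> True if the data at that offset is still valid.
    metadata: logical address -> (set id, internal offset), kept locally.
    """
    for offset, req in requests.items():
        if bitmap[offset]:                         # valid data is migrated
            new_offset = new_set.append(req)       # pieced into the new set
            metadata[req["logical_address"]] = (new_set.set_id, new_offset)
        else:                                      # invalid data is simply dropped
            metadata.pop(req["logical_address"], None)
    ssd.trim(set_id)    # mark the old set's blocks invalid; erased by SSD internal GC later
```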
  • In the system garbage collection provided in this embodiment, each data node pieces the valid data together into other local write data request sets, so each data node completes system garbage collection independently, without exchanging data with other data nodes, which saves bandwidth between data nodes.
  • A person of ordinary skill in the art will understand that aspects of the present invention, or possible implementations of the aspects, may be embodied as a system, a method, or a computer program product. Therefore, aspects of the present invention, or possible implementations of the aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and the like), or an embodiment combining software and hardware, which are collectively referred to herein as a "circuit", a "module", or a "system". In addition, aspects of the present invention, or possible implementations of the aspects, may take the form of a computer program product, that is, computer readable program code stored in a computer readable medium.
  • A computer readable medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or apparatus, or any appropriate combination of the foregoing, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or an optical disc.
  • A processor in a computer reads the computer readable program code stored in the computer readable medium, so that the processor can perform the functional actions specified in each step, or a combination of steps, in the flowcharts.
  • The computer readable program code may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. It should also be noted that, in some alternative implementations, the functions noted in the steps of the flowcharts or in the blocks of the block diagrams may occur out of the order noted in the figures. For example, depending on the functions involved, two steps or two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Memory System (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Detection And Correction Of Errors (AREA)

Abstract

一种存储系统包括主机、校验节点和多个数据节点。主机用于根据写数据请求中包括的数据节点的标识,将多个写数据请求划分为多个写数据请求集合,每个写数据请求集合包括具有相同的数据节点的标识的多个写数据请求。当设定数量个写数据请求集合中的每个写数据请求集合中的所有写数据请求的大小达到预设数据量时,计算所述设定数量个写数据请求集合的校验数据。主机还用于将每个写数据请求集合发送给标识所指示的数据节点,将所述校验数据发送给所述校验节点。能够在保证数据的可靠性的基础上,将相同类别的数据存储在同一个存储节点中,在读取这些数据时只需在一个存储节点内读取,从而提高读取数据的效率。

Description

一种数据校验的方法及存储系统 技术领域
本发明实施例涉及存储技术领域,特别是一种数据校验的方法及存储系统。
背景技术
在包含多个存储节点的分布式系统中,主机将生成的多个写数据请求发送给多个存储节点,每个存储节点各自存储一些写数据请求。写数据请求中包括数据以及所述数据待写入存储节点的逻辑地址(以下简称为逻辑地址)。为了防止某个存储节点发生故障导致其存储的写数据请求丢失,所述主机会计算所述生成的多个写数据请求的校验数据,并将校验数据发送给一个或多个存储节点存储。为了计算校验数据,主机将达到设定大小的多个写数据请求按照预定尺寸进行划分,得到多个数据单元,再计算所述多个数据单元的校验单元。这些数据单元和校验单元组成一个分条。最后,主机将每个数据单元或者校验单元发送给一个存储节点存储。由于这些写数据请求被随机地划分在多个数据单元中,从而发送给不同的存储节点存储。因此,在读取数据的时候,往往需要从不同的存储节点收集多个写数据请求携带的数据。跨存储节点的读取操作会影响读取数据的效率。
发明内容
本申请第一方面提供了一种存储系统。存储系统包括主机、校验节点和多个数据节点。每个数据节点具有唯一的标识。所述主机用于根据写数据请求中包括的数据节点的标识,将多个写数据请求划分为多个写数据请求集合。每个写数据请求集合包括具有相同的数据节点的标识的一个或多个写数据请求。每个写数据请求包括数据、所述数据待写入的逻辑地址以及所述数据待写入的数据节点的标识。所述数据待写入的数据节点是所述主机根据所 述数据或所述逻辑地址选择的,或者,用户在主机11中触发多个写数据请求时,向主机11发送指令要求将这些写数据请求写入同一个数据节点。主机11可以根据用户的需求,为这些写数据请求选择一个数据节点,将这个数据节点携带在每个写数据请求。当设定数量个写数据请求集合中的每个写数据请求集合中的所有写数据请求的大小达到预设数据量时,所述主机计算所述设定数量个写数据请求集合的校验数据。所述设定数量个写数据请求集合是所述多个写数据请求集合的子集。所述主机将所述设定数量个写数据请求集合中的每个写数据请求集合发送给所述写数据请求集合包括的数据节点的标识所指示的数据节点,将所述校验数据发送给所述校验节点。
在本申请提供的存储系统中,主机按照写数据请求包含的数据节点的标识,将多个写数据请求划分为多个写数据请求集合。当设定数量个写数据请求集合中所有写数据请求的大小达到预设数据量时,主机计算所述设定数量个写数据请求集合的校验数据,并将校验数据发送给校验节点存储,由此保证了数据的可靠性。并且,由于每个写数据请求集合包含待写入相同的数据节点的多个写数据请求,而每个写数据请求待写入的数据节点是所述主机根据所述写数据请求中的数据或者所述数据待写入的逻辑地址选择的。因此,每个写数据请求集合中保存的是相同类别的写数据请求。当主机计算校验数据之后,将每个写数据请求集合发送给所述写数据请求集合包含的标识所指示的数据节点,相同类别的写数据请求就被存储在了同一个数据节点中。由于相同类别的写数据请求中的数据被同时读取的可能性较大,那么在这些数据被读取时可以在一个数据节点内读取,不用跨节点执行读取操作,提高了数据读取的效率。
结合第一方面,在第一方面的第一种实现中,所述主机还用于为所述设定数量个写数据请求集合中的每个写数据请求分配标识,并且将每个写数据请求集合的标识发送给该写数据请求集合包含的数据节点的标识所述指示的数据节点。所述数据节点用于接收写数据请求集合以及所述写数据请求集 合的标识。所述数据节点还用于创建并保存元数据。所述元数据包括所述写数据请求集合的标识与每个写数据请求中的数据待写入的逻辑地址之间的对应关系,以及每个写数据请求中的数据待写入的逻辑地址与内部偏移量之间的对应关系。在本申请中,由于相同类别的写数据请求作为一个写数据请求集合保存在一个数据节点,那么与这个写数据请求集合相关的元数据也保存在数据节点本地。后续如果对写数据请求集合进行垃圾回收等操作,可以直接在本地对元数据进行修改。
结合第一方面的第一种实现,在第一方面的第二种实现中,所述存储系统还包括元数据校验计算节点以及元数据校验节点。所述数据节点还用于在确定累积的元数据达到所述预设数据量时,将元数据集合以及所述数据节点的标识发送给所述元数据校验计算节点。所述元数据集合包括所述累积的达到所述预设数据量的元数据。所述元数据校验计算节点用于接收多个数据节点中每个数据节点发送的元数据集合以及所述数据节点的标识,保存每个元数据集合与数据节点的标识之间的对应关系。并且,根据所述对应关系从接收的多个元数据集合中选择设定数量个元数据集合。所述选择出的设定数据量个元数据集合对应不同的数据节点的标识。所述元数据校验计算节点还用于计算所述选择出的设定数量个元数据集合的校验数据。然后,所述元数据校验计算节点将所述设定数量个元数据集合的校验数据发送给所述元数据校验节点。所述元数据校验节点不同于存储所述设定数量个元数据集合中的每个元数据集合所在的数据节点。由此,本申请提供的存储系统保证了各个数据节点中存储的元数据集合的可靠性。
结合第一方面的第一种实现,在第一方面的第三种实现中,所述存储系统还包括垃圾回收节点。垃圾回收节点用于执行系统垃圾回收操作。具体的,垃圾回收节点根据所述主机发送的分条标识与所述分条包括的每个写数据请求集合的标识之间的对应关系,以及数据节点发送的写数据请求集合的位图从多个分条中选择出包含无效数据最多的分条。本申请中的分条包括设定 数量个写数据请求集合以及根据这些写数据请求集合计算出的校验数据。本申请中的位图用于指示写数据请求集合包含的无效数据的数据量。所述垃圾回收节点将所述包含无效数据最多的分条作为待回收的分条,向所述待回收的分条所包含的每个写数据请求集合所在的数据节点分别发送垃圾回收通知消息。每个垃圾回收通知消息用于通知数据节点进行系统垃圾回收,每个垃圾回收通知消息中包括写数据请求集合的标识。由于垃圾回收单元在执行系统垃圾回收操作时会选择包含无效数据的分条,因此提高了系统垃圾回收的效率。
结合第一方面的第三种实现,在第一方面的第四种实现中,数据节点在接收垃圾回收通知消息之后,根据写数据请求集合的标识以及保存的所述写数据请求集合的位图,对所述写数据请求集合执行系统垃圾回收操作。具体的,数据节点根据所述写数据请求集合的标识确定待回收的写数据请求集合。数据节点再根据所述写数据请求集合的位图确定所述写数据请求集合中包含的有效数据待写入的逻辑地址,删除所述有效数据待写入的逻辑地址与所述写数据请求集合的标识之间的对应关系,并且保存所述有效数据待写入的逻辑地址与重新拼凑的写数据请求集合的标识之间的对应关系。然后,数据节点将所述写数据请求集合的硬盘逻辑地址发送给固态硬盘,由固态硬盘将与所述硬盘逻辑地址对应的块标记为无效。当后续固态硬盘进行内部的垃圾回收时可以直接擦除这些块,无需再次进行有效数据的复制,减少了固态硬盘内部写放大的次数。另外,本申请提供的系统垃圾回收是由每个数据节点将有效数据拼凑在本地的其他写数据请求集合中,因此所述数据节点独立完成系统垃圾回收,无需和其他数据节点进行数据交互,节省了数据节点之间的带宽。
本申请第二方面提供了一种数据校验方法,应用在第一方面或者第一方面的任意一种实现的存储系统中。
本申请第三方面提供了一种主机,所述主机的功能与第一方面的第一种 实现所提供的存储系统一致。
本申请第四方面提供了一种存储系统。所述存储系统包括校验节点、校验计算节点和多个数据节点。每个数据节点,用于将写数据请求集合发送给所述校验计算节点。所述写数据请求集合包括一个或多个写数据请求,每个写数据请求包括数据以及所述数据待写入的数据节点的标识。所述写数据请求集合的大小等于预设数据量。所述校验计算节点,用于接收多个写数据请求集合,从所述多个写数据请求集合中选择设定数量个写数据请求集合。所述设定数量个写数据请求集合是所述多个写数据请求集合的子集,并且所述设定数量个写数据请求集合包括不同的数据节点的标识。然后,所述校验计算节点计算所述设定数量个写数据请求集合的校验数据,将所述校验数据发送给所述校验节点。所述校验节点不同于所述设定数量个写数据请求集合中的每个写数据请求集合所在的数据节点。
本申请提供的数据校验方法,每个数据节点在累积的所有写数据请求的大小达到预设数据量时,将这些写数据请求作为一个写数据请求集合发送给校验计算节点。校验计算节点从接收的多个写数据请求集合中选择设定数量个写数据请求集合计算所述设定数量个写数据请求集合的校验数据,并将校验数据发送给校验节点存储,由此保证了数据的可靠性。
结合第四方面,在第四方面的第一种实现中,每个写数据请求还包括所述数据待写入的逻辑地址,所述数据待写入的数据节点是所述存储系统中的主机根据所述数据或者所述数据待写入的逻辑地址选择的。或者,所述数据待写入的数据节点也可能不是所述主机选择的,而是用户在所述主机中触发写数据请求时直接指定的。每个数据节点,还用于接收所述主机发送的多个写数据请求。由于每个写数据请求集合包含具有相同的数据节点的标识的多个写数据请求,而每个写数据请求待写入的数据节点是所述主机根据所述写数据请求中的数据或者所述数据待写入的逻辑地址选择的。因此,每个写数据请求集合中保存的是相同类别的写数据请求。由于相同类别的写数据请求 中的数据被同时读取的可能性较大,那么在这些数据被读取时可以在一个数据节点内读取,不用跨节点执行读取操作,提高了数据读取的效率。
结合第四方面的第一种实现,在第四方面的第二种实现中,所述数据节点,还用于为保存的写数据请求集合分配标识,将所述写数据请求集合的标识发送给所述校验计算节点。然后,所述数据节点创建并保存元数据。所述元数据包括所述写数据请求集合的标识与每个写数据请求中的数据待写入的逻辑地址之间的对应关系,以及每个写数据请求中的数据待写入的逻辑地址与内部偏移量之间的对应关系。在本申请中,由于相同类别的写数据请求作为一个写数据请求集合保存在一个数据节点,那么与这个写数据请求集合相关的元数据也保存在数据节点本地。后续如果对写数据请求集合进行垃圾回收等操作,可以直接在本地对元数据进行修改。
结合第四方面的第二种实现,在第四方面的第三种实现中,所述存储系统还包括元数据校验计算节点以及元数据校验节点。所述数据节点,还用于在确定累积的元数据达到所述预设数据量时,将元数据集合以及数据节点的标识发送给所述元数据校验计算节点。所述元数据集合包括所述累积的达到所述预设数据量的元数据。所述元数据校验计算节点,用于接收每个数据节点发送的元数据集合以及所述数据节点的标识,保存每个元数据集合与数据节点的标识之间的对应关系。并且,所述元数据校验计算节点根据所述对应关系从接收的多个元数据集合中选择所述设定数量个元数据集合。所述设定数据量个元数据集合对应不同的数据节点的标识。所述元数据校验计算节点计算所述设定数量个元数据集合的校验数据,将所述设定数量个元数据集合的校验数据发送给所述元数据校验节点。所述元数据校验节点不同于存储所述设定数量个元数据集合中的每个元数据集合的数据节点。由此,本申请提供的存储系统保证了各个数据节点中存储的元数据集合的可靠性。
结合第四方面的第二种实现,在第四方面的第四种实现中,所述存储系统还包括垃圾回收节点。所述校验计算节点,还用于为分条分配标识。所述 分条包括所述设定数量个写数据请求集合以及所述设定数量个写数据请求集合的校验数据。然后,所述校验计算节点将所述分条标识与所述分条包括的每个写数据请求集合的标识之间的对应关系,发送给所述垃圾回收节点。每个数据节点,还用于将保存的写数据请求集合的位图发送给所述垃圾回收节点,所述位图用于指示所述写数据请求集合的无效数据的数据量。所述垃圾回收节点,用于根据所述分条标识与所述分条包括的每个写数据请求集合的标识之间的对应关系,以及每个写数据请求集合的位图从多个分条中选择包含无效数据最多的分条,向所述分条包含的每个写数据请求集合所在的数据节点分别发送垃圾回收通知消息。每个垃圾回收通知消息包括写数据请求集合的标识。由于垃圾回收单元在执行系统垃圾回收操作时会选择包含无效数据的分条,因此提高了系统垃圾回收的效率。
结合第四方面的第四种实现,在第四方面的第五种实现中,数据节点在接收垃圾回收通知消息之后,根据写数据请求集合的标识以及保存的所述写数据请求集合的位图,对所述写数据请求集合执行系统垃圾回收操作。具体的,数据节点根据所述写数据请求集合的标识确定待回收的写数据请求集合。数据节点再根据所述写数据请求集合的位图确定所述写数据请求集合中包含的有效数据待写入的逻辑地址,删除所述有效数据待写入的逻辑地址与所述写数据请求集合的标识之间的对应关系,并且保存所述有效数据待写入的逻辑地址与重新拼凑的写数据请求集合的标识之间的对应关系。然后,数据节点将所述写数据请求集合的硬盘逻辑地址发送给固态硬盘,由固态硬盘将与所述硬盘逻辑地址对应的块标记为无效。当后续固态硬盘进行内部的垃圾回收时可以直接擦除这些块,无需再次进行有效数据的复制,减少了固态硬盘内部写放大的次数。另外,本申请提供的系统垃圾回收是由每个数据节点将有效数据拼凑在本地的其他写数据请求集合中,因此所述数据节点独立完成系统垃圾回收,无需和其他数据节点进行数据交互,节省了数据节点之间的带宽。
本申请第五方面提供了一种数据校验方法,应用在第四方面或者第四方面的任意一种实现的存储系统中。
本申请第六方面提供了一种存储系统。存储系统包括主机、校验计算节点和多个数据节点。每个数据节点具有唯一的标识。所述主机将生成的多个写数据请求发送给校验计算节点。每个写数据请求集合包括具有相同的数据节点的标识的一个或多个写数据请求。每个写数据请求包括数据、所述数据待写入的逻辑地址以及所述数据待写入的数据节点的标识。所述数据待写入的数据节点是所述主机根据所述数据或所述逻辑地址选择的。或者,所述数据待写入的数据节点也可能不是所述主机选择的,而是用户在所述主机中触发写数据请求时直接指定的。所述校验计算节点用于根据写数据请求中包括的数据节点的标识,将多个写数据请求划分为多个写数据请求集合。当设定数量个写数据请求集合中的每个写数据请求集合中的所有写数据请求的大小达到预设数据量时,所述校验计算节点计算所述设定数量个写数据请求集合的校验数据。所述设定数量个写数据请求集合是所述多个写数据请求集合的子集。所述校验计算节点将所述设定数量个写数据请求集合中的每个写数据请求集合发送给所述写数据请求集合包括的数据节点的标识所指示的数据节点。所述校验计算节点保存所述校验数据。
在本申请提供的存储系统中,校验计算节点按照写数据请求包含的数据节点的标识,将多个写数据请求划分为多个写数据请求集合。当设定数量个写数据请求集合中所有写数据请求的大小达到预设数据量时,校验计算节点计算所述设定数量个写数据请求集合的校验数据,并将校验数据发送给校验节点存储,由此保证了数据的可靠性。并且,由于每个写数据请求集合包含待写入相同的数据节点的多个写数据请求,而每个写数据请求待写入的数据节点是所述主机根据所述写数据请求中的数据或者所述数据待写入的逻辑地址选择的。因此,每个写数据请求集合中保存的是相同类别的写数据请求。当主机计算校验数据之后,将每个写数据请求集合发送给所述写数据请求集 合包含的标识所指示的数据节点,相同类别的写数据请求就被存储在了同一个数据节点中。由于相同类别的写数据请求中的数据被同时读取的可能性较大,那么在这些数据被读取时可以在一个数据节点内读取,不用跨节点执行读取操作,提高了数据读取的效率。结合第六方面,在第六方面的第一种实现中,所述校验计算节点还用于为所述设定数量个写数据请求集合中的每个写数据请求分配标识,并且将每个写数据请求集合的标识发送给该写数据请求集合包含的数据节点的标识所述指示的数据节点。所述数据节点用于接收写数据请求集合以及所述写数据请求集合的标识。所述数据节点还用于创建并保存元数据。所述元数据包括所述写数据请求集合的标识与每个写数据请求中的数据待写入的逻辑地址之间的对应关系,以及每个写数据请求中的数据待写入的逻辑地址与内部偏移量之间的对应关系。在本申请中,由于相同类别的写数据请求作为一个写数据请求集合保存在一个数据节点,那么与这个写数据请求集合相关的元数据也保存在数据节点本地。后续如果对写数据请求集合进行垃圾回收等操作,可以直接在本地对元数据进行修改。
结合第六方面的第一种实现,在第六方面的第二种实现中,所述存储系统还包括元数据校验计算节点以及元数据校验节点。所述数据节点还用于在确定累积的元数据达到所述预设数据量时,将元数据集合以及所述数据节点的标识发送给所述元数据校验计算节点。所述元数据集合包括所述累积的达到所述预设数据量的元数据。所述元数据校验计算节点用于接收多个数据节点中每个数据节点发送的元数据集合以及所述数据节点的标识,保存每个元数据集合与数据节点的标识之间的对应关系。并且,根据所述对应关系从接收的多个元数据集合中选择设定数量个元数据集合。所述选择出的设定数据量个元数据集合对应不同的数据节点的标识。所述元数据校验计算节点还用于计算所述选择出的设定数量个元数据集合的校验数据。然后,所述元数据校验计算节点将所述设定数量个元数据集合的校验数据发送给所述元数据校验节点。所述元数据校验节点不同于存储所述设定数量个元数据集合中的 每个元数据集合所在的数据节点。由此,本申请提供的存储系统保证了各个数据节点中存储的元数据集合的可靠性。
结合第六方面的第一种实现,在第六方面的第三种实现中,所述存储系统还包括垃圾回收节点。垃圾回收节点用于执行系统垃圾回收操作。具体的,垃圾回收节点根据所述校验计算节点发送的分条标识与所述分条包括的每个写数据请求集合的标识之间的对应关系,以及数据节点发送的写数据请求集合的位图从多个分条中选择出包含无效数据最多的分条。本申请中的分条包括设定数量个写数据请求集合以及根据这些写数据请求集合计算出的校验数据。本申请中的位图用于指示写数据请求集合包含的无效数据的数据量。所述垃圾回收节点将所述包含无效数据最多的分条作为待回收的分条,向所述待回收的分条所包含的每个写数据请求集合所在的数据节点分别发送垃圾回收通知消息。每个垃圾回收通知消息用于通知数据节点进行系统垃圾回收,每个垃圾回收通知消息中包括写数据请求集合的标识。由于垃圾回收单元在执行系统垃圾回收操作时会选择包含无效数据的分条,因此提高了系统垃圾回收的效率。
结合第六方面的第三种实现,在第六方面的第四种实现中,数据节点在接收垃圾回收通知消息之后,根据写数据请求集合的标识以及保存的所述写数据请求集合的位图,对所述写数据请求集合执行系统垃圾回收操作。具体的,数据节点根据所述写数据请求集合的标识确定待回收的写数据请求集合。数据节点再根据所述写数据请求集合的位图确定所述写数据请求集合中包含的有效数据待写入的逻辑地址,删除所述有效数据待写入的逻辑地址与所述写数据请求集合的标识之间的对应关系,并且保存所述有效数据待写入的逻辑地址与重新拼凑的写数据请求集合的标识之间的对应关系。然后,数据节点将所述写数据请求集合的硬盘逻辑地址发送给固态硬盘,由固态硬盘将与所述硬盘逻辑地址对应的块标记为无效。当后续固态硬盘进行内部的垃圾回收时可以直接擦除这些块,无需再次进行有效数据的复制,减少了固态 硬盘内部写放大的次数。另外,本申请提供的系统垃圾回收是由每个数据节点将有效数据拼凑在本地的其他写数据请求集合中,因此所述数据节点独立完成系统垃圾回收,无需和其他数据节点进行数据交互,节省了数据节点之间的带宽。
本申请第七方面提供了一种数据校验方法,应用在第七方面或者第七方面的任意一种实现的存储系统中。
附图说明
为了更清楚地说明本发明实施例的技术方案,下面将对实施例中所需要使用的附图作简单地介绍。
图1是本发明实施例提供的存储系统的组成图;
图2是本发明实施例提供的主机的结构图;
图3是本发明实施例提供的闪存阵列的结构图;
图4是本发明实施例提供的一种数据校验方法的流程示意图;
图5是本发明实施例提供的另一种数据校验方法的流程示意图;
图6是本发明实施例提供的再一种数据校验方法的流程示意图;
图7是本发明实施例提供的系统垃圾回收方法的流程示意图。
具体实施方式
本发明实施例提出了一种数据校验的方法及存储系统,能够在保证数据的可靠性的基础上,将相同类别的数据存储在同一个存储节点中,在读取这些数据时只需在一个存储节点内读取,从而提高读取数据的效率。
图1描绘了本发明实施例提供的存储系统10的组成图,图1所示的存储系统10包括主机11和多个存储节点22。图1仅是示例性说明,并不限定具体的组网方式,如:级联树形组网、环状组网都可以。只要主机11和存储节点22之间能够相互通信。
主机11可以包括任何计算设备,如服务器、台式计算机等等。用户可以通过主机11触发读写指令,向存储节点22发送写数据请求或读数据请求。 在本实施例中,主机11可以与任意一个存储节点22通信,并且任意两个存储节点22之间也可以通信。
如图2所示,主机11主要包括处理器(英文:processor)118、缓存(英文:cache)120、存储器(英文:memory)122、通信总线(简称总线)126以及通信接口(英文:communication interface)128。处理器118、缓存120、存储器122以及通信接口128通过通信总线126完成相互间的通信。
通信接口128,用于与存储节点22通信。
存储器122,用于存放程序124,存储器122可能包含高速RAM存储器,也可能还包括非易失性存储器(英文:non-volatile memory),例如至少一个磁盘存储器。可以理解的是,存储器122可以为随机存储器(英文:Random-Access Memory,RAM)、磁碟、硬盘、固态硬盘(英文:solid state disk,SSD)或者非易失性存储器等各种可以存储程序代码的机器可读介质。
程序124可以包括程序代码。
缓存120(英文:cache)用于缓存从应用服务器10接收的数据或从存储节点22中读取的数据。缓存120可以是RAM、ROM、闪存(英文:flash memory)或固态硬盘(Solid State Disk,SSD)等各种可以存储数据的机器可读介质,在此不做限定。
另外,存储器122和缓存120可以合设或者分开设置,本发明实施例对此不做限定。
处理器118可能是一个中央处理器(英文:central processing unit,CPU)用于生成并处理写数据请求等等。
存储节点22可以是磁盘阵列或者闪存阵列或者存储服务器。磁盘阵列是指包括至少一个控制器和多个磁盘的存储设备。闪存阵列是指包括至少一个控制器和多个固态硬盘(英文:solid state device,SSD)的存储设备。固态硬盘是以闪存(英文:flash memory)芯片为存储介质的存储器,又名固态驱动器(英文:solid state drive,SSD)。另外,存储节点22还可以是存储 服务器,所述存储服务器包含至少一个固态硬盘。
以闪存阵列为例,如图3所示,存储节点22包括控制器221和一个或多个固态硬盘222。其中,控制器221至少包括接口223、接口224、处理器225和存储器226。
接口223,用于和主机11通信。接口224,用于和固态硬盘222通信。处理器225可能是一个中央处理器(英文:central processing unit,CPU)。
处理器225,用于接收来自主机11的写数据请求或者读数据请求、处理所述写数据请求或者读数据请求,处理器225还可以将写数据请求中的数据发送给固态硬盘222。
存储器226,用于存放程序。另外,存储器226还用于临时存储从主机11接收的写数据请求或从固态硬盘222读取的数据。控制器221接收主机发送的多个写数据请求时,可以将所述多个写数据请求暂时保存在存储器226中。当多个写数据请求的大小达到预设数据量时,将存储器226存储的多个写数据请求发送给校验计算节点。存储器226包含随机访问存储器(英文:random-access memory,RAM)。可选地,存储器226还包括非易失性存储器(英文:non-volatile memory),例如至少一个磁存储器。可以理解的是,存储器226可以为随机存储器(Random-Access Memory,RAM)、磁碟、硬盘、固态硬盘(Solid State Disk,SSD)或者非易失性存储器等各种可以存储程序代码的机器可读介质。
在本实施例中的多个存储节点中,用于存储主机11发送的写数据请求的存储节点被称为数据节点,用于计算多个写数据请求的校验数据的存储节点被称为校验计算节点,用于存储多个写数据请求的校验数据的存储节点被称为校验节点,用于计算元数据的校验数据的存储节点被称为元数据校验计算节点,用于进行系统垃圾回收的节点被称为垃圾回收节点。然而,这种划分并不是绝对的。例如,存储校验数据的校验节点也可以作为数据节点存储写数据请求。
本申请提供了至少两种应用场景,在一种应用场景中,主机11具有计算多个写数据请求的校验数据的功能。举例来说,主机11根据写数据请求包括的数据待写入的数据节点的标识,将多个写数据请求划分为多个写数据请求集合,每个集合包括将要被写入相同的数据节点的写数据请求。当部分或者全部集合中的所有写数据请求的大小达到预设数据量时,主机11计算这些写数据请求集合的校验数据。
在另一种应用场景中,主机11不具有计算多个写数据请求的校验数据的功能,计算校验数据的操作是由存储节点中的校验计算节点来完成的。主机11根据每个写数据请求携带的数据待写入的数据节点的标识,将每个写数据请求发送给所述数据节点。每个数据节点接收主机11发送的多个写数据请求,当写数据请求集合(写数据请求集合包括多个写数据请求)的大小达到预设数据量时,数据节点将所述写数据请求集合发送给校验计算节点,由校验计算节点计算多个写数据请求集合的校验数据。
下面将分别讨论基于以上两种应用场景的数据校验方法。
图4是应用于第一种场景的数据校验方法的流程图,如图4所示,所述方法包括如下步骤。示例性的,步骤S101-步骤S104可以由主机11的处理器118执行。
步骤S101:主机11生成多个写数据请求,每个写数据请求包括数据、所述数据待写入的数据节点的标识以及所述数据待写入的逻辑地址。数据待写入的逻辑地址包括逻辑单元(英文:Logic Unit)的标识、逻辑块地址(英文:Logical Block Address,LBA)和长度(英文:length)。其中,逻辑单元的标识用于指示所述数据待写入的逻辑单元,逻辑块地址用于指示所述数据位于所述逻辑单元内的位置,长度表示所述数据的大小。所述数据待写入的数据节点的标识用于指示所述数据待写入的数据节点。所述数据待写入的数据节点是主机11根据所述数据或者所述数据待写入的逻辑地址从存储系统10中的多个数据节点中选择的。
具体的,首先,主机11预先收集存储系统10中的数据节点的信息。所述数据节点的信息包括存储系统10中包括的数据节点的个数,以及每个数据节点的标识。数据节点的标识用于唯一识别所述数据节点。例如,存储系统10包含5个数据节点,每个数据节点的标识分别是A、B、C、D和E。主机11可以向每个数据节点发送查询请求以获得其标识,也可以由各个数据节点主动向主机11上报其标识,还可以在存储系统10中指定一个主节点,由所述主节点汇总各个数据节点的标识之后,向主机11上报所述各个数据节点的标识。
其次,主机11确定数据待写入的数据节点。一种实现方式是主机11根据所述数据待写入的逻辑地址确定所述数据待写入的数据节点。例如:主机11将每个写数据请求包括的数据待写入的数据节点的地址作为输入项,利用预先设定的哈希算法得到哈希值,所述哈希值唯一对应一个数据节点的标识。另外,主机11还可以利用散列算法或者取余等方式确定数据节点的标识。本实施例并不对确定数据节点的标识的算法进行任何限定,只要能根据所述逻辑地址唯一确定一个数据节点的标识即可。另一种实现方式是主机11根据所述数据确定所述数据待写入的数据节点。例如:主机11将数据作为输入项,利用预先设定的哈希算法得到哈希值,所述哈希值唯一对应一个数据节点的标识。同样的,主机11还可以利用散列算法或者取余等方式确定数据节点的标识。再一种实现方式是,主机11根据触发所述写数据请求的用户对多个写数据请求进行分类,为相同用户触发的写数据请求选择同一个数据节点。或者,主机11还可以根据数据的类型对这些写数据请求进行分类,为相同类型的数据选择同一个数据节点。
主机11在为数据选择了数据节点之后,将所述数据节点的标识写入携带所述数据的写数据请求中。所述数据节点的标识可以位于所述写数据请求的头部,也可以是所述写数据请求的载荷(英文:payload)。
用户在主机11中触发多个写数据请求时,向主机11发送指令要求将这 些写数据请求写入同一个数据节点。主机11可以根据用户的需求,为这些写数据请求选择一个数据节点,将这个数据节点携带在每个写数据请求。
由于每个写数据请求中的数据节点的标识是主机11根据所述写数据请求包括的数据或者数据待写入的逻辑地址分配的,因此具有同样的标识的写数据请求属于相同类别的写数据请求,这些写数据请求所携带的数据被同时读取的可能性较大。
步骤S102:主机根据写数据请求中包括的数据节点的标识,将所述多个写数据请求划分为多个写数据请求集合,每个集合包括具有相同的数据节点的标识的多个写数据请求。
需要说明的是,在实际实现时,并不需要将所述多个写数据请求存储在不同的地方,这里的“划分”仅仅是逻辑上的,以不同的数据节点的标识对所述多个写数据请求进行分类。例如,一个写数据请求集合中的多个写数据请求均包含数据节点A的标识,另一个写数据请求集合中的多个写数据请求均包含数据节点B的标识,再一个写数据请求集合中的多个写数据请求均包含数据节点C的标识。
步骤S103:当设定数量个写数据请求集合中的每个写数据请求集合中的所有写数据请求的大小达到预设数据量时,主机11计算所述设定数量个写数据请求集合的校验数据。
通常情况下,存储系统10包含若干个数据节点,相应地,主机11中的写数据请求集合也有若干个。随着生成的写数据请求的不断增多,每个写数据请求集合的大小也不断累积。主机11需要从大小达到预设数据量的写数据请求集合中选择设定数量个写数据请求集合,计算所述设定数量个写数据请求集合的校验数据。所述预设数据量是预先设定的,例如16KB。所述设定数量是由存储系统10预设设定的校验模式确定的。
预先设定的校验模式包括5+1模式,或者6+2模式等等。5+1模式是指 根据5个写数据请求集合生成一个校验数据;6+2模式是指根据6个写数据请求集合生成2个校验数据。以5+1模式为例,主机11从凑满预设数据量的多个写数据请求集合中选择5个写数据请求集合,再计算这5个写数据请求集合的校验数据。这5个写数据请求集合和校验数据就组成了一个分条(英文:stripe)。大小达到预设数据量的写数据请求集合就是所述分条的一个数据单元。
步骤S104,主机11将所述设定数量个写数据请求集合中的每个写数据请求集合发送给写数据请求集合包含的标识所指示的数据节点。
由前面的讨论可知,由于每个写数据请求集合中的写数据请求都包含相同的数据节点的标识,,因此主机可以根据所述标识将每个写数据请求集合发送给所述标识所指示的数据节点。
步骤S105,主机11从存储系统10选择一个存储节点用于存储所述校验数据,将所述校验数据发送给所述选择出的存储节点,所述用于存储校验数据的存储节点与组成这个分条的每个写数据请求集合所在的数据节点不相同。为了描述方便,将存储校验数据的存储节点称为校验节点。
主机11可以从剩余的存储节点(排除用于保存写数据请求集合的数据节点)中选择出一个或多个,用于保存校验数据。通常情况下,对于一个分条而言,写数据请求集合和校验数据不能保存在同一个存储节点中。这是为了防止一个存储节点发生故障时,写数据请求集合和校验数据同时丢失。然而,在实际应用中,主机11中通常可以构建多个分条。虽然,对于一个分条而言,选择出的用于保存校验数据的存储节点不能是保存这个分条的写数据请求集合的存储节点,但可以是保存有其他分条的写数据请求集合的存储节点。本实施例并没有限定某个存储节点专门用于存储写数据请求集合或者校验数据。
另外,主机11给每个分条分配分条标识,不同的分条具有不同的分条标识。并且,主机给每个分条包含的各个写数据请求集合以及校验数据分配 标识,将写数据请求集合的标识发送给所述写数据请求集合所在的数据节点。对于每个分条,主机11记录分条标识与所述分条包含的每个写数据请求集合的标识之间的对应关系以及所述分条标识与校验数据的标识之间的对应关系。
按照图4所示的数据校验方法,主机11按照写数据请求包含的数据节点的标识,将多个写数据请求划分为多个写数据请求集合。当设定数量个写数据请求集合中所有写数据请求的大小达到预设数据量时,主机11计算所述设定数量个写数据请求集合的校验数据,并将校验数据发送给校验节点存储,由此保证了数据的可靠性。并且,由于每个写数据请求集合包含待写入相同的数据节点的多个写数据请求,而每个写数据请求待写入的数据节点是所述主机根据所述写数据请求中的数据或者所述数据待写入的逻辑地址选择的。因此,每个写数据请求集合中保存的是相同类别的写数据请求。当主机11计算校验数据之后,将每个写数据请求集合发送给所述写数据请求集合包含的标识所指示的数据节点,相同类别的写数据请求就被存储在了同一个数据节点中。由于相同类别的写数据请求中的数据被同时读取的可能性较大,那么在这些数据被读取时可以在一个数据节点内读取,不用跨节点执行读取操作,提高了数据读取的效率。
应用于第一种场景的数据校验方法还有另外一种实现方式,与图4所示的实现方式不同之处在于,在这种实现方式中,是由校验计算节点按照写数据请求包含的数据节点的标识将接收的多个写数据请求划分为多个写数据请求集合,当设定数量个写数据请求集合达到预设数据量时,计算校验数据。而这些操作在图4所示的实现方式中是由主机11完成的。具体的,如图5所示,这种数据校验方法可以通过以下步骤实现。
步骤S401:该步骤与图4所示的步骤S101类似,这里不再赘述。
步骤S402:主机11将生成的写数据请求发送给校验计算节点。
步骤S403:所述校验计算节点根据写数据请求中包括的数据节点的标 识,将所述多个写数据请求划分为多个写数据请求集合,每个集合包括具有相同的数据节点的标识的多个写数据请求。与图4不同之处在于该步骤的执行主体是校验计算节点,其余部分与图4所示的步骤S102类似,这里不再赘述。
步骤S404:当设定数量个写数据请求集合中的每个写数据请求集合中的所有写数据请求的大小达到预设数据量时,所述校验计算节点计算所述设定数量个写数据请求集合的校验数据。与图4不同之处在于该步骤的执行主体是校验计算节点,其余部分与图4所示的步骤S103类似,这里不再赘述。
步骤S405:所述校验计算节点将所述设定数量个写数据请求集合中的每个写数据请求集合发送给写数据请求集合包含的标识所指示的数据节点。与图4不同之处在于该步骤的执行主体是校验计算节点,其余部分与图4所示的步骤S104类似,这里不再赘述。
所述校验计算节点在计算出校验数据之后,可以直接将所述校验数据保存在本地,不用转发给其他存储节点存储。如果按照预先设定的校验模式,计算得出的检验数据不止一个,那么所述校验计算节点可以从存储系统10选择一个存储节点用于存储另一个校验数据。选择方式与与图4所示的步骤S105类似,这里不再赘述。图6是应用于第二种场景的数据校验方法的流程图,如图6所示,所述方法包括如下步骤:
步骤S201:主机11生成多个写数据请求,每个写数据请求包括数据、所述数据待写入的数据节点的标识以及所述数据待写入的逻辑地址。步骤S201的内容与图4所示的方法中的步骤S101类似,这里不再赘述。
步骤S202:主机11将每个写数据请求发送给所述写数据请求包含的数据待写入的数据节点的标识所指示的数据节点。
步骤S203:数据节点接收主机11发送的多个写数据请求,将所述写入写数据请求写入存储器226中。具体的,该步骤可以由控制器221中的处理器225执行。
步骤S204:当数据节点的存储器226中保存的所有写数据请求的大小达到预设数据量时,数据节点将写数据请求集合发送给校验计算节点。所述写数据请求集合包括多个写数据请求,并且所述多个写数据请求的大小达到预设数据量。
具体的,该步骤可以由控制器221中的处理器225执行。每当数据节点接收主机11发送的写数据请求,就会将该写数据请求写入存储器226。因此存储器226会逐渐积累写数据请求,当积累的写数据请求的大小达到预设数据量时,该数据节点将这些积累的写数据请求作为一个集合发送给校验计算节点。所述预设数据量是所述存储系统10预先设定的,例如16KB。
对于任意一个数据节点而言,只要其存储器226中所有写数据请求的大小达到所述预设数据量,就会将存储器226中所有写数据请求发送给校验计算节点,不会考虑其他数据节点的存储器中累积的写数据请求是否达到所述预设数据量。在实际应用中,数据节点内部可能对积累写数据请求有时限的要求,那么如果当预设时限到达时,存储器226中所有写数据请求的大小尚未到达所述预设数据量,那么可以用0或者其他特殊标记的数据补足。
上面的描述是针对一个数据节点而言的,那么对于存储系统10中的多个数据节点来说,每个数据节点22都会先在本地凑满达到预设数据量的写数据请求集合,然后这些写数据请求集合发送给校验计算节点。校验计算节点根据接收的多个写数据请求集合计算校验数据,从而组成一个分条。
步骤S205:校验计算节点接收多个写数据请求集合,从所述多个写数据请求集合中选择设定数量个写数据请求集合,所述设定数量个写数据请求集合是所述多个写数据请求集合的子集,并且所述设定数量个写数据请求集合包括不同的数据节点的标识,计算所述设定数量个写数据请求集合的校验数据。
以存储节点C为校验计算节点为例,存储节点C可以与存储系统10中的任意一个存储节点22通信。存储节点C接收多个写数据请求集合之后, 按照预设设定的校验模式从所述多个写数据请求集合中选择设定数量个写数据请求集合,计算所述设定数量个写数据请求集合的校验数据。以5+1模式为例,当存储节点C从接收的多个写数据请求集合中选择5个来自不同数据节点的写数据请求集合,计算这5个写数据请求集合的校验数据。在某些情况下,存储节点C本地保存的写数据请求集合也可以参与校验数据的计算,则只需要接收其余4个数据节点发送的写数据请求集合即可。
当校验计算节点接收一个写数据请求集合时,保存所述写数据请求集合,为所述写数据请求集合分配标识,然后将所述写数据请求集合的标识写入链表或者其他数据结构中。所述写数据请求集合的标识可以与所述写数据请求集合中的写数据请求包括的数据节点的标识相同,也可以与所述写数据请求集合中的写数据请求包括的数据节点的标识不同。当所述写数据请求集合的标识与所述写数据请求集合中的写数据请求包括的数据节点的标识不同时,校验计算节点保存所述写数据请求集合的标识与数据节点的标识之间的对应关系。在一种实现方式中,所述写数据请求集合的标识是由校验计算节点分配,校验计算节点分配标识以后再将所述写数据请求集合的标识发送给所述写数据请求集合所在的数据节点,所述数据节点保存所述写数据请求集合的标识。在另一种实现方式中,所述写数据请求集合的标识是由所述写数据请求集合所在的数据节点分配的,数据节点分配标识以后再将所述写数据请求集合的标识发送给所述校验计算节点,所述校验计算节点保存所述写数据请求集合的标识。
为了保证数据的可靠性,校验计算节点在选择一定数目的写数据请求集合组成分条时,需保证组成同一个分条的多个写数据请求集合均来自不同的数据节点。以5+1模式为例,校验计算节点需要从所述链表中选择5个写数据请求集合。由于校验计算节点在一定时间内可能会接收多个写数据请求集合,其中某些写数据请求集合可能是来自同一个数据节点。因此,校验计算节点还需要保证这5个写数据请求集合来自不同的数据节点。具体的,校验 计算节点可以根据保存的写数据请求集合的标识与数据节点的标识之间的对应关系,或者直接根据写数据请求集合的标识从链表中选择5个写数据请求集合。需要说明的是,校验计算节点在选择组成分条的写数据请求集合时,并不需要考虑各个写数据请求集合进入所述链表的先后顺序,只需保证组成该分条的多个写数据请求集合来自不同的数据节点。
步骤S206:校验计算节点将校验数据发送给校验节点,所述校验节点不能和保存写数据请求集合的数据节点相同。
具体的,校验计算节点可以从剩余的存储节点(排除已经保存有写数据请求集合的数据节点)中选择出一个或多个,用于保存校验数据。用于保存校验数据的存储节点,在本实施例中称为校验节点。校验节点和所述设定数量个写数据请求集合所在的任意一个数据节点不能重复,这是为了防止一个存储节点发生故障时,一个分条所包含的写数据请求集合和校验数据同时丢失。然而,虽然选择出的用于保存校验数据的存储节点不能是保存是当前分条的写数据请求集合的数据节点,但可以是保存有其他分条的写数据请求集合的数据节点。本实施例并没有限定某个存储节点专门用于存储写数据请求集合或者校验数据。
另外,校验计算节点给每个分条分配分条标识,不同的分条具有不同的分条标识。并且,主机给每个分条包含的各个写数据请求集合以及校验数据分配标识,将写数据请求集合的标识发送给所述写数据请求集合所在的数据节点。或者,每个数据节点为保存的写数据集合分配标识,将所述写数据请求集合的标识发送给校验计算节点。对于每个分条,校验计算节点记录分条标识与所述分条包含的每个写数据请求集合的标识之间的对应关系以及所述分条标识与校验数据的标识之间的对应关系。
按照图6所示的数据校验方法,每个数据节点在累积的所有写数据请求的大小达到预设数据量时,将这些写数据请求作为一个写数据请求集合发送给校验计算节点。校验计算节点从接收的多个写数据请求集合中选择设定数 量个写数据请求集合计算所述设定数量个写数据请求集合的校验数据,并将校验数据发送给校验节点存储,由此保证了数据的可靠性。并且,由于每个写数据请求集合包含具有相同的数据节点的标识的多个写数据请求,而每个写数据请求待写入的数据节点是所述主机根据所述写数据请求中的数据或者所述数据待写入的逻辑地址选择的。因此,每个写数据请求集合中保存的是相同类别的写数据请求。由于相同类别的写数据请求中的数据被同时读取的可能性较大,那么在这些数据被读取时可以在一个数据节点内读取,不用跨节点执行读取操作,提高了数据读取的效率。
图4或图5或图6所示的数据校验方法实现了将相同类别的写数据请求写入同一个数据节点。在写数据请求写入数据节点之后,数据节点还需要创建并保存元数据。
元数据包括写数据请求集合的标识与每个写数据请求中的数据待写入的逻辑地址之间的对应关系。元数据还包括每个写数据请求在所述写数据请求集合中的位置。本实施例将每个写数据请求位于所述写数据请求集合中的位置称为内部偏移量。示例性的,写数据请求集合与其包含的各个写数据请求的数据待写入的逻辑地址之间的对应关系如表1所示:
逻辑地址 写数据请求集合的标识 内部偏移量
0x1+0x100+4096 1 0x1000
0x2+0x400+4096 1 0x2000
0x3+0x800+4096 1 0x3000
表1
如上表所示,标识为“1”的写数据请求集合由三个写数据请求组成,每个写数据请求除了包括数据之外还包括数据待写入的逻辑地址,另外,每个写数据请求在写数据请求集合内部都有自己对应的位置。以第一条写数据请求为例,所述写数据请求的逻辑地址包括卷ID、LBA和length,其中卷ID为0x1,LBA为0x100,length为4096。所述写数据请求属于标识为“1” 的写数据请求集合,并且在所述写数据请求集合中的位置为0x1000。
如果由主机将多个写数据请求划分为多个数据单元,计算多个数据单元的校验数据之后,把各个数据单元以及校验数据发送给系统中的多个存储节点存储。那么,元数据(写数据请求与数据单元之间的对应关系)是保存在主机中的。当后续在某个存储节点内对数据单元进行垃圾回收等操作时,需要给主机发送消息修改元数据,操作复杂并且耗费带宽。
然而,在本实施例中,由于相同类别的写数据请求作为一个写数据请求集合保存在一个数据节点,那么与这个写数据请求集合相关的元数据也保存在数据节点本地。后续如果对写数据请求集合进行垃圾回收等操作,可以直接在本地对元数据进行修改。
元数据可以保存在数据节点的存储器226中。为了保证元数据的可靠性,也需要计算并保存各个数据节点中的元数据的校验数据。具体的,随着数据节点接收的写数据请求增多,存储器226中保存的元数据也会增多。当存储器226中累积的元数据达到预设数据量时,数据节点将元数据集合发送给元数据校验计算节点。元数据集合是指大小达到预设数据量的元数据。所述预设数据量可以是所述存储系统10预先设定的,例如16KB。元数据校验计算节点是指存储节点中用于计算元数据集合的校验数据的存储节点。在实际实现时可以和校验计算节点相同。另外,在本实施例中,数据节点22除了将元数据集合发送给元数据校验计算节点,还需要将数据节点的标识发送给所述元数据校验计算节点。所述元数据集合和标识可以封装在一个消息中发送,也可以分别发送。
元数据校验计算节点接收每个数据节点发送的元数据集合和数据节点的标识,为每个元数据集合分配标识,并保存元数据集合的标识与数据节点的标识之间的对应关系。为了保证元数据的可靠性,元数据校验计算节点计算元数据集合的校验数据。具体的,元数据校验计算节点根据所述元数据集合的标识与数据节点的标识之间的对应关系,从多个元数据集合中选择设定 数量个元数据集合,计算选择出的设定数量个元数据集合的校验数据。所述选择出的元数据集合需对应不同的数据节点的标识。
与前面的描述类似,所谓设定数量是由存储系统10预先设定的校验模式决定的。需要说明的是,元数据集合的校验模式可以与写数据请求集合的校验模式相同,也可以不同。
元数据校验计算节点可以在存储系统10选择一个存储节点作为元数据校验节点。将所述元数据集合的校验数据发送给所述元数据校验节点。所述选择出的元数据校验节点不同于用于存储所述元数据集合的数据节点。然而,所述选择出的元数据校验节点可以是保存其他分条的元数据集合的数据节点。本实施例并没有限定某个数据节点专门用于存储元数据集合或者元数据集合的校验数据。
对于每个数据节点来说,当存储器226中保存的数据总量达到一定阈值时,则数据节点需要将存储器226中的数据写入固态硬盘222中。此时,数据节点中的控制器221可以以每个写数据请求集合为粒度为这个集合分配写入固态硬盘的逻辑地址(本实施例中称为硬盘逻辑地址),并保存所述写数据请求集合的标识与分配的硬盘逻辑地址之间的对应关系。固态硬盘222接收控制器221发送到写数据请求集合以及所述为所述集合分配的写入固态硬盘的逻辑地址,将所述写数据请求集合写入一个或多个块(英文:block)中。并且,固态硬盘222保存所述为所述集合分配的写入固态硬盘的逻辑地址与实际地址之间的对应关系。实际地址是指可以是固态硬盘中该数据的物理地址,也可以是在所述物理地址的基础上经过虚拟化,只对固态硬盘可见的地址。
对于每个数据节点保存的写数据请求集合,经过一段时间之后,写数据请求集合中的部分数据可能成为无效数据。数据的有效性是以所述数据是否被修改来确定的。如果所述数据是第一次写入,可以将所述数据记录为有效(称为有效数据)。如果所述数据被修改,则将所述修改前的数据记录为无 效(称为无效数据)。以表1为例,当标识为1的写数据请求集合所在的数据节点接收到第四条写数据请求,并且第四条写数据请求中的数据待写入的逻辑地址与第一条写数据请求中的数据待写入的逻辑地址相同,那么说明第四条写数据请求是用于覆盖所述第一条写数据请求的。由于固态硬盘的写入特性,第四条写数据请求中的数据(新数据)不会直接覆盖第一条写数据请求的数据(旧数据),而是在固态硬盘上分配一块空白的块,写入所述新数据。从而,第一条写数据请求中的数据就成了无效数据。也意味着,标识为1的写数据请求集合中的部分数据成为无效数据。数据节点可以用位图记录每个写数据请求集合包含的无效数据的信息,所述无效数据的信息包括无效数据待写入的逻辑地址以及无效数据的数据量。例如,位图的每个“位”对应大小为1KB的数据待写入的逻辑地址,当“位”为1时,代表所述逻辑地址中存储的数据有效,当“位”为0时,代表所述逻辑地址中存储的数据无效。
随着无效数据的增多,数据节点需要进行系统垃圾回收操作。系统垃圾回收以分条为单位执行,由于分条包含的写数据请求集合分布在数据节点中。因此,在数据节点内部以写数据请求集合为单位执行系统垃圾回收。存储系统10还包括垃圾回收节点,垃圾回收节点是存储节点22中用于执行系统垃圾回收操作的节点。图7是系统垃圾回收方法的流程示意图,如图7所示,可以包括如下步骤:
S301,垃圾回收节点根据每个分条包含的写数据请求集合的位图,从多个分条选择待回收的分条。
在第一种应用场景中,主机11将分条标识与所述分条标识与所述分条包含的每个写数据请求集合的标识之间的对应关系发送给垃圾回收节点。在第二种应用场景中,校验计算节点将分条标识与所述分条标识与所述分条包含的每个写数据请求集合的标识之间的对应关系发送给垃圾回收节点。另外,无论是第一种应用场景还是第二种应用场景,数据节点都需要将写数据 请求集合的位图发送给垃圾回收节点。
垃圾回收节点需要从多个分条中选择一个待回收的分条。为了使系统垃圾回收操作的回收效率最高,垃圾回收节点通常选择包含无效数据最多的分条作为待回收的分条,因此,需要统计每个分条包含的无效数据的数据量。具体的,垃圾回收节点可以根据分条标识与所述分条包含的每个写数据请求集合的标识之间的对应关系确定该分条包含的所有的写数据请求集合,再根据每个写数据请求集合的位图确定每个写数据请求集合所包含的无效数据的数据量,从而确认所述分条所包含的无效数据量。由此,垃圾回收节点可以将包含无效数据最多的分条作为待回收的分条。此外,本实施例也可以根据其他条件选择待回收的分条,例如,较长时间内未进行系统垃圾回收操作的分条,或者按照预设的顺序依次对各个分条进行系统垃圾回收。
S302,垃圾回收节点向每个写数据请求集合所在的数据节点发送垃圾回收通知消息。
如前面所述,垃圾回收节点可以根据分条标识与所述分条标识与所述分条包含的每个写数据请求集合的标识之间的对应关系确定每个分条所包含的写数据请求集合的标识。然后,根据写数据请求集合的标识与数据节点的标识之间的对应关系,确定每个写数据请求集合所在的数据节点(如果写数据请求集合的标识与数据节点的标识一致,则可以直接根据写数据请求集合的标识确定每个写数据请求集合所在的数据节点),从而向每个写数据请求集合所在的数据节点发送垃圾回收通知消息,所述消息用于通知数据节点对所述写数据请求集合进行系统垃圾回收操作。所述消息包括所述写数据请求集合的标识。
S303,每个数据节点接收所述垃圾回收通知消息,根据所述消息中携带的写数据请求集合的标识确定待回收的写数据请求集合。然后,每个数据节点再根据所述待回收的写数据请求集合的位图确定所述待回收的写数据请求集合包含的有效数据和无效数据。例如,表1的第一条写数据请求的数据 变成了无效数据。第二条写数据请求的数据、第三条写数据请求的数据仍然是有效数据。
S304,每个数据节点将所述待回收的写数据请求集合中的有效数据迁移到新的写数据请求集合中,并回收所述写数据请求集合。
将有效数据迁移到新的写数据请求集合是指将所述有效数据所在的写数据请求拼凑在新的写数据请求集合中。例如,将表1中的第二条写数据请求以及第三条写数据请求从固态硬盘222中读入存储器226(如果第二条写数据请求或者第三条写数据请求尚且保存在存储器226中,则不需要执行此操作),待存储器226中存储的所有写数据请求的大小重新达到预设数据量时,就凑满一个新的写数据请求集合。具体的,数据节点再根据所述待回收的写数据请求集合的位图确定所述写数据请求集合中包含的有效数据待写入的逻辑地址(这里的逻辑地址是指数据待写入的逻辑地址),删除所述有效数据待写入的逻辑地址与所述写数据请求集合的标识之间的对应关系,并且保存所述有效数据待写入的逻辑地址与新的写数据请求集合的标识之间的对应关系。由于这些元数据保存在数据节点本地,因此数据节点在执行系统垃圾回收之后可以直接在本地修改元数据,无需跨节点操作。然后,数据节点将所述待回收的写数据请求集合的硬盘逻辑地址发送给固态硬盘,由固态硬盘将与所述硬盘逻辑地址对应的块标记为无效。当后续固态硬盘进行内部的垃圾回收时可以直接擦除这些块,无需再次进行有效数据的复制,减少了固态硬盘内部写放大的次数。另外,数据节点还需要删除所述待回收的写数据请求集合的标识与分配的硬盘逻辑地址之间的对应关系。当一个分条所包含的所有写数据请求集合都被回收了,那么该分条所包含的校验数据也没有存在的意义了,垃圾回收节点可以通知校验节点删除所述校验数据。
本实施例提供的系统垃圾回收是由每个数据节点将有效数据拼凑在本地的其他写数据请求集合中,因此所述数据节点独立完成系统垃圾回收,无需和其他数据节点进行数据交互,节省了数据节点之间的带宽。
本领域普通技术人员将会理解,本发明的各个方面、或各个方面的可能实现方式可以被具体实施为系统、方法或者计算机程序产品。因此,本发明的各方面、或各个方面的可能实现方式可以采用完全硬件实施例、完全软件实施例(包括固件、驻留软件等等),或者组合软件和硬件方面的实施例的形式,在这里都统称为“电路”、“模块”或者“系统”。此外,本发明的各方面、或各个方面的可能实现方式可以采用计算机程序产品的形式,计算机程序产品是指存储在计算机可读介质中的计算机可读程序代码。
计算机可读介质包含但不限于电子、磁性、光学、电磁、红外或半导体系统、设备或者装置,或者前述的任意适当组合,如随机访问存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM)、光盘。
计算机中的处理器读取存储在计算机可读介质中的计算机可读程序代码,使得处理器能够执行在流程图中每个步骤、或各步骤的组合中规定的功能动作。
计算机可读程序代码可以完全在用户的计算机上执行、部分在用户的计算机上执行、作为单独的软件包、部分在用户的计算机上并且部分在远程计算机上,或者完全在远程计算机或者服务器上执行。也应该注意,在某些替代实施方案中,在流程图中各步骤、或框图中各块所注明的功能可能不按图中注明的顺序发生。例如,依赖于所涉及的功能,接连示出的两个步骤、或两个块实际上可能被大致同时执行,或者这些块有时候可能被以相反顺序执行。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。本领域普通技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,本领域普通技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应所述以权利要求的保护范围为准。

Claims (27)

  1. 一种存储系统,其特征在于,包括主机、校验节点和多个数据节点,每个数据节点具有唯一的标识,
    所述主机,用于根据写数据请求中包括的数据节点的标识,将多个写数据请求划分为多个写数据请求集合,每个写数据请求集合包括具有相同的数据节点的标识的多个写数据请求,每个写数据请求包括数据、所述数据待写入的逻辑地址以及所述数据待写入的数据节点的标识,所述数据待写入的数据节点是所述主机根据所述数据或所述逻辑地址选择的;
    当设定数量个写数据请求集合中的每个写数据请求集合中的所有写数据请求的大小达到预设数据量时,计算所述设定数量个写数据请求集合的校验数据,所述设定数量个写数据请求集合是所述多个写数据请求集合的子集;
    将所述设定数量个写数据请求集合中的每个写数据请求集合发送给所述写数据请求集合包括的数据节点的标识所指示的数据节点;
    将所述校验数据发送给所述校验节点。
  2. 根据权利要求1所述的存储系统,其特征在于,
    所述主机,还用于为所述设定数量个写数据请求集合中的每个写数据请求集合分配标识,并且将每个写数据请求集合的标识发送给该写数据请求集合包含的数据节点的标识所指示的数据节点;
    所述数据节点,用于接收写数据请求集合以及所述写数据请求集合的标识;创建元数据,所述元数据包括所述写数据请求集合的标识与每个写数据请求中的数据待写入的逻辑地址之间的对应关系,以及每个写数据请求中的数据待写入的逻辑地址与内部偏移量之间的对应关系。
  3. 根据权利要求2所述的存储系统,其特征在于,所述存储系统还包 括元数据校验计算节点以及元数据校验节点;
    所述数据节点,还用于在确定累积的元数据达到所述预设数据量时,将元数据集合以及所述数据节点的标识发送给所述元数据校验计算节点,所述元数据集合包括所述累积的达到所述预设数据量的元数据;
    所述元数据校验计算节点,用于接收多个数据节点中每个数据节点发送的元数据集合以及所述数据节点的标识,保存每个元数据集合与数据节点的标识之间的对应关系,根据所述对应关系从接收的多个元数据集合中选择设定数量个元数据集合,所述选择出的设定数据量个元数据集合对应不同的数据节点的标识;计算所述选择出的设定数量个元数据集合的校验数据;将所述设定数量个元数据集合的校验数据发送给所述元数据校验节点,所述元数据校验节点不同于存储所述设定数量个元数据集合中的每个元数据集合所在的数据节点。
  4. 根据权利要求2所述的存储系统,其特征在于,所述存储系统还包括垃圾回收节点;
    所述主机,还用于为分条分配标识,所述分条包括所述设定数量个写数据请求集合以及所述设定数量个写数据请求集合的校验数据;将所述分条标识与所述分条包括的每个写数据请求集合的标识之间的对应关系,发送给垃圾回收节点;
    所述数据节点,还用于将保存的写数据请求集合的位图发送给所述垃圾回收节点,所述位图用于指示所述写数据请求集合的无效数据的数据量;
    所述垃圾回收节点,用于根据所述分条标识与所述分条包括的每个写数据请求集合的标识之间的对应关系,以及每个写数据请求集合的位图确定所述分条包含的无效数据的数据量大于其他任意一个分条包含的无效数据的数据量,并且在确定所述分条包含的无效数据的数据量大于其他任意一个分条包含的无效数据的数据量时,向所述分条包含的每个写数据请求集合所在 的数据节点分别发送垃圾回收通知消息,每个垃圾回收通知消息包括该写数据请求集合的标识。
  5. 根据权利要求4所述的存储系统,其特征在于,
    所述数据节点,还用于在接收所述垃圾回收通知消息之后,根据所述写数据请求集合的标识以及保存的所述写数据请求集合的位图,对所述写数据请求集合执行系统垃圾回收操作。
  6. 根据权利要求1-5任一所述的存储系统,其特征在于,所述设定数量是根据预先设定的校验模式确定的。
  7. 一种数据校验方法,其特征在于,所述方法用于存储系统中,所述存储系统包括主机、校验节点和多个数据节点,每个数据节点具有唯一的标识,
    所述主机根据写数据请求中包括对数据节点的标识,将多个写数据请求划分为多个写数据请求集合,每个写数据请求集合包括具有相同的数据节点的标识的多个写数据请求,每个写数据请求包括数据、所述数据待写入的逻辑地址以及所述数据待写入的数据节点的标识,所述数据待写入的数据节点是所述主机根据所述数据或所述逻辑地址选择的;
    当设定数量个写数据请求集合中的每个写数据请求集合中的所有写数据请求的大小达到预设数据量时,所述主机计算所述设定数量个写数据请求集合的校验数据,所述设定数量个写数据请求集合是所述多个写数据请求集合的子集;
    所述主机将所述设定数量个写数据请求集合中的每个写数据请求集合发送给所述写数据请求集合包括的数据节点的标识所指示的数据节点;
    所述主机将所述校验数据发送给所述校验节点。
  8. 根据权利要求7所述的方法,其特征在于,还包括:
    所述主机为所述设定数量个写数据请求集合中的每个写数据请求集合分配标识,将每个写数据请求集合的标识发送给该写数据请求集合包含的数据节点的标识所述指示的数据节点;
    所述数据节点接收写数据请求集合以及所述写数据请求集合的标识;创建元数据,所述元数据包括所述写数据请求集合的标识与每个写数据请求中的数据待写入的逻辑地址之间的对应关系,以及每个写数据请求中的数据待写入的逻辑地址与内部偏移量之间的对应关系。
  9. 根据权利要求8所述的方法,其特征在于,所述存储系统还包括元数据校验计算节点以及元数据校验节点,所述方法还包括:
    所述数据节点在确定累积的元数据达到所述预设数据量时,将元数据集合以及所述数据节点的标识发送给所述元数据校验计算节点,所述元数据集合包括所述累积的达到所述预设数据量的元数据;
    所述元数据校验计算节点接收多个数据节点中每个数据节点发送的元数据集合以及所述数据节点的标识;
    所述元数据校验计算节点保存每个元数据集合与数据节点的标识之间的对应关系;
    所述元数据校验计算节点根据所述对应关系从接收的多个元数据集合中选择设定数量个元数据集合,所述选择出的设定数据量个元数据集合对应不同的数据节点的标识;
    所述元数据校验计算节点计算所述选择出的设定数量个元数据集合的校验数据,将所述设定数量个元数据集合的校验数据发送给所述元数据校验节点,所述元数据校验节点不同于存储所述设定数量个元数据集合中的每个元数据集合所在的数据节点。
  10. 根据权利要求8所述的方法,其特征在于,所述存储系统还包括垃圾回收节点,所述方法还包括:
    所述主机为分条分配标识,所述分条包括所述设定数量个写数据请求集合以及所述设定数量个写数据请求集合的校验数据;
    所述主机将所述分条标识与所述分条包括的每个写数据请求集合的标识之间的对应关系,发送给垃圾回收节点;
    所述数据节点将保存的写数据请求集合的位图发送给所述垃圾回收节点,所述位图用于指示所述写数据请求集合的无效数据的数据量;
    所述垃圾回收节点根据所述分条标识与所述分条包括的每个写数据请求集合的标识之间的对应关系,以及每个写数据请求集合的位图确定所述分条包含的无效数据的数据量大于其他任意一个分条包含的无效数据的数据量;
    所述垃圾回收节点在确定所述分条包含的无效数据的数据量大于其他任意一个分条包含的无效数据的数据量时,向所述分条包含的每个写数据请求集合所在的数据节点分别发送垃圾回收通知消息,每个垃圾回收通知消息包括该写数据请求集合的标识。
  11. 根据权利要求10所述的方法,其特征在于,还包括:
    所述数据节点在接收所述垃圾回收通知消息之后,根据所述写数据请求集合的标识以及保存的所述写数据请求集合的位图,对所述写数据请求集合执行系统垃圾回收操作。
  12. 根据权利要求7-11任一所述的方法,其特征在于,所述设定数量是根据预先设定的校验模式确定的。
  13. 一种主机,其特征在于,包括通信接口和处理器;
    所述通信接口,用于与校验节点、以及多个数据节点通信,其中,每个 数据节点具有唯一的标识;
    所述处理器,用于根据写数据请求中包括的数据节点的标识,将多个写数据请求划分为多个写数据请求集合,每个写数据请求集合包括具有相同的数据节点的标识的多个写数据请求,每个写数据请求包括数据、所述数据待写入的逻辑地址以及所述数据待写入的数据节点的标识,所述数据待写入的数据节点是所述主机根据所述数据或所述逻辑地址选择的;
    当设定数量个写数据请求集合中的每个写数据请求集合中的所有写数据请求的大小达到预设数据量时,计算所述设定数量个写数据请求集合的校验数据,所述设定数量个写数据请求集合是所述多个写数据请求集合的子集;
    将所述设定数量个写数据请求集合中的每个写数据请求集合发送给所述写数据请求集合包括的数据节点的标识所指示的数据节点;
    将所述校验数据发送给所述校验节点。
  14. 一种存储系统,其特征在于,所述存储系统包括校验节点、校验计算节点和多个数据节点,每个数据节点具有唯一的标识,
    每个数据节点,用于将写数据请求集合发送给所述校验计算节点,所述写数据请求集合包括一个或多个写数据请求,每个写数据请求包括数据以及所述数据待写入的数据节点的标识,所述写数据请求集合的大小等于预设数据量;
    所述校验计算节点,用于接收多个写数据请求集合;从所述多个写数据请求集合中选择设定数量个写数据请求集合,所述设定数量个写数据请求集合是所述多个写数据请求集合的子集,并且所述设定数量个写数据请求集合包括不同的数据节点的标识;计算所述设定数量个写数据请求集合的校验数据;将所述校验数据发送给所述校验节点,所述校验节点不同于所述设定数量个写数据请求集合中的每个写数据请求集合所在的数据节点。
  15. 根据权利要求14所述的存储系统,其特征在于,每个写数据请求还包括所述数据待写入的逻辑地址,所述数据待写入的数据节点是所述存储系统中的主机根据所述数据或者所述数据待写入的逻辑地址选择的;
    每个数据节点,还用于接收所述主机发送的多个写数据请求。
  16. 根据权利要求15所述的存储系统,其特征在于,
    所述数据节点,还用于为保存的写数据请求集合分配标识,将所述写数据请求的标识发送给所述校验计算节点;创建元数据,所述元数据包括所述写数据请求集合的标识与每个写数据请求中的数据待写入的逻辑地址之间的对应关系,以及每个写数据请求中的数据待写入的逻辑地址与内部偏移量之间的对应关系。
  17. 根据权利要求16所述的存储系统,其特征在于,所述存储系统还包括元数据校验计算节点以及元数据校验节点,
    所述数据节点,还用于在确定累积的元数据达到所述预设数据量时,将元数据集合以及数据节点的标识发送给所述元数据校验计算节点,所述元数据集合包括所述累积的达到所述预设数据量的元数据;
    所述元数据校验计算节点,用于接收每个数据节点发送的元数据集合以及所述数据节点的标识,保存每个元数据集合与数据节点的标识之间的对应关系,根据所述对应关系从接收的多个元数据集合中选择所述设定数量个元数据集合,所述设定数据量个元数据集合对应不同的数据节点的标识;计算所述设定数量个元数据集合的校验数据;将所述设定数量个元数据集合的校验数据发送给所述元数据校验节点,所述元数据校验节点不同于存储所述设定数量个元数据集合中的每个元数据集合的数据节点。
  18. 根据权利要求16所述的存储系统,其特征在于,所述存储系统还包括垃圾回收节点;所述校验计算节点,还用于为分条分配标识,所述分条 包括所述设定数量个写数据请求集合以及所述设定数量个写数据请求集合的校验数据;将所述分条标识与所述分条包括的每个写数据请求集合的标识之间的对应关系,发送给所述垃圾回收节点;
    每个数据节点,还用于将保存的写数据请求集合的位图发送给所述垃圾回收节点,所述位图用于指示所述写数据请求集合的无效数据的数据量;
    所述垃圾回收节点,用于根据所述分条标识与所述分条包括的每个写数据请求集合的标识之间的对应关系,以及每个写数据请求集合的位图确定所述分条包含的无效数据的数据量大于其他任意一个分条包含的无效数据的数据量;在确定所述分条包含的无效数据的数据量大于其他任意一个分条包含的无效数据的数据量时,向所述分条包含的每个写数据请求集合所在的数据节点分别发送垃圾回收通知消息,每个垃圾回收通知消息包括写数据请求集合的标识。
  19. 根据权利要求18所述的存储系统,其特征在于,
    所述数据节点,还用于在接收所述垃圾回收通知消息之后,根据写数据请求集合的标识以及保存的所述写数据请求集合的位图,对所述写数据请求集合执行系统垃圾回收操作。
  20. 根据权利要求14-19任一所述的存储系统,其特征在于,所述设定数量是根据预先设定的校验模式确定的。
  21. 一种数据校验方法,其特征在于,所述方法应用于存储系统中,所述存储系统包括校验节点、校验计算节点和多个数据节点,每个数据节点具有唯一的标识,
    每个数据节点将写数据请求集合发送给所述校验计算节点,所述写数据请求集合包括多个写数据请求,每个写数据请求包括数据以及所述数据待写入的数据节点的标识,所述写数据请求集合的大小等于预设数据量;
    所述校验计算节点接收多个写数据请求集合;从所述多个写数据请求集合中选择设定数量个写数据请求集合,所述设定数量个写数据请求集合是所述多个写数据请求集合的子集,并且所述设定数量个写数据请求集合来自不同的数据节点;计算所述设定数量个写数据请求集合的校验数据;将所述校验数据发送给所述校验节点,所述校验节点不同于所述设定数量个写数据请求集合中的每个写数据请求集合所在的数据节点。
  22. 根据权利要求21所述的存储系统,其特征在于,每个写数据请求还包括所述数据待写入的逻辑地址,所述数据待写入的数据节点所述存储系统中的主机根据所述数据或者所述数据待写入的逻辑地址选择的;所述方法还包括:
    每个数据节点接收所述主机发送的多个写数据请求。
  23. 根据权利要求22所述的方法,其特征在于,还包括:
    所述数据节点为保存的写数据请求集合分配标识,将所述写数据请求集合的标识发送给所述校验计算节点;创建元数据,所述元数据包括所述写数据请求集合的标识与每个写数据请求中的数据待写入的逻辑地址之间的对应关系,以及每个写数据请求中的数据待写入的逻辑地址与内部偏移量之间的对应关系。
  24. 根据权利要求23所述的方法,其特征在于,所述存储系统还包括元数据校验计算节点以及元数据校验节点,所述方法还包括:
    所述数据节点在确定累积的元数据达到所述预设数据量时,将元数据集合以及数据节点的标识发送给所述元数据校验计算节点,所述元数据集合包括所述累积的达到所述预设数据量的元数据;
    所述元数据校验计算节点接收每个数据节点发送的元数据集合以及所述数据节点的标识,保存每个元数据集合与数据节点的标识之间的对应关 系,根据所述对应关系从接收的多个元数据集合中选择所述设定数量个元数据集合,所述设定数量个元数据集合对应不同的数据节点的标识;计算所述设定数量个元数据集合的校验数据;将所述设定数量个元数据集合的校验数据发送给所述元数据校验节点,所述元数据校验节点不同于存储所述设定数量个元数据集合中每个元数据集合的数据节点。
  25. 根据权利要求23所述的方法,其特征在于,所述存储系统还包括垃圾回收节点,所述方法还包括:
    所述校验计算节点为分条分配标识,所述分条包括所述设定数量个写数据请求集合以及所述设定数量个写数据请求集合的校验数据;将所述分条标识与所述分条包括的每个写数据请求集合的标识之间的对应关系发送给垃圾回收节点;
    每个数据节点将保存的写数据请求集合的位图发送给所述垃圾回收节点,所述位图用于指示所述写数据请求集合的无效数据的数据量;
    所述垃圾回收节点根据所述分条标识与所述分条包括的每个写数据请求集合的标识之间的对应关系,以及每个写数据请求集合的位图确定所述分条包含的无效数据的数据量大于其他任意一个分条包含的无效数据的数据量;在确定所述分条包含的无效数据的数据量大于其他任意一个分条包含的无效数据的数据量时,向所述分条包含的每个写数据请求集合所在的数据节点分别发送垃圾回收通知消息,每个垃圾回收通知消息包括写数据请求集合的标识。
  26. 根据权利要求25所述的方法,其特征在于,还包括:
    所述数据节点在接收所述垃圾回收通知消息之后,根据写数据请求集合的标识以及保存的所述写数据请求集合的位图,对所述写数据请求集合执行系统垃圾回收操作。
  27. 根据权利要求21-26任一所述的方法,其特征在于,所述设定数量是根据预先设定的校验模式确定的。
PCT/CN2016/107355 2016-11-25 2016-11-25 一种数据校验的方法及存储系统 WO2018094704A1 (zh)

Priority Applications (9)

Application Number Priority Date Filing Date Title
SG11201707304RA SG11201707304RA (en) 2016-11-25 2016-11-25 Data check method and storage system
EP16897464.0A EP3352071B1 (en) 2016-11-25 2016-11-25 Data check method and storage system
JP2017552874A JP6526235B2 (ja) 2016-11-25 2016-11-25 Data check method and storage system
CN201680003337.5A CN109074227B (zh) 2016-11-25 2016-11-25 Data check method and storage system
PCT/CN2016/107355 WO2018094704A1 (zh) 2016-11-25 2016-11-25 Data check method and storage system
CA2978927A CA2978927C (en) 2016-11-25 2016-11-25 Data check method and storage system
AU2016397189A AU2016397189B2 (en) 2016-11-25 2016-11-25 Data check method and storage system
BR112017020736A BR112017020736B8 (pt) 2016-11-25 2016-11-25 Data check method and storage system
US16/110,504 US10303374B2 (en) 2016-11-25 2018-08-23 Data check method and storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/107355 WO2018094704A1 (zh) 2016-11-25 2016-11-25 Data check method and storage system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/110,504 Continuation US10303374B2 (en) 2016-11-25 2018-08-23 Data check method and storage system

Publications (1)

Publication Number Publication Date
WO2018094704A1 true WO2018094704A1 (zh) 2018-05-31

Family

ID=62188856

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/107355 WO2018094704A1 (zh) 2016-11-25 2016-11-25 Data check method and storage system

Country Status (9)

Country Link
US (1) US10303374B2 (zh)
EP (1) EP3352071B1 (zh)
JP (1) JP6526235B2 (zh)
CN (1) CN109074227B (zh)
AU (1) AU2016397189B2 (zh)
BR (1) BR112017020736B8 (zh)
CA (1) CA2978927C (zh)
SG (1) SG11201707304RA (zh)
WO (1) WO2018094704A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111949434B (zh) * 2019-05-17 2022-06-14 Huawei Technologies Co., Ltd. Redundant array of independent disks (RAID) management method, RAID controller, and system
JP7490394B2 (ja) 2020-03-06 2024-05-27 Hitachi, Ltd. Information sharing support method and information sharing support system
US12066980B2 (en) * 2020-05-12 2024-08-20 Hewlett Packard Enterprise Development Lp File system metadata
CN112365124A (zh) * 2020-10-16 2021-02-12 Chongqing Huihui Information Technology Co., Ltd. Method for rapidly allocating construction materials on a smart construction site
CN114697349A (zh) * 2020-12-28 2022-07-01 Huawei Technologies Co., Ltd. Data processing method using an intermediate device, computer system, and intermediate device
CN113419684B (zh) * 2021-07-09 2023-02-24 Shenzhen Dapu Microelectronics Co., Ltd. Data processing method, apparatus, and device, and readable storage medium
CN116166179A (zh) * 2021-11-25 2023-05-26 Huawei Technologies Co., Ltd. Data storage system, intelligent network interface card, and computing node
CN115933995B (zh) * 2023-01-09 2023-05-09 Suzhou Inspur Intelligent Technology Co., Ltd. Data writing method and apparatus in a solid state drive, electronic device, and readable medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3713788B2 (ja) 1996-02-28 2005-11-09 Hitachi, Ltd. Storage device and storage device system
US6742081B2 (en) * 2001-04-30 2004-05-25 Sun Microsystems, Inc. Data storage array employing block checksums and dynamic striping
JP4383321B2 (ja) 2004-11-09 2009-12-16 Fujitsu Limited Storage control device and external storage device
US8332608B2 (en) * 2008-09-19 2012-12-11 Mediatek Inc. Method of enhancing command executing performance of disc drive
JP5317827B2 (ja) 2009-05-19 2013-10-16 Nippon Telegraph and Telephone Corporation Distributed data management device, method, and program
JP6005533B2 (ja) 2013-01-17 2016-10-12 Toshiba Corporation Storage device and storage method
JP2014203233A (ja) 2013-04-04 2014-10-27 Hitachi, Ltd. Storage system and method for updating data in a storage system
CN103647797A (zh) 2013-11-15 2014-03-19 Beijing University of Posts and Telecommunications Distributed file system and data access method thereof
CN103699494B (zh) 2013-12-06 2017-03-15 Beijing Qihoo Technology Co., Ltd. Data storage method, data storage device, and distributed storage system
EP2933733A4 (en) 2013-12-31 2016-05-11 Huawei Tech Co Ltd DATA PROCESSING METHOD AND DEVICE IN A DISTRIBUTED FILE STORAGE SYSTEM
JP2016184372A (ja) 2015-03-27 2016-10-20 Fujitsu Limited Storage system, information processing apparatus, parity generation program, and parity generation method
CN107844268B (zh) 2015-06-04 2021-09-14 Huawei Technologies Co., Ltd. Data distribution method, data storage method, and related apparatus and system
CN105930103B (zh) 2016-05-10 2019-04-16 Nanjing University Erasure code overwrite method for Ceph distributed storage

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102546755A (zh) * 2011-12-12 2012-07-04 Huazhong University of Science and Technology Data storage method for a cloud storage system
CN102968498A (zh) * 2012-12-05 2013-03-13 Huawei Technologies Co., Ltd. Data processing method and apparatus
CN104216664A (zh) * 2013-06-26 2014-12-17 Huawei Technologies Co., Ltd. Network volume creation method, data storage method, storage device, and storage system
CN103761058A (zh) * 2014-01-23 2014-04-30 Tianjin Zhongke Bluewhale Information Technology Co., Ltd. Network storage system and method with hybrid RAID1 and RAID4 structure
CN105404469A (zh) * 2015-10-22 2016-03-16 Zhejiang Uniview Technologies Co., Ltd. Video data storage method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3352071A4 *

Also Published As

Publication number Publication date
US20180364920A1 (en) 2018-12-20
BR112017020736B1 (pt) 2021-01-19
BR112017020736A2 (pt) 2018-07-17
AU2016397189B2 (en) 2019-07-25
EP3352071A1 (en) 2018-07-25
AU2016397189A1 (en) 2018-06-14
JP2019504369A (ja) 2019-02-14
CN109074227B (zh) 2020-06-16
EP3352071A4 (en) 2018-07-25
JP6526235B2 (ja) 2019-06-05
US10303374B2 (en) 2019-05-28
CA2978927A1 (en) 2018-05-25
CN109074227A (zh) 2018-12-21
CA2978927C (en) 2019-09-17
SG11201707304RA (en) 2018-06-28
EP3352071B1 (en) 2019-08-28
BR112017020736B8 (pt) 2021-08-17

Similar Documents

Publication Publication Date Title
WO2018094704A1 (zh) Data check method and storage system
US11243706B2 (en) Fragment management method and fragment management apparatus
JP2020035300A (ja) Information processing device and control method
US20200183831A1 (en) Storage system and system garbage collection method
US11640244B2 (en) Intelligent block deallocation verification
CN112527186A (zh) Storage system, storage node, and data storage method
CN112513804B (zh) Data processing method and apparatus
US10282116B2 (en) Method and system for hardware accelerated cache flush
US10310758B2 (en) Storage system and storage control method
US20200097396A1 (en) Storage system having non-volatile memory device
CN117631968A (zh) Storage server and operation method of storage server
WO2018075676A1 (en) Efficient flash management for multiple controllers

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 2978927; Country of ref document: CA)
WWE Wipo information: entry into national phase (Ref document number: 11201707304R; Country of ref document: SG)
ENP Entry into the national phase (Ref document number: 2017552874; Country of ref document: JP; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2016397189; Country of ref document: AU; Date of ref document: 20161125; Kind code of ref document: A)
REG Reference to national code (Ref country code: BR; Ref legal event code: B01A; Ref document number: 112017020736; Country of ref document: BR)
ENP Entry into the national phase (Ref document number: 112017020736; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20170927)
NENP Non-entry into the national phase (Ref country code: DE)