WO2018165943A1 - Storage controller, data processing chip, and data processing method - Google Patents

Storage controller, data processing chip, and data processing method

Info

Publication number
WO2018165943A1
Authority
WO
WIPO (PCT)
Prior art keywords
chunk
check
data
matrix
column
Prior art date
Application number
PCT/CN2017/076954
Other languages
English (en)
French (fr)
Inventor
曾雁星
沈建强
吕温
谈晓东
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201780088333.6A (CN110431531B)
Priority to PCT/CN2017/076954 (WO2018165943A1)
Publication of WO2018165943A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's

Definitions

  • the present application relates to the field of storage technologies, and in particular, to a storage controller, a data processing chip, and a data processing method.
  • the storage system in a large-scale storage scenario includes a storage controller and a plurality of storage media.
  • the storage medium may be a hard disk drive (HDD), a solid state drive (SSD), or a combination of the two.
  • the client sends the data to be written to the storage controller through the communication network, and the storage controller processes the data to be written and stores it in the storage media.
  • the existing storage system generally adopts redundant arrays of independent disks (RAID) technology, and the core of RAID technology is an erasure code.
  • the parameters used in existing erasure codes are subject to many restrictions. For example, the number R of data blocks into which each data chunk is divided in the encoding process needs to be equal to a prime number minus one. These restrictions make the selection of erasure code parameters inflexible, which in turn reduces the efficiency of a storage system that uses the erasure code.
  • the present application provides a storage controller that uses an erasure code with fewer restrictions; for example, R does not need to be equal to a prime number minus one.
  • a first aspect of the present application provides a memory controller including a processor, a memory, and a communication interface.
  • the processor continuously receives the data to be written by the client through the communication interface and caches the data in the memory. After a preset amount of data to be written has been buffered in the memory, the processor divides the preset amount of data to be written into K data chunks to be encoded, and each data chunk includes R data blocks.
  • the processor encodes the K data chunks according to the code and the check matrix stored in the memory to obtain the first check chunk and the second check chunk, and each check chunk includes R data blocks.
  • the erasure code used by the storage controller has fewer constraints in use, and can better match the settings of the storage array, such as the size of the chunk and the value of K. Moreover, the recovery overhead of the erasure code is low, which improves the working efficiency of the storage controller.
  • the processor is further configured to: store, through the communication interface, the K data chunks, the first check chunk, and the second check chunk respectively in K+2 storage media of the storage system where the storage controller is located.
  • the different chunks in a chunk group are respectively stored in different storage media, so that when a subsequent storage medium is damaged, the chunks stored on the storage medium can be restored, thereby improving the data security of the storage system.
  • the processor is further configured to: when a storage medium among the K+2 storage media is damaged, recover the damaged storage medium according to the check matrix, the data chunks stored on the undamaged storage media among the K+2 storage media, and the first check chunk and the second check chunk.
  • if a storage medium is damaged, the chunk stored on the storage medium is also damaged.
  • the damaged storage medium is restored, that is, the chunk stored on the damaged storage medium is restored.
  • it is necessary to determine which undamaged data blocks are used for the recovery of each data block of the damaged chunk according to the check matrix used by the storage controller. In the case of any chunk corruption, it is not necessary to use all the undamaged chunks in the chunk group where the damaged chunk is located.
  • a second aspect of the present application provides a data processing chip, including a circuit and a read/write interface.
  • the circuit is configured to obtain K data chunks to be encoded through the read/write interface, and each data chunk includes R data blocks.
  • the data processing chip is used in a storage controller; the circuit is further configured to: store the K data chunks, the first check chunk, and the second check chunk in the memory of the storage controller through the read/write interface, so that the storage controller stores the K data chunks, the first check chunk, and the second check chunk respectively in K+2 storage media of the storage system where the storage controller is located.
  • the circuit is further configured to: when a storage medium among the K+2 storage media is damaged, recover the damaged storage medium according to the check matrix, the data chunks stored on the undamaged storage media among the K+2 storage media, and the first check chunk and the second check chunk.
  • the method further includes: storing the K data chunks, the first check chunk, and the second check chunk in K+2 storage media of the storage system where the storage controller is located.
  • the method further includes: when a storage medium among the K+2 storage media is damaged, recovering the damaged storage medium according to the check matrix, the data chunks stored on the undamaged storage media among the K+2 storage media, and at least one of the first check chunk and the second check chunk.
  • in the check matrix, the ((k-1)*R+1)-th column to the (k*R)-th column form the chunk column set of the k-th data chunk in the K data chunks, K ≥ k ≥ 1; the (K*R+1)-th column to the ((K+1)*R)-th column form the chunk column set of the first check chunk; and the ((K+1)*R+1)-th column to the ((K+2)*R)-th column form the chunk column set of the second check chunk.
  • the check matrix is the standard check matrix H, or is obtained by performing N swap operations on the standard check matrix, N ≥ 1.
  • the swap operation refers to swapping any two chunk column sets.
  • in the standard check matrix, the chunk column set of the k-th data chunk in the K data chunks is composed of a positive diagonal matrix and M_k
  • the chunk column set of the first check chunk is composed of a positive diagonal matrix and M_(K+1)
  • the chunk column set of the second check chunk is composed of a 0 matrix and a positive diagonal matrix
  • the M_k and the M_(K+1) are binary matrices corresponding to different elements in the Galois field GF(2^R).
  • the ((k-1)*R+1)-th column to the (k*R)-th column respectively correspond to the R data blocks of the k-th data chunk in the K data chunks
  • the (K*R+1)-th column to the ((K+1)*R)-th column in the check matrix respectively correspond to the R data blocks of the first check chunk
  • the ((K+1)*R+1)-th column to the ((K+2)*R)-th column respectively correspond to the R data blocks of the second check chunk.
  • a fourth aspect of the present application provides a memory controller including a processor, a memory, and a communication interface.
  • the processor continuously receives the data to be written by the client through the communication interface and caches the data in the memory. After a preset amount of data to be written has been buffered in the memory, the processor divides the preset amount of data to be written into K data chunks to be encoded, and each data chunk includes R data blocks.
  • the processor encodes the K data chunks according to the code and the check matrix stored in the memory to obtain the first check chunk and the second check chunk, and each check chunk includes R data blocks.
  • the erasure code used by the storage controller has fewer constraints in use, and can better match the settings of the storage array, such as the size of the chunk and the value of K. Moreover, the recovery overhead of the erasure code is low, which improves the working efficiency of the storage controller.
  • the processor is further configured to: store, through the communication interface, the K data chunks, the first check chunk, and the second check chunk respectively in K+2 storage media of the storage system where the storage controller is located.
  • the different chunks in a chunk group are respectively stored in different storage media, so that when a subsequent storage medium is damaged, the chunks stored on the storage medium can be restored, thereby improving the data security of the storage system.
  • the processor is further configured to: when a storage medium among the K+2 storage media is damaged, recover the damaged storage medium according to the check matrix, the data chunks stored on the undamaged storage media among the K+2 storage media, and the first check chunk and the second check chunk.
  • if a storage medium is damaged, the chunk stored on the storage medium is also damaged.
  • the damaged storage medium is restored, that is, the chunk stored on the damaged storage medium is restored.
  • it is necessary to determine which undamaged data blocks are used for the recovery of each data block of the damaged chunk according to the check matrix used by the storage controller. In the case of any chunk corruption, it is not necessary to use all the undamaged chunks in the chunk group where the damaged chunk is located.
  • a fifth aspect of the present application provides a data processing chip, including a circuit and a read/write interface.
  • the circuit is configured to obtain K data chunks to be encoded through the read/write interface, and each data chunk includes R data blocks.
  • the data processing chip is used in a storage controller; the circuit is further configured to store the K data chunks, the first check chunk, and the second check chunk in the memory of the storage controller through the read/write interface, so that the storage controller stores the K data chunks, the first check chunk, and the second check chunk respectively in K+2 storage media of the storage system where the storage controller is located.
  • the circuit is further configured to: when a storage medium among the K+2 storage media is damaged, recover the damaged storage medium according to the check matrix, the data chunks stored on the undamaged storage media among the K+2 storage media, and the first check chunk and the second check chunk.
  • the method further includes: storing the K data chunks, the first check chunk, and the second check chunk in K+2 storage media of the storage system where the storage controller is located.
  • the method further includes: when a storage medium among the K+2 storage media is damaged, recovering the damaged storage medium according to the check matrix, the data chunks stored on the undamaged storage media among the K+2 storage media, and at least one of the first check chunk and the second check chunk.
  • the check matrix has 2*R rows. In the check matrix, the ((k-1)*R+1)-th column to the (k*R)-th column form the chunk column set of the k-th data chunk in the K data chunks, K ≥ k ≥ 1; the (K*R+1)-th column to the ((K+1)*R)-th column form the chunk column set of the first check chunk; and the ((K+1)*R+1)-th column to the ((K+2)*R)-th column form the chunk column set of the second check chunk.
  • the check matrix is the standard check matrix H, or is obtained by performing N swap operations on the standard check matrix, N ≥ 1.
  • the swap operation refers to swapping any two chunk column sets.
  • in the standard check matrix, the chunk column set of the h-th data chunk in the K data chunks is composed of a positive diagonal matrix and M_h, K ≥ h ≥ 1 and h is an odd number
  • the chunk column set of the j-th data chunk in the K data chunks is composed of an anti-diagonal matrix and M_j, K ≥ j ≥ 1 and j is an even number
  • the chunk column set of the first check chunk is composed of a positive diagonal matrix and M_(K+1)
  • the chunk column set of the second check chunk is composed of a 0 matrix and a positive diagonal matrix; the M_h, the M_j, and the M_(K+1) are binary matrices corresponding to different elements in the Galois field GF(2^R). That is, the binary matrices corresponding to K+1 different elements in GF(2^R) are used in the standard check matrix.
  • the ((k-1)*R+1)-th column to the (k*R)-th column respectively correspond to the R data blocks of the k-th data chunk in the K data chunks
  • the (K*R+1)-th column to the ((K+1)*R)-th column in the check matrix respectively correspond to the R data blocks of the first check chunk
  • the ((K+1)*R+1)-th column to the ((K+2)*R)-th column respectively correspond to the R data blocks of the second check chunk.
  • the seventh aspect of the present application provides a storage medium in which a program is stored; when the program is run by a computing device, the computing device performs the data processing method provided by the third aspect or any implementation of the third aspect.
  • the storage medium includes, but is not limited to, a read only memory, a random access memory, a flash memory, an HDD, or an SSD.
  • the eighth aspect of the present application provides a computer program product comprising program instructions; when the computer program product is executed by a storage controller, the storage controller performs the data processing method provided by the third aspect or any implementation of the third aspect.
  • the computer program product may be a software installation package; if the data processing method provided by the third aspect or any implementation of the third aspect is required, the computer program product may be downloaded and executed on the storage controller.
  • the ninth aspect of the present application provides a storage medium in which a program is stored; when the program is run by a computing device, the computing device performs the data processing method provided by the sixth aspect or any implementation of the sixth aspect.
  • the storage medium includes, but is not limited to, a read only memory, a random access memory, a flash memory, an HDD, or an SSD.
  • a tenth aspect of the present application provides a computer program product comprising program instructions; when the computer program product is executed by a storage controller, the storage controller performs the data processing method provided by the sixth aspect or any implementation of the sixth aspect.
  • the computer program product may be a software installation package; if the data processing method provided by the sixth aspect or any implementation of the sixth aspect is required, the computer program product may be downloaded and executed on the storage controller.
  • FIG. 1-1 is a schematic structural diagram of a storage system according to an embodiment of the present disclosure.
  • FIG. 1-2 is a schematic structural diagram of a storage system according to another embodiment of the present disclosure.
  • FIG. 2-1 is a schematic diagram of a diagonal matrix provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a storage system according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of a check matrix according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a check matrix according to another embodiment of the present application.
  • FIG. 6-1 is a schematic diagram of a check matrix provided by an embodiment of the present application.
  • FIG. 6-2 is a schematic diagram of a check matrix provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a storage controller according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of a storage controller according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic structural diagram of a storage controller according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a storage controller according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a data processing chip according to an embodiment of the present application.
  • the XOR operation between two data blocks refers to the exclusive OR operation of each bit of data of two data blocks.
  • for example, the first bit of data block 1 is XORed with the first bit of data block 2 to obtain the first bit of data block 3, and so on, until the last bit of data block 1 is XORed with the last bit of data block 2 to obtain the last bit of data block 3.
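  • The bitwise XOR between two data blocks can be sketched as follows. This is a minimal illustration only; the function name `xor_blocks` and the sample byte values are hypothetical and do not come from the application.

```python
def xor_blocks(block_a: bytes, block_b: bytes) -> bytes:
    """XOR two equally sized data blocks bit by bit (done here byte by byte)."""
    if len(block_a) != len(block_b):
        raise ValueError("data blocks must have the same size")
    return bytes(a ^ b for a, b in zip(block_a, block_b))


# Hypothetical sample blocks of two bytes each.
block_1 = bytes([0b10110000, 0x0F])
block_2 = bytes([0b01100000, 0xFF])
block_3 = xor_blocks(block_1, block_2)

# XOR is its own inverse: XORing block 3 with block 2 gives back block 1.
restored_block_1 = xor_blocks(block_3, block_2)
```

  • The last line shows the property that the recovery process relies on: any one of the three blocks can be recomputed from the other two.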
  • the recovery overhead measures how much of the storage media must be accessed to recover a damaged storage medium when any one of the K+2 storage media storing a chunk group is damaged.
  • the recovery overhead is equal to the ratio of the total size of the data blocks read from the undamaged storage media during recovery of the damaged storage medium to the total size of the data blocks of all the data chunks in the chunk group. Therefore, the smaller the recovery overhead, the shorter the recovery time required when a storage medium is damaged.
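  • As a worked illustration of this ratio, the sketch below assumes that all data blocks have the same size, so that sizes can be counted in blocks. The function name and the numbers (K = 3, R = 8, 24 or 18 blocks read) are hypothetical examples, not figures from the application.

```python
def recovery_overhead(blocks_read: int, K: int, R: int) -> float:
    """Ratio of blocks read during recovery to the K*R blocks of all data chunks.

    Assumes every data block has the same size, so the size ratio
    reduces to a ratio of block counts.
    """
    return blocks_read / (K * R)


# Hypothetical example with K = 3 data chunks of R = 8 data blocks each.
full_read = recovery_overhead(blocks_read=24, K=3, R=8)     # reads as much as all data chunks
partial_read = recovery_overhead(blocks_read=18, K=3, R=8)  # reads fewer blocks
```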
  • the definition of the chunk group will be explained in detail below.
  • a 0 matrix is a matrix having R rows and R columns, and each element in the 0 matrix is 0.
  • the binary matrix corresponding to an element t in GF(2^R) has R rows and R columns.
  • the first row of the binary matrix is the binary expression of t.
  • the first column of the first row of the binary matrix represents 2^0, the second column of the first row represents 2^1, the third column of the first row represents 2^2, and so on, until column R represents 2^(R-1), so that every value from 1 to 2^R - 1 can be expressed in binary form by the R columns of the first row of the binary matrix.
  • the polynomial corresponding to the binary expression of t is a polynomial in x whose maximum degree is R-1. If the coefficient of 2^r in the binary expression of t is 1, the coefficient of x^r in the polynomial is 1, and if the coefficient of 2^r in the binary expression of t is 0, the coefficient of x^r in the polynomial is 0, 0 ≤ r ≤ R-1.
  • the source (primitive) polynomial is a polynomial in x whose maximum degree is R and which cannot be divided exactly by any polynomial in x other than 1 and itself. For a given R, there may be multiple source polynomials. Before the binary matrices corresponding to the elements of GF(2^R) are calculated, one of the source polynomials is selected, and the selected source polynomial is then used for every element of GF(2^R).
  • in the following examples R = 4 and the selected source polynomial is 1+x+x^4. Each row after the first is the expression of the remainder obtained when the polynomial of the previous row, multiplied by x, is divided by 1+x+x^4.
  • the rows of the binary matrix corresponding to element 1 are as follows:
  • the first row, 1000, is the binary expression corresponding to 1.
  • the second row is the expression corresponding to the remainder of 1*x divided by 1+x+x^4, that is, the expression corresponding to x.
  • the third row is the expression corresponding to the remainder of 1*x*x divided by 1+x+x^4, that is, the expression corresponding to x^2.
  • the fourth row is the expression corresponding to the remainder of 1*x*x*x divided by 1+x+x^4, that is, the expression corresponding to x^3.
  • the rows of the binary matrix corresponding to element 2 are as follows:
  • the first row, 0100, is the binary expression corresponding to 2.
  • the second row is the expression corresponding to the remainder of x*x divided by 1+x+x^4, that is, the expression corresponding to x^2.
  • the third row is the expression corresponding to the remainder of x*x*x divided by 1+x+x^4, that is, the expression corresponding to x^3.
  • the fourth row is the expression corresponding to the remainder of x*x*x*x divided by 1+x+x^4, that is, the expression corresponding to 1+x.
  • the rows of the binary matrix corresponding to element 3 are as follows:
  • the first row, 1100, is the binary expression corresponding to 3.
  • the second row is the expression corresponding to the remainder of (1+x)*x divided by 1+x+x^4, that is, the expression corresponding to x+x^2.
  • the third row is the expression corresponding to the remainder of (1+x)*x*x divided by 1+x+x^4, that is, the expression corresponding to x^2+x^3.
  • the fourth row is the expression corresponding to the remainder of (1+x)*x*x*x divided by 1+x+x^4, that is, the expression corresponding to 1+x+x^3.
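  • The row-by-row construction described above can be sketched in a few lines. This is an illustrative sketch rather than the application's implementation: polynomials are represented as integers whose bit r is the coefficient of x^r (so 1+x+x^4 is 0b10011), and the function name `binary_matrix` is hypothetical.

```python
def binary_matrix(t: int, R: int = 4, poly: int = 0b10011) -> list[list[int]]:
    """Binary matrix of element t of GF(2^R).

    Row 1 is the binary expression of t with column 1 holding the
    coefficient of 2^0; each following row is the previous row's
    polynomial multiplied by x and reduced modulo the source polynomial.
    """
    rows = []
    value = t
    for _ in range(R):
        rows.append([(value >> r) & 1 for r in range(R)])
        value <<= 1                 # multiply by x
        if (value >> R) & 1:        # degree reached R: subtract (XOR) the polynomial
            value ^= poly
    return rows


def as_strings(matrix: list[list[int]]) -> list[str]:
    """Render each row as a string such as '1100' for easy comparison."""
    return ["".join(str(bit) for bit in row) for row in matrix]


matrix_1 = as_strings(binary_matrix(1))  # ['1000', '0100', '0010', '0001']
matrix_2 = as_strings(binary_matrix(2))  # ['0100', '0010', '0001', '1100']
matrix_3 = as_strings(binary_matrix(3))  # ['1100', '0110', '0011', '1101']
```

  • The three results match the rows described in the text for elements 1, 2, and 3 when R = 4 and the source polynomial is 1+x+x^4.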
  • the check matrix used in this application needs to adopt the binary matrices corresponding to K+1 different elements of GF(2^R).
  • the binary matrix corresponding to each element of GF(2^R) has R rows in total. Within the binary matrices corresponding to the K+1 different elements used in the check matrix, rows may be exchanged, and the binary matrices obtained after the exchange still count as binary matrices corresponding to K+1 different elements of GF(2^R).
  • for example, if the first and second rows of each binary matrix are interchanged, the K+1 binary matrices obtained after the interchange can also be used as the binary matrices corresponding to K+1 different elements of GF(2^R).
  • Figure 1-1 and Figure 1-2 show two different architecture storage systems.
  • the storage system in Figure 1-1 is also referred to as a storage array, and the storage controller and storage medium are both disposed inside the storage array.
  • the storage system in FIG. 1-2 is a distributed storage system including a plurality of storage nodes, each of which may in practice be a server. At least one storage node of the storage system includes a storage controller, each storage node includes a storage medium, and the storage nodes establish communication connections with each other through the communication network.
  • the storage controller in the storage array of FIG. 1-1 processes the data to be written sent by the client to the storage array, and stores each data chunk and check chunk obtained by the encoding into the storage medium of the storage array.
  • Each storage controller in Figure 1-2 can receive the data to be written by the client and process it.
  • each data chunk and check chunk obtained by a storage controller can be stored not only in the storage medium of the storage node where the storage controller is located, but can also be sent through the communication network to the storage media of other storage nodes, to implement distributed storage.
  • when there are a plurality of storage controllers, each storage controller is responsible for one storage node group in the storage system, and each storage node group includes at least one storage node.
  • the storage controller in a storage node group is responsible for receiving data to be written sent by the client, and storing each chunk obtained by the encoding into different storage nodes of the storage node group.
  • the storage controller described in this application may refer to any of the storage controllers in FIG. 1-1 or FIG. 1-2.
  • the storage controller continuously receives the data to be written sent by the client. After receiving a preset amount of data to be written, the storage controller divides the preset amount of data to be written into K data chunks to be encoded, where K is a parameter set by the user.
  • Each data chunk is divided into R data blocks, and according to the K*R data blocks and an encoding method of the erasure code preset in the storage controller, two check chunks are generated, and each check chunk includes R data blocks.
  • the K data chunks and the two check chunks form a chunk group.
  • the size of each chunk can be set as needed, for example, 4096 bytes.
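  • The division of buffered data into K data chunks of R data blocks each can be sketched as below. This is a minimal sketch under stated assumptions: the data is split contiguously and in order, the buffer length is an exact multiple of K*R, and the function name `split_into_chunks` is hypothetical. The application does not prescribe this particular layout.

```python
def split_into_chunks(data: bytes, K: int, R: int) -> list[list[bytes]]:
    """Split buffered data into K data chunks, each a list of R data blocks.

    Assumes a contiguous, in-order split and that len(data) is an exact
    multiple of K*R so that every data block has the same size.
    """
    if len(data) % (K * R) != 0:
        raise ValueError("buffer length must be a multiple of K*R")
    chunk_size = len(data) // K
    block_size = chunk_size // R
    chunks = []
    for k in range(K):
        chunk = data[k * chunk_size:(k + 1) * chunk_size]
        chunks.append([chunk[y * block_size:(y + 1) * block_size] for y in range(R)])
    return chunks


# Hypothetical example: 24 bytes, K = 3 chunks, R = 4 blocks of 2 bytes each.
data_chunks = split_into_chunks(bytes(range(24)), K=3, R=4)
first_block = data_chunks[0][0]   # data block 1-1
last_block = data_chunks[2][3]    # data block 3-4
```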
  • after a chunk group is generated, the storage controller stores each chunk in the chunk group into a different SSD.
  • the case in which the storage media used by the storage system are HDDs or other types of devices is similar. After the storage controller stores each chunk in a chunk group into the corresponding SSD, it continues to form another chunk group from the data to be written sent by the client and processes it in a similar manner.
  • each chunk is divided into R data blocks for storage in the SSD, as shown in FIG. 3. Although all R data blocks of each chunk are stored in the same SSD, the storage addresses (physical storage addresses or logical storage addresses) of the R data blocks may be discontinuous. Generally, each data block in a chunk group has the same size.
  • in a traditional erasure code such as RDP (row-diagonal parity), R+1 must be prime and R+1 must be greater than K, so R may have to be as large as 256. If the value of R is too large, the computational complexity of encoding and decoding increases greatly, which affects the performance of the storage system. To make the value of R more flexible, it is necessary to break the constraints of traditional erasure codes that R+1 must be prime and that R+1 > K.
  • each data block of each check chunk is obtained by XORing U-1 data blocks of chunks other than that check chunk, where U is an integer greater than 3.
  • the value of U may differ from one data block to another.
  • the storage controller determines, by using the check matrix preset in the storage controller, which U-1 data blocks are XORed to obtain each data block of each check chunk.
  • among the U data blocks formed by a generated data block and the U-1 data blocks used to generate it, any one data block can be obtained by performing an exclusive OR operation on the other U-1 data blocks. Therefore, when any SSD is damaged, the storage controller can also determine from the check matrix which data blocks can be XORed to obtain each data block of the chunk stored on the damaged SSD.
  • the number of rows of the check matrix is 2*R and the number of columns is (K+2)*R.
  • Each column of the check matrix corresponds to one data block, and each row corresponds to an exclusive OR equation.
  • X-Y refers to the Y-th data block of data chunk X, hereinafter referred to as data block X-Y, K ≥ X ≥ 1 and R ≥ Y ≥ 1.
  • the two check chunks are called check chunk P and check chunk Q respectively, so P-Y refers to the Y-th data block of check chunk P, hereinafter referred to as data block P-Y, and Q-Y refers to the Y-th data block of check chunk Q, hereinafter referred to as data block Q-Y, R ≥ Y ≥ 1.
  • the R columns corresponding to each chunk in the check matrix are collectively referred to as a chunk column set. Therefore, a check matrix with 2*R rows and (K+2)*R columns contains a total of K+2 chunk column sets, as shown in FIG. 4.
  • the 1st to the R-th columns of the check matrix belong to the chunk column set of data chunk 1, the (R+1)-th to the (2*R)-th columns belong to the chunk column set of data chunk 2, and so on; the (K*R+1)-th to the ((K+1)*R)-th columns belong to the chunk column set of check chunk P, and the ((K+1)*R+1)-th to the ((K+2)*R)-th columns belong to the chunk column set of check chunk Q.
  • each row of the check matrix has U coordinates whose value is 1, indicating that XORing any U-1 of the U data blocks corresponding to those U coordinates yields the one data block among the U data blocks that did not take part in the XOR operation.
  • as shown in FIG. 5, the first row of the check matrix indicates that XORing any three of data block 1-1, data block 2-1, data block 3-1, and data block P-1 yields the one data block that was not involved in the exclusive OR operation.
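  • How one row of the check matrix is read as an XOR equation can be sketched as follows. This is an illustration only: the helper names are hypothetical, columns are numbered from 0 in the code, and K = 3 and R = 8 are assumed here for the FIG. 5 example because its first row involves data block P-1 in the 25th column. The sample block contents are made up.

```python
from functools import reduce


def column_label(col: int, K: int, R: int) -> str:
    """Name of the data block for a 0-based column index, e.g. '2-1' or 'P-1'."""
    chunk, y = divmod(col, R)
    name = str(chunk + 1) if chunk < K else ("P" if chunk == K else "Q")
    return f"{name}-{y + 1}"


def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def recover_block(row: list[int], blocks: dict[int, bytes], missing_col: int) -> bytes:
    """XOR the other U-1 blocks marked by 1s in the row to rebuild the missing one."""
    if row[missing_col] != 1:
        raise ValueError("this row does not involve the missing data block")
    others = [blocks[c] for c, bit in enumerate(row) if bit == 1 and c != missing_col]
    return reduce(xor_blocks, others)


K, R = 3, 8                      # assumed parameters of the FIG. 5 example
row_1 = [0] * ((K + 2) * R)
for col in (0, 8, 16, 24):       # data blocks 1-1, 2-1, 3-1 and P-1
    row_1[col] = 1
labels = [column_label(c, K, R) for c, bit in enumerate(row_1) if bit]

blocks = {0: b"\x11\x22", 8: b"\x33\x44", 16: b"\x55\x66"}   # made-up contents
blocks[24] = recover_block(row_1, blocks, missing_col=24)     # encode P-1
rebuilt_2_1 = recover_block(row_1, {c: b for c, b in blocks.items() if c != 8}, 8)
```

  • The same row serves both purposes: it generates data block P-1 during encoding and rebuilds data block 2-1 if the storage medium holding data chunk 2 is damaged.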
  • the check matrix provided by the embodiment of the present application may be a standard check matrix, or may be obtained by performing N swap operations on the standard check matrix, N ≥ 1.
  • a swap operation refers to swapping any two chunk column sets in a matrix with 2*R rows and (K+2)*R columns. The standard check matrix in effect provides 2*R XOR equations, each of which is used to derive one data block, so the matrix obtained after performing N swap operations on the standard check matrix still provides 2*R XOR equations with the same function.
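  • A swap operation on chunk column sets can be sketched as below. The function name `swap_chunk_column_sets`, the 0-based indexing of chunk column sets, and the tiny sample matrix are assumptions made for illustration.

```python
def swap_chunk_column_sets(matrix: list[list[int]], a: int, b: int, R: int) -> list[list[int]]:
    """Return a copy of the matrix with chunk column sets a and b exchanged.

    Chunk column set i (0-based) covers columns i*R .. (i+1)*R - 1.
    """
    swapped = [row[:] for row in matrix]
    for row in swapped:
        row[a * R:(a + 1) * R], row[b * R:(b + 1) * R] = (
            row[b * R:(b + 1) * R],
            row[a * R:(a + 1) * R],
        )
    return swapped


# Tiny made-up matrix with R = 2 and three chunk column sets.
sample = [
    [1, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 0],
]
after_swap = swap_chunk_column_sets(sample, a=0, b=2, R=2)
```

  • Every row keeps the same number of 1s after the swap, which is why the swapped matrix still describes the same number of XOR equations.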
  • in one form of the standard check matrix, the upper half of the chunk column set of each data chunk k is a positive diagonal matrix I, and the lower half is M_k, the binary matrix corresponding to an element of GF(2^R), K ≥ k ≥ 1.
  • the upper half of the chunk column set of check chunk P is a positive diagonal matrix I, and the lower half is M_(K+1), the binary matrix corresponding to an element of GF(2^R).
  • the upper half of the chunk column set of check chunk Q is a 0 matrix, and the lower half is a positive diagonal matrix I.
  • the binary matrices corresponding to the K+1 elements used above are binary matrices corresponding to K+1 different elements in GF(2^R).
  • in another form of the standard check matrix, the upper half of the chunk column set of data chunk h is a positive diagonal matrix I, and the lower half is M_h, the binary matrix corresponding to an element of GF(2^R), K ≥ h ≥ 1 and h is an odd number.
  • the upper half of the chunk column set of data chunk j is an anti-diagonal matrix, and the lower half is M_j, the binary matrix corresponding to an element of GF(2^R), K ≥ j ≥ 1 and j is an even number.
  • the upper half of the chunk column set of check chunk P is a positive diagonal matrix I, and the lower half is M_(K+1), the binary matrix corresponding to an element of GF(2^R).
  • the upper half of the chunk column set of check chunk Q is a 0 matrix, and the lower half is a positive diagonal matrix I.
  • the binary matrices corresponding to the K+1 elements used above are binary matrices corresponding to K+1 different elements in GF(2^R).
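  • The block structure of the two standard check matrix forms described above can be sketched as follows. Several points here are assumptions for illustration and are not confirmed by the text: the positive diagonal matrix is taken to be the R×R matrix with 1s on the main diagonal, the anti-diagonal matrix is taken to have 1s on the opposite diagonal, and the elements 1, 2, 3, 4 with R = 4 and source polynomial 1+x+x^4 are arbitrary sample choices. All function names are hypothetical.

```python
def binary_matrix(t, R, poly):
    """Binary matrix of GF(2^R) element t; bit r of an int is the x^r coefficient."""
    rows, value = [], t
    for _ in range(R):
        rows.append([(value >> r) & 1 for r in range(R)])
        value <<= 1
        if (value >> R) & 1:
            value ^= poly
    return rows


def positive_diagonal(R):
    return [[int(i == j) for j in range(R)] for i in range(R)]   # assumed: 1s on main diagonal


def anti_diagonal(R):
    return [[int(i + j == R - 1) for j in range(R)] for i in range(R)]  # assumed layout


def zero_matrix(R):
    return [[0] * R for _ in range(R)]


def standard_check_matrix(elements, R, poly, alternate=False):
    """Build a 2*R x (K+2)*R matrix from K+1 distinct elements (last one is for P).

    alternate=False: every data chunk has a positive diagonal upper half.
    alternate=True: even-numbered data chunks use an anti-diagonal upper half.
    """
    if len(set(elements)) != len(elements):
        raise ValueError("the K+1 elements must be different")
    K = len(elements) - 1
    upper, lower = [], []
    for k in range(1, K + 1):
        use_anti = alternate and k % 2 == 0
        upper.append(anti_diagonal(R) if use_anti else positive_diagonal(R))
        lower.append(binary_matrix(elements[k - 1], R, poly))
    upper.append(positive_diagonal(R))                 # check chunk P
    lower.append(binary_matrix(elements[K], R, poly))
    upper.append(zero_matrix(R))                       # check chunk Q
    lower.append(positive_diagonal(R))
    top = [[bit for block in upper for bit in block[r]] for r in range(R)]
    bottom = [[bit for block in lower for bit in block[r]] for r in range(R)]
    return top + bottom


H_first = standard_check_matrix([1, 2, 3, 4], R=4, poly=0b10011)
H_second = standard_check_matrix([1, 2, 3, 4], R=4, poly=0b10011, alternate=True)
```

  • With K = 3 and R = 4 both matrices have 8 rows and 20 columns; they differ only in the upper half of the chunk column set of data chunk 2.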
  • the first storage form of the check matrix is a matrix with 2*R rows and (K+2)*R columns. Since each row of the check matrix represents an exclusive OR equation, the check matrix is equivalent to 2*R exclusive OR equations; therefore, the second storage form of the check matrix is the 2*R XOR equations equivalent to the check matrix.
  • the storage controller, the data processing method, and the data processing chip provided by the present application can encode and decode by using any one of the check matrices shown in FIG. 6-1 or FIG. 6-2.
  • the check matrix provided in FIG. 5 is taken as an example to introduce the encoding process of the check chunks.
  • the parameter R used in the check matrix breaks the constraint of traditional erasure codes that R+1 needs to be prime.
  • the column corresponding to data block P-1 (that is, the 25th column of the check matrix) has two coordinates whose value is 1, which correspond to the 1st row and the 13th row of the check matrix respectively.
  • because the coordinate corresponding to data block Q-5 in the 13th row of the check matrix is also 1, only the XOR equation corresponding to the first row of the check matrix is used in the process of generating data block P-1, namely:
  • Data block P-1 = data block 1-1 XOR data block 2-1 XOR data block 3-1.
  • Data block P-2 = data block 1-2 XOR data block 2-2 XOR data block 3-2.
  • Data block P-3 = data block 1-3 XOR data block 2-3 XOR data block 3-3.
  • Data block P-4 = data block 1-4 XOR data block 2-4 XOR data block 3-4.
  • Data block P-5 = data block 1-5 XOR data block 2-5 XOR data block 3-5.
  • Data block P-6 = data block 1-6 XOR data block 2-6 XOR data block 3-6.
  • Data block P-7 = data block 1-7 XOR data block 2-7 XOR data block 3-7.
  • Data block P-8 = data block 1-8 XOR data block 2-8 XOR data block 3-8.
  • Data block Q-1 = data block 1-2 XOR data block 2-3 XOR data block 3-5.
  • Data block Q-2 = data block 1-1 XOR data block 2-4 XOR data block 3-6.
  • Data block Q-3 = data block 1-1 XOR data block 1-4 XOR data block 2-1 XOR data block 3-7.
  • Data block Q-4 = data block 1-2 XOR data block 1-3 XOR data block 2-2 XOR data block 3-8.
  • Data block Q-5 = data block 1-6 XOR data block 2-7 XOR data block P-1.
  • Data block Q-6 = data block 1-5 XOR data block 2-8 XOR data block P-2.
  • Data block Q-7 = data block 1-5 XOR data block 1-8 XOR data block 2-5 XOR data block P-3.
  • Data block Q-8 = data block 1-6 XOR data block 1-7 XOR data block 2-6 XOR data block P-4.
  • at this point, check chunk P and check chunk Q are fully encoded.
  • the following takes a chunk group encoded with the check matrix provided in FIG. 5 as an example to describe how the chunk stored on a damaged SSD is restored when the SSD holding any chunk of the chunk group is damaged.
  • the column in which data block 1-1 is located has three coordinates equal to 1, which correspond to rows 1, 10, and 11 of the check matrix.
  • the XOR equation corresponding to row 11 also involves data block 1-4, which is damaged as well, so data block 1-1 cannot be recovered with the XOR equation corresponding to row 11.
  • data block 1-1 can be recovered with the XOR equation corresponding to row 1 or row 10, so data block 1-1 has 2 alternative decoding methods. They are:
  • Data block 1-1 = data block 2-1 XOR data block 3-1 XOR data block P-1.
  • Data block 1-1 = data block 2-4 XOR data block 3-6 XOR data block Q-2.
  • the recovery of data block 1-2 is similar to that of data block 1-1; the XOR equation corresponding to row 2 or row 9 of the check matrix may be used.
  • once data block 1-2 has been recovered, the XOR equation corresponding to row 3 or row 12 of the check matrix can be used to recover data block 1-3.
  • once data block 1-1 has been recovered, the XOR equation corresponding to row 4 or row 11 of the check matrix can be used to recover data block 1-4.
  • the recovery of data block 1-5 is similar to that of data block 1-1; the XOR equation corresponding to row 5 or row 14 of the check matrix may be used.
  • the recovery of data block 1-6 is similar to that of data block 1-1; the XOR equation corresponding to row 6 or row 13 of the check matrix may be used.
  • once data block 1-6 has been recovered, the XOR equation corresponding to row 7 or row 16 of the check matrix can be used to recover data block 1-7.
  • once data block 1-5 has been recovered, the XOR equation corresponding to row 8 or row 15 of the check matrix can be used to recover data block 1-8.
  • as can be seen, there are 2^8 decoding methods in total for recovering data chunk 1.
  • all 2^8 decoding methods can recover the damaged chunk, but recovering each data block of data chunk 1 requires reading the data blocks used for recovery from the SSDs into the storage controller, which then completes the recovery. Different decoding methods may therefore read different numbers of data blocks from the SSDs while restoring all 8 data blocks of data chunk 1. For a given check matrix, when any chunk is damaged, a decoding method that reads the fewest data blocks from the SSDs can be used to reduce the recovery overhead.
  • for example, if data chunk 1 is damaged, the XOR equations corresponding to rows 1, 2, 12, 11, 14, 13, 7, and 8 of the check matrix provided in FIG. 5 are used to recover data block 1-1 through data block 1-8.
  • if data chunk 2 is damaged, the XOR equations corresponding to rows 1, 2, 9, 10, 5, 6, 13, and 14 of the check matrix provided in FIG. 5 are used to recover data block 2-1 through data block 2-8.
  • if data chunk 3 is damaged, the XOR equations corresponding to rows 1, 2, 3, 4, 9, 10, 11, and 12 of the check matrix provided in FIG. 5 are used to recover data block 3-1 through data block 3-8.
  • if check chunk P is damaged, the XOR equations corresponding to rows 5, 6, 7, 8, 13, 14, 15, and 16 of the check matrix provided in FIG. 5 are used to recover data block P-1 through data block P-8.
  • if check chunk Q is damaged, the XOR equations corresponding to rows 9, 10, 11, 12, 13, 14, 15, and 16 of the check matrix provided in FIG. 5 are used to recover data block Q-1 through data block Q-8.
  • the storage controller can recover the damaged chunk by using a decoding method with a lower recovery cost in case any chunk is damaged. For example, if the data chunk 1 is corrupted, the recovery overhead is 0.75.
  • FIG. 7 provides a memory controller 200 that can be used in the memory system shown in FIG. 1-1 or FIG. 1-2.
  • the memory controller 200 includes a bus 202, a processor 204, a memory 208, and a communication interface 206.
  • the processor 204, the memory 208, and the communication interface 206 communicate via a bus 202.
  • the processor 204 can be a central processing unit (CPU).
  • the memory 208 may include a volatile memory, such as a random access memory (RAM).
  • the memory 208 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, an HDD, or an SSD.
  • the communication interface 206 includes a network interface and a storage medium read/write interface, which are respectively used to acquire data to be written sent by the client and write the obtained chunk group into the storage medium.
  • the memory 208 stores an encoding program and K data chunks.
  • the processor 204 reads the encoding program and the K data chunks from the memory 208, executes the foregoing encoding process to generate a chunk group, and, through the communication interface 206, stores each chunk of the chunk group on a different storage medium.
  • the memory 208 stores the decoding program and the data blocks required to restore the chunk stored on the damaged storage medium.
  • the processor 204 reads the decoding program and the data blocks required to restore the chunk stored on the damaged storage medium from the memory 208, and executes the foregoing decoding method to recover the chunk stored on the damaged storage medium.
  • the encoding program and the decoding program can be combined into one program.
  • the check matrix is stored in the memory 208 in a plurality of ways and can be stored in the encoding program and the decoding program in the form of a matrix. It may also be stored in the memory 208 in the form of 2*R XOR equations, and the 2*R XOR equations are fused to the encoding program and the decoding program.
  • when the check matrix is stored as a matrix, the processor 204 executes the encoding program and obtains the two check chunks in accordance with the encoding process described above.
  • the decoding process is similar to the encoding process.
  • for each check matrix, the encoding process and the decoding process are preset, so the check matrix need not be stored in the memory 208; instead, the 2*R XOR equations are stored directly in the encoding program and the decoding program.
  • the memory 208 also stores which of the 2*R XOR equations are used when encoding or decoding each chunk, and the order in which those XOR equations are used.
  • for example, with the check matrix of FIG. 5, the encoding program directly instructs execution of the 16 XOR equations described above to obtain the 16 data blocks of check chunk P and check chunk Q. Similarly, if a chunk is damaged, the decoding program directly instructs execution of the 8 XOR equations described above for that chunk in order to recover its 8 data blocks.
  • the memory controller provided above has fewer restrictions on the use of erasure codes and is more compatible with the storage system.
  • the memory controller 400 includes a bus 402, a processor 404, a memory 408, a data processing chip 410, and a communication interface 406.
  • the processor 404, the memory 408, and the communication interface 406 communicate via a bus 402.
  • the processor 404 can be a CPU.
  • Memory 408 can include volatile memory.
  • Memory 408 can also include non-volatile memory.
  • the communication interface 406 includes a network interface and a storage medium read/write interface, which are respectively used to acquire data to be written sent by the client and store the chunk group obtained after the encoding into the storage medium.
  • the data processing chip 410 can be implemented by a circuit, which can be an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the above PLD may be a complex programmable logic device (CPLD), a field programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the data processing chip 410 may specifically include an address unit 4102, an operation unit 4104, a storage unit 4106, and a read/write interface 4108.
  • the address unit 4102, the operation unit 4104, and the storage unit 4106 can in practice be integrated into one circuit.
  • the read/write interface 4108 is connected to the bus 402. When the data processing chip 410 performs encoding, the read/write interface 4108 obtains the data blocks stored in the memory 408 through the bus 402 and stores them in the storage unit 4106, and sends the encoded data blocks to the memory 208 through the bus 402, so that the storage controller 200 generates a chunk group and stores each chunk of the chunk group on a storage medium.
  • when the data processing chip 410 performs decoding, the read/write interface 4108 is further configured to obtain the data blocks required for recovery through the bus 402 and store them in the storage unit 4106, and to send the restored data blocks to the memory 208.
  • the function of the address unit 4102 is similar to that of the check matrix.
  • the address unit 4102 indicates which data blocks in the storage unit 4106 should be XORed in each XOR operation performed by the operation unit 4104, so that the operation unit 4104 can fetch the corresponding data blocks from the storage unit 4106 and complete the XOR operation.
  • the operation unit 4104 fetches from the storage unit 4106 the data blocks to be XORed in one XOR operation, stores the resulting data block in the storage unit 4106 after the operation, and then performs the next XOR operation.
  • the data processing chip provided above has fewer restrictions on the use of erasure codes and is more compatible with the storage system.
  • the methods described in connection with the present disclosure can be implemented by a processor executing software instructions.
  • the software instructions can be composed of corresponding software modules, which can be stored in RAM, flash memory, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), HDD, SSD, optical disc, or any other form of storage medium known in the art.
  • the functions described herein may be implemented in hardware or software.
  • when implemented in software, the functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
  • a storage medium may be any available media that can be accessed by a general purpose or special purpose computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Detection And Correction Of Errors (AREA)

Abstract

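A storage controller that, when running, encodes K data chunks to be encoded, obtained from a client, according to a check matrix to generate 2 check chunks. Each data chunk includes R data blocks, where R = 2^Q and Q is a positive integer. The storage controller also stores these K+2 chunks on different storage media. If any chunk is later damaged, the storage controller can recover the damaged chunk from the check matrix and the undamaged chunks.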

Description

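Storage Controller, Data Processing Chip, and Data Processing Method
Technical Field
The present application relates to the field of storage technologies, and in particular to a storage controller, a data processing chip, and a data processing method.
Background
A storage system in a large-scale storage scenario includes a storage controller and multiple storage media. The storage media may be hard disk drives (HDD), solid state drives (SSD), or a combination of the two. A client sends data to be written to the storage controller over a communication network, and the storage controller processes the data and stores it on the storage media. Existing storage systems generally use redundant arrays of independent disks (RAID) technology, and the core of RAID technology is the erasure code.
Existing erasure codes place many restrictions on their parameters. For example, the number R of data blocks into which each data chunk is divided during encoding must equal a prime number minus 1. These restrictions make parameter selection inflexible, which in turn makes the storage system's use of erasure codes inefficient.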
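Summary
The present application provides a storage controller that uses an erasure code with fewer usage restrictions; for example, R need not equal a prime number minus 1.
A first aspect of the present application provides a storage controller including a processor, a memory, and a communication interface. The processor is configured to obtain, through the communication interface, K data chunks to be encoded and to cache the K data chunks in the memory. Each data chunk includes R data blocks, R = 2^Q, and Q and K are both positive integers.
The processor continuously receives data to be written from the client through the communication interface and caches it in the memory. Once a preset amount of data to be written has been cached in the memory, the processor divides that preset amount of data into K data chunks to be encoded, each including R data blocks.
The processor then encodes the K data chunks according to the code and the check matrix stored in the memory to obtain a first check chunk and a second check chunk, each including R data blocks.
The erasure code used by the storage controller described above has few constraints in use and is compatible with the settings of a storage array, such as the chunk size and the value of K. The erasure code also has a low recovery overhead, which improves the working efficiency of the storage controller.
With reference to the first aspect, in a first implementation of the first aspect, after obtaining the first check chunk and the second check chunk, the processor is further configured to store, through the communication interface, the K data chunks, the first check chunk, and the second check chunk respectively on K+2 storage media of the storage system in which the storage controller is located.
Storing the different chunks of one chunk group on different storage media ensures that, if a storage medium is later damaged, the chunks stored on it can all be recovered, which improves the data safety of the storage system.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, the processor is further configured to, when a storage medium among the K+2 storage media is damaged, recover the damaged storage medium according to the check matrix and the data chunks, the first check chunk, and the second check chunk stored on the undamaged storage media among the K+2 storage media.
If a storage medium is damaged, the chunk stored on it is damaged as well. Recovering the damaged storage medium means recovering the chunk stored on it. During recovery, the check matrix used by the storage controller determines which undamaged data blocks are needed to recover each data block of the damaged chunk. When any one chunk is damaged, not all undamaged chunks in its chunk group are necessarily used.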
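A second aspect of the present application provides a data processing chip including a circuit and a read/write interface. The circuit is configured to obtain, through the read/write interface, K data chunks to be encoded; each data chunk includes R data blocks, R = 2^Q, and Q and K are both positive integers. The circuit is further configured to generate a first check chunk and a second check chunk according to a check matrix and the K data chunks, each check chunk including R data blocks.
With reference to the second aspect, in a first implementation of the second aspect, the data processing chip is used in a storage controller. The circuit is further configured to store, through the read/write interface, the K data chunks, the first check chunk, and the second check chunk in the memory of the storage controller, so that the storage controller stores the K data chunks, the first check chunk, and the second check chunk respectively on K+2 storage media of the storage system in which the storage controller is located.
With reference to the first implementation of the second aspect, in a second implementation of the second aspect, the circuit is further configured to, when a storage medium among the K+2 storage media is damaged, recover the damaged storage medium according to the check matrix and the data chunks, the first check chunk, and the second check chunk stored on the undamaged storage media among the K+2 storage media.
A third aspect of the present application provides a data processing method applicable to a storage controller. The method includes: obtaining K data chunks to be encoded and caching the K data chunks, where each data chunk includes R data blocks, R = 2^Q, and Q and K are both positive integers; and generating a first check chunk and a second check chunk according to a check matrix and the K data chunks, each check chunk including R data blocks.
With reference to the third aspect, in a first implementation of the third aspect, the method further includes: storing the K data chunks, the first check chunk, and the second check chunk respectively on K+2 storage media of the storage system in which the storage controller is located.
With reference to the first implementation of the third aspect, in a second implementation of the third aspect, the method further includes: when a storage medium among the K+2 storage media is damaged, recovering the damaged storage medium according to the check matrix, the data chunks stored on the undamaged storage media among the K+2 storage media, and at least one of the first check chunk and the second check chunk.
In the first aspect or any implementation of it, the second aspect or any implementation of it, or the third aspect or any implementation of it, the check matrix has 2*R rows. Columns (k-1)*R+1 through k*R of the check matrix form the chunk column set of the k-th of the K data chunks, K ≥ k ≥ 1; columns K*R+1 through (K+1)*R form the chunk column set of the first check chunk; and columns (K+1)*R+1 through (K+2)*R form the chunk column set of the second check chunk. The check matrix is the standard check matrix H or is obtained by performing N swap operations on the standard check matrix, N ≥ 1, where a swap operation exchanges any two chunk column sets. In the standard check matrix, the chunk column set of the k-th data chunk consists of a diagonal matrix and M_k, the chunk column set of the first check chunk consists of a diagonal matrix and M_(K+1), and the chunk column set of the second check chunk consists of a zero matrix and a diagonal matrix; M_k and M_(K+1) are binary matrices corresponding to different elements of the Galois field GF(2^R).
Columns (k-1)*R+1 through k*R of the check matrix correspond respectively to the R data blocks of the k-th of the K data chunks, columns K*R+1 through (K+1)*R correspond respectively to the R data blocks of the first check chunk, and columns (K+1)*R+1 through (K+2)*R correspond respectively to the R data blocks of the second check chunk.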
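A fourth aspect of the present application provides a storage controller including a processor, a memory, and a communication interface. The processor is configured to obtain, through the communication interface, K data chunks to be encoded and to cache the K data chunks in the memory. Each data chunk includes R data blocks, R = 2^Q, and Q and K are both positive integers.
The processor continuously receives data to be written from the client through the communication interface and caches it in the memory. Once a preset amount of data to be written has been cached in the memory, the processor divides that preset amount of data into K data chunks to be encoded, each including R data blocks.
The processor then encodes the K data chunks according to the code and the check matrix stored in the memory to obtain a first check chunk and a second check chunk, each including R data blocks.
The erasure code used by the storage controller described above has few constraints in use and is compatible with the settings of a storage array, such as the chunk size and the value of K. The erasure code also has a low recovery overhead, which improves the working efficiency of the storage controller.
With reference to the fourth aspect, in a first implementation of the fourth aspect, after obtaining the first check chunk and the second check chunk, the processor is further configured to store, through the communication interface, the K data chunks, the first check chunk, and the second check chunk respectively on K+2 storage media of the storage system in which the storage controller is located.
Storing the different chunks of one chunk group on different storage media ensures that, if a storage medium is later damaged, the chunks stored on it can all be recovered, which improves the data safety of the storage system.
With reference to the first implementation of the fourth aspect, in a second implementation of the fourth aspect, the processor is further configured to, when a storage medium among the K+2 storage media is damaged, recover the damaged storage medium according to the check matrix and the data chunks, the first check chunk, and the second check chunk stored on the undamaged storage media among the K+2 storage media.
If a storage medium is damaged, the chunk stored on it is damaged as well. Recovering the damaged storage medium means recovering the chunk stored on it. During recovery of these damaged chunks, the check matrix used by the storage controller determines which undamaged data blocks are needed to recover each data block of the damaged chunk. When any one chunk is damaged, not all undamaged chunks in its chunk group are necessarily used.
A fifth aspect of the present application provides a data processing chip including a circuit and a read/write interface. The circuit is configured to obtain, through the read/write interface, K data chunks to be encoded; each data chunk includes R data blocks, R = 2^Q, and Q and K are both positive integers. The circuit is further configured to generate a first check chunk and a second check chunk according to a check matrix and the K data chunks, each check chunk including R data blocks.
With reference to the fifth aspect, in a first implementation of the fifth aspect, the data processing chip is used in a storage controller. The circuit is further configured to store, through the read/write interface, the K data chunks, the first check chunk, and the second check chunk in the memory of the storage controller, so that the storage controller stores the K data chunks, the first check chunk, and the second check chunk respectively on K+2 storage media of the storage system in which the storage controller is located.
With reference to the first implementation of the fifth aspect, in a second implementation of the fifth aspect, the circuit is further configured to, when a storage medium among the K+2 storage media is damaged, recover the damaged storage medium according to the check matrix and the data chunks, the first check chunk, and the second check chunk stored on the undamaged storage media among the K+2 storage media.
A sixth aspect of the present application provides a data processing method applicable to a storage controller. The method includes: obtaining K data chunks to be encoded and caching the K data chunks, where each data chunk includes R data blocks, R = 2^Q, and Q and K are both positive integers; and generating a first check chunk and a second check chunk according to a check matrix and the K data chunks, each check chunk including R data blocks.
With reference to the sixth aspect, in a first implementation of the sixth aspect, the method further includes: storing the K data chunks, the first check chunk, and the second check chunk respectively on K+2 storage media of the storage system in which the storage controller is located.
With reference to the first implementation of the sixth aspect, in a second implementation of the sixth aspect, the method further includes: when a storage medium among the K+2 storage media is damaged, recovering the damaged storage medium according to the check matrix, the data chunks stored on the undamaged storage media among the K+2 storage media, and at least one of the first check chunk and the second check chunk.
In the fourth aspect or any implementation of it, the fifth aspect or any implementation of it, or the sixth aspect or any implementation of it, the check matrix has 2*R rows. Columns (k-1)*R+1 through k*R of the check matrix form the chunk column set of the k-th of the K data chunks, K ≥ k ≥ 1; columns K*R+1 through (K+1)*R form the chunk column set of the first check chunk; and columns (K+1)*R+1 through (K+2)*R form the chunk column set of the second check chunk. The check matrix is the standard check matrix H or is obtained by performing N swap operations on the standard check matrix, N ≥ 1, where a swap operation exchanges any two chunk column sets. In the standard check matrix, the chunk column set of the h-th data chunk consists of a diagonal matrix and M_h, K ≥ h ≥ 1 and h is odd; the chunk column set of the j-th data chunk consists of an anti-diagonal matrix and M_j, K ≥ j ≥ 1 and j is even; the chunk column set of the first check chunk consists of a diagonal matrix and M_(K+1); and the chunk column set of the second check chunk consists of a zero matrix and a diagonal matrix. M_h, M_j, and M_(K+1) are binary matrices corresponding to different elements of the Galois field GF(2^R). In other words, the standard check matrix uses the binary matrices corresponding to K+1 different elements of GF(2^R).
Columns (k-1)*R+1 through k*R of the check matrix correspond respectively to the R data blocks of the k-th of the K data chunks, columns K*R+1 through (K+1)*R correspond respectively to the R data blocks of the first check chunk, and columns (K+1)*R+1 through (K+2)*R correspond respectively to the R data blocks of the second check chunk.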
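A seventh aspect of the present application provides a storage medium that stores a program. When the program is run by a computing device, the computing device performs the data processing method provided by the third aspect or any implementation of the third aspect. The storage medium includes, but is not limited to, read-only memory, random access memory, flash memory, an HDD, or an SSD.
An eighth aspect of the present application provides a computer program product that includes program instructions. When the computer program product is executed by a storage controller, the storage controller performs the data processing method provided by the third aspect or any implementation of the third aspect. The computer program product may be a software installation package; when the data processing method provided by the third aspect or any implementation of the third aspect is needed, the computer program product can be downloaded and executed on the storage controller.
A ninth aspect of the present application provides a storage medium that stores a program. When the program is run by a computing device, the computing device performs the data processing method provided by the sixth aspect or any implementation of the sixth aspect. The storage medium includes, but is not limited to, read-only memory, random access memory, flash memory, an HDD, or an SSD.
A tenth aspect of the present application provides a computer program product that includes program instructions. When the computer program product is executed by a storage controller, the storage controller performs the data processing method provided by the sixth aspect or any implementation of the sixth aspect. The computer program product may be a software installation package; when the data processing method provided by the sixth aspect or any implementation of the sixth aspect is needed, the computer program product can be downloaded and executed on the storage controller.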
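Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings used in the embodiments are briefly introduced below.
FIG. 1-1 is a schematic structural diagram of a storage system according to an embodiment of the present application;
FIG. 1-2 is a schematic structural diagram of a storage system according to another embodiment of the present application;
FIG. 2-1 is a schematic diagram of a diagonal matrix according to an embodiment of the present application;
FIG. 2-2 is a schematic diagram of an anti-diagonal matrix according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a storage system according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a check matrix according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a check matrix according to another embodiment of the present application;
FIG. 6-1 is a schematic diagram of a check matrix according to an embodiment of the present application;
FIG. 6-2 is a schematic diagram of a check matrix according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a storage controller according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a storage controller according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a storage controller according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a storage controller according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a data processing chip according to an embodiment of the present application.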
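Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.
In the present application, the terms "first", "second", and "n-th" imply no logical or temporal dependency on one another.
Throughout this specification, an exclusive OR (XOR) operation between two data blocks means that the two data blocks are XORed bit by bit. For example, bit 1 of data block 1 is XORed with bit 1 of data block 2 to obtain bit 1 of data block 3, and so on, until the last bit of data block 1 is XORed with the last bit of data block 2 to obtain the last bit of data block 3. Data block 3 is then the XOR of data block 1 and data block 2, that is, data block 3 = data block 1 XOR data block 2.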
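By way of non-limiting illustration, the bit-by-bit XOR of two data blocks defined above can be sketched in Python as follows. The function name `xor_blocks` and the sample byte values are assumptions made only for this sketch; they do not appear in the application.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equally sized data blocks (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("data blocks must have the same size")
    # XOR on each byte pair is the same as XOR on every bit pair.
    return bytes(x ^ y for x, y in zip(a, b))


block1 = bytes([0b1100, 0xFF])
block2 = bytes([0b1010, 0x0F])
block3 = xor_blocks(block1, block2)  # data block 3 = data block 1 XOR data block 2
```

Because XOR is its own inverse, XORing data block 3 with data block 2 gives back data block 1, which is the property the recovery process relies on.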
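Throughout this specification, the recovery overhead is a parameter that measures the storage-medium access cost of recovering a damaged storage medium when any one of the K+2 storage media that store the data of one chunk group is damaged. The recovery overhead equals the ratio of the size of the data blocks read from the undamaged storage media during recovery to the size of the data blocks of all data chunks in the chunk group. A smaller recovery overhead therefore means a shorter recovery time when a storage medium is damaged. The chunk group is defined in detail below.
Throughout this specification, the diagonal matrix I is a matrix with R rows and R columns in which the element in row r, column r is 1 and all other elements are 0, 1 ≤ r ≤ R. The matrix I for R = 4 is shown in FIG. 2-1.
Throughout this specification, the anti-diagonal matrix is a matrix with R rows and R columns in which the element in row r, column R-r+1 is 1 and all other elements are 0, 1 ≤ r ≤ R. The anti-diagonal matrix for R = 4 is shown in FIG. 2-2.
Throughout this specification, the zero matrix is a matrix with R rows and R columns in which every element is 0.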
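As a hedged sketch of the three building blocks just defined, the following Python helpers construct the diagonal matrix, the anti-diagonal matrix, and the zero matrix for a given R. The function names and the list-of-lists representation are assumptions of this sketch, and indices here are 0-based whereas the text counts rows and columns from 1.

```python
def diagonal_matrix(r: int) -> list[list[int]]:
    """R x R matrix with 1 at (row, row) and 0 elsewhere."""
    return [[1 if col == row else 0 for col in range(r)] for row in range(r)]


def anti_diagonal_matrix(r: int) -> list[list[int]]:
    """R x R matrix with 1 at (row, R-1-row) and 0 elsewhere (0-based indices)."""
    return [[1 if col == r - 1 - row else 0 for col in range(r)] for row in range(r)]


def zero_matrix(r: int) -> list[list[int]]:
    """R x R matrix in which every element is 0."""
    return [[0] * r for _ in range(r)]
```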
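Throughout this specification, the binary matrix corresponding to any element GF(t) of the Galois field (GF) GF(2^R) other than GF(0) is defined as follows, where 1 ≤ t ≤ 2^R - 1.
The binary matrix corresponding to an element GF(t) of GF(2^R) has R rows and R columns.
The first row of this binary matrix is the binary expression of t.
Column 1 of the first row represents 2^0, column 2 represents 2^1, column 3 represents 2^2, and so on up to column R, which represents 2^(R-1). The R columns of the first row can therefore express every value from 1 to 2^R - 1 in binary.
The second row of this binary matrix is the expression corresponding to (the polynomial for the binary expression of t) * x mod (the primitive polynomial), that is, the remainder after division by the primitive polynomial.
Here, the polynomial for the binary expression of t is a polynomial in x of degree at most R-1. If the coefficient of 2^r in the binary expression of t is 1, the coefficient of x^r in the polynomial is 1; if the coefficient of 2^r is 0, the coefficient of x^r is 0, 0 ≤ r ≤ R-1.
The primitive polynomial is a polynomial in x of degree R that cannot be evenly divided by any polynomial in x other than 1 and itself. For a given R there may be several primitive polynomials. Before the binary matrices for the elements of GF(2^R) are computed, one of them is selected, and that selected primitive polynomial is then used for every element of GF(2^R).
The third row of this binary matrix is the expression corresponding to (the polynomial for the binary expression of t) * x^2 mod (the primitive polynomial).
In general, row r+1 of this binary matrix is the expression corresponding to (the polynomial for the binary expression of t) * x^r mod (the primitive polynomial), 0 ≤ r ≤ R-1.
Because the result of (the polynomial for the binary expression of t) * x^r mod (the primitive polynomial) is a polynomial in x of degree at most R-1, it corresponds to a binary expression of length R: if the polynomial includes the term x^m, column m+1 of the binary expression is 1; otherwise column m+1 is 0, 0 ≤ m ≤ R-1. For example, with R = 4, the polynomial 1 + x + x^3 corresponds to the binary expression 1101.
For the conversion of an element GF(t) of GF(2^R) into its binary matrix, and for the polynomial division above, see Shu Lin and Daniel J. Costello, Jr., Error Control Coding (second edition), and William P. Wardlaw, Matrix Representation of Finite Fields, Mathematics Magazine, Vol. 67, No. 4, October 1994.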
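Taking R = 4 and the primitive polynomial 1 + x + x^4 as an example, the binary matrix corresponding to each element of GF(2^4) is introduced below.
The binary matrix corresponding to GF(1) is: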
1000
0100
0010
0001
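Here, the first row, 1000, is the binary expression of 1. The second row is the expression for 1*x mod (1+x+x^4), that is, the expression for x. The third row is the expression for 1*x*x mod (1+x+x^4), that is, the expression for x^2. The fourth row is the expression for 1*x*x*x mod (1+x+x^4), that is, the expression for x^3.
The binary matrix corresponding to GF(2) is: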
0100
0010
0001
1100
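Here, the first row, 0100, is the binary expression of 2. The second row is the expression for x*x mod (1+x+x^4), that is, the expression for x^2. The third row is the expression for x*x*x mod (1+x+x^4), that is, the expression for x^3. The fourth row is the expression for x*x*x*x mod (1+x+x^4), that is, the expression for 1+x.
The binary matrix corresponding to GF(3) is: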
1100
0110
0011
1101
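Here, the first row, 1100, is the binary expression of 3. The second row is the expression for (1+x)*x mod (1+x+x^4), that is, the expression for x+x^2. The third row is the expression for (1+x)*x*x mod (1+x+x^4), that is, the expression for x^2+x^3. The fourth row is the expression for (1+x)*x*x*x mod (1+x+x^4), that is, the expression for 1+x+x^3.
In the same way, Table 1 gives the binary matrices corresponding to GF(4) through GF(15).
Figure PCTCN2017076954-appb-000001
Table 1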
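The construction described above (each row is the previous row multiplied by x and reduced by the primitive polynomial) can be sketched in Python as follows. This is only an illustrative sketch: the function name `gf_binary_matrix`, the integer encoding of polynomials (bit m holds the coefficient of x^m), and the string representation of rows are assumptions of this sketch, not part of the application.

```python
def gf_binary_matrix(t: int, r: int, primitive: int) -> list[str]:
    """Rows of the R x R binary matrix for GF(t); column m+1 is the x^m coefficient."""
    if not 1 <= t < (1 << r):
        raise ValueError("t must satisfy 1 <= t <= 2^R - 1")
    rows = []
    poly = t  # bit m of poly is the coefficient of x^m
    for _ in range(r):
        rows.append("".join(str((poly >> m) & 1) for m in range(r)))
        poly <<= 1                 # multiply by x
        if (poly >> r) & 1:        # degree reached R: subtract the primitive polynomial
            poly ^= primitive
    return rows


PRIMITIVE = 0b10011  # 1 + x + x^4, the polynomial chosen in the R = 4 example
matrix_gf3 = gf_binary_matrix(3, 4, PRIMITIVE)
```

With R = 4 and this primitive polynomial, the sketch reproduces the three matrices written out above for GF(1), GF(2), and GF(3).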
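The check matrix used in the present application requires the binary matrices corresponding to K+1 different elements of GF(2^R). The binary matrix for each element of GF(2^R) has R rows. Within each of the K+1 binary matrices used by the check matrix, rows may be exchanged with one another, and the K+1 binary matrices obtained after such exchanges still count as binary matrices corresponding to K+1 different elements of GF(2^R). For example, if rows 1 and 2 are swapped in every one of the K+1 binary matrices, the resulting K+1 matrices can still serve as the binary matrices corresponding to K+1 different elements of GF(2^R).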
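Architecture to which the embodiments of the present application apply
FIG. 1-1 and FIG. 1-2 show storage systems with two different architectures. The storage system in FIG. 1-1 is also called a storage array; the storage controller and the storage media are both located inside the storage array. FIG. 1-2 shows a distributed storage system that includes multiple storage nodes, each of which may in practice be a server. At least one storage node of the storage system includes a storage controller, every storage node includes storage media, and the storage nodes are connected to one another through a communication network.
The storage controller in the storage array of FIG. 1-1 processes the data to be written that clients send to the storage array, and stores the data chunks and check chunks obtained by encoding on the storage media of the storage array. Each storage controller in FIG. 1-2 can receive and process data to be written from clients. In the storage system shown in FIG. 1-2, the data chunks and check chunks obtained by a storage controller through encoding can be stored not only on the storage media of the storage node where that storage controller is located but also, through the communication network, on the storage media of other storage nodes, achieving distributed storage. Because multiple storage controllers may work in parallel in a distributed storage system, each of them is responsible for one storage node group of the storage system, and each storage node group includes at least one storage node. The storage controller of a storage node group receives the data to be written from clients and stores the chunks obtained by encoding on different storage nodes of that storage node group. The storage controller described in the present application may refer to any storage controller in FIG. 1-1 or FIG. 1-2.
As shown in FIG. 3, while the storage system is running, the storage controller continuously receives data to be written from clients. After receiving a preset amount of data to be written, the storage controller divides that preset amount of data into K data chunks to be encoded, where K is a user-set parameter. Each data chunk is divided into R data blocks, and 2 check chunks, each including R data blocks, are generated from these K*R data blocks using the erasure-code encoding method preset in the storage controller. The K data chunks and the 2 check chunks form one chunk group. The size of each chunk can be set as needed, for example 4096 bytes.
After generating a chunk group, the storage controller stores each chunk of the chunk group on one SSD; the situation is similar when the storage system uses HDDs or other kinds of devices as storage media. After storing each chunk of one chunk group on its corresponding SSD, the storage controller goes on to form another chunk group from the data to be written sent by clients and handles it in a similar way.
If any SSD is damaged, the remaining undamaged chunks of the chunk group to which the chunk on the damaged SSD belongs are needed to recover the damaged chunk, and the recovery uses the erasure-code decoding method preset in the storage controller. Each chunk is stored on an SSD as R data blocks, as shown in FIG. 3. Although all R data blocks of a chunk are stored on the same SSD, their storage addresses (physical or logical) need not be contiguous. The data blocks in one chunk group generally all have the same size.
The traditional row-diagonal parity (RDP) erasure code requires R and K to satisfy the following constraints: R+1 is prime and R+1 > K, where R and K are positive integers. K is generally a user-set parameter. To make efficient use of the space in each chunk, R must evenly divide the chunk size.
For example, if the chunk size is 4096 bytes and the user sets K = 23, the smallest R that satisfies all three constraints (R divides 4096, R+1 is prime, and R+1 > K) is 256. If R is too large, however, the computational complexity of encoding and decoding increases sharply, which degrades storage system performance. To make the choice of R more flexible, the traditional erasure-code constraints that R+1 be prime and that R+1 > K need to be removed.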
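The figure of 256 in the example above can be checked with a small search, sketched here in Python. The function names `is_prime` and `smallest_rdp_r` are hypothetical and introduced only for this sketch; the three conditions tested are the ones stated in the text.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def smallest_rdp_r(chunk_size: int, k: int):
    """Smallest R that divides chunk_size with R+1 prime and R+1 > K, or None."""
    for r in range(1, chunk_size + 1):
        if chunk_size % r == 0 and r + 1 > k and is_prime(r + 1):
            return r
    return None


r_min = smallest_rdp_r(4096, 23)
```

For a 4096-byte chunk the divisors are powers of two, and 33, 65, and 129 are all composite, so the search passes over R = 32, 64, and 128 before reaching R = 256 (257 is prime).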
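Each data block of a check chunk is obtained by XORing U-1 data blocks that belong to chunks other than that check chunk, where U is an integer greater than 3. The value of U may differ from one data block of a check chunk to another. While generating a check chunk, the storage controller uses the check matrix preset in it to determine which U-1 data blocks produce each data block of the check chunk.
Because of the properties of XOR, among the U-1 data blocks that generate a data block together with that data block itself, XORing any U-1 of them yields the remaining one. Therefore, when any SSD is damaged, the storage controller can also learn from the check matrix which data blocks can be used to compute each data block of the chunk stored on the damaged SSD.
The check matrix has 2*R rows and (K+2)*R columns. Each column corresponds to one data block, and each row corresponds to one XOR equation. As shown in FIG. 4, X-Y denotes the Y-th data block of data chunk X, referred to below as data block X-Y, K ≥ X ≥ 1, R ≥ Y ≥ 1. The two check chunks are called check chunk P and check chunk Q, so P-Y denotes the Y-th data block of check chunk P, referred to below as data block P-Y, and Q-Y denotes the Y-th data block of check chunk Q, referred to below as data block Q-Y, R ≥ Y ≥ 1.
The R columns of the check matrix that correspond to one chunk are together called a chunk column set, so a check matrix with 2*R rows and (K+2)*R columns has K+2 chunk column sets in total, as shown in FIG. 4. Columns 1 through R of the check matrix belong to the chunk column set of data chunk 1, columns R+1 through 2R belong to the chunk column set of data chunk 2, and so on; columns K*R+1 through (K+1)*R belong to the chunk column set of check chunk P, and columns (K+1)*R+1 through (K+2)*R belong to the chunk column set of check chunk Q.
Each row of the check matrix has U coordinates equal to 1, indicating that XORing any U-1 of the U data blocks corresponding to these coordinates yields the one data block that did not take part in the XOR. In FIG. 5, for example, row 1 of the check matrix indicates that among data block 1-1, data block 2-1, data block 3-1, and data block P-1, XORing any 3 of them yields the remaining one.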
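Check matrix provided by the embodiments of the present application
In the check matrix provided below, R = 2^Q, where Q is a positive integer. Because R need not satisfy the two constraints that R+1 be prime and that R+1 > K, an erasure code that uses this check matrix has fewer usage restrictions, and R can easily be chosen to divide the chunk size. For a given K, R can be chosen flexibly; for example, with K = 23 and a chunk size of 4096 bytes, R can be 8.
The check matrix provided by the embodiments of the present application may be the standard check matrix or may be obtained by performing N swap operations on the standard check matrix, N ≥ 1. One swap operation exchanges any two chunk column sets of a matrix with 2*R rows and (K+2)*R columns. Because the standard check matrix provides 2*R XOR equations, each of which is used to derive one data block, the matrix obtained after N swap operations still provides 2*R XOR equations with the same function.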
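The swap operation on chunk column sets can be sketched as follows. This is an illustration only: the function name `swap_chunk_column_sets`, the list-of-lists matrix representation, and the 0-based chunk indices are assumptions of this sketch, and the one-row matrix in the usage line is a placeholder, not a real check matrix.

```python
def swap_chunk_column_sets(matrix, r, a, b):
    """Return a copy of matrix with chunk column sets a and b (0-based) exchanged.

    Chunk column set i occupies columns i*r .. (i+1)*r - 1 of every row.
    """
    swapped = [row[:] for row in matrix]
    for row in swapped:
        left, right = row[a * r:(a + 1) * r], row[b * r:(b + 1) * r]
        row[a * r:(a + 1) * r] = right
        row[b * r:(b + 1) * r] = left
    return swapped


# Placeholder row with three column sets of width 2, labelled by value.
example = swap_chunk_column_sets([[1, 2, 3, 4, 5, 6]], r=2, a=0, b=2)
```

Applying the same swap twice returns the original matrix, which is one way to see that a swap only relabels which chunk each column set belongs to.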
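The structure of the standard check matrix is shown in FIG. 6-1 or FIG. 6-2.
In the standard check matrix shown in FIG. 6-1, the upper half of the chunk column set of each data chunk consists of a diagonal matrix I, and the lower half, M_k, is the binary matrix corresponding to an element of GF(2^R), K ≥ k ≥ 1. The upper half of the chunk column set of check chunk P consists of a diagonal matrix I, and the lower half, M_(K+1), is the binary matrix corresponding to an element of GF(2^R). The upper half of the chunk column set of check chunk Q consists of a zero matrix, and the lower half consists of a diagonal matrix I. The K+1 binary matrices used above correspond to K+1 different elements of GF(2^R).
In the standard check matrix shown in FIG. 6-2, the upper half of the chunk column set of data chunk h consists of a diagonal matrix I, and the lower half, M_h, is the binary matrix corresponding to an element of GF(2^R), K ≥ h ≥ 1 and h is odd. The upper half of the chunk column set of data chunk j consists of an anti-diagonal matrix, and the lower half, M_j, is the binary matrix corresponding to an element of GF(2^R), K ≥ j ≥ 1 and j is even. The upper half of the chunk column set of check chunk P consists of a diagonal matrix I, and the lower half, M_(K+1), is the binary matrix corresponding to an element of GF(2^R). The upper half of the chunk column set of check chunk Q consists of a zero matrix, and the lower half consists of a diagonal matrix I. The K+1 binary matrices used above correspond to K+1 different elements of GF(2^R).
The check matrix provided above can be stored in two forms. The first storage form is a matrix with 2*R rows and (K+2)*R columns. Because each row of the check matrix represents one XOR equation, the check matrix is equivalent to 2*R XOR equations; the second storage form of the check matrix is therefore these 2*R equivalent XOR equations.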
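The relationship between the two storage forms can be sketched as a conversion from matrix rows to lists of data-block labels. The names `column_labels`, `row_to_equation`, and `matrix_to_equations` are hypothetical, and the 4 x 6 matrix `toy` is a made-up K = 1, R = 2 example laid out in the manner of FIG. 6-1 purely for illustration; it is not a matrix from the application.

```python
def column_labels(k: int, r: int) -> list[str]:
    """Labels X-Y, P-Y, Q-Y for the (K+2)*R columns, in column order."""
    chunks = [str(x) for x in range(1, k + 1)] + ["P", "Q"]
    return [f"{chunk}-{y}" for chunk in chunks for y in range(1, r + 1)]


def row_to_equation(row: list[int], labels: list[str]) -> list[str]:
    """Data blocks whose XOR is zero according to one check-matrix row."""
    return [label for bit, label in zip(row, labels) if bit == 1]


def matrix_to_equations(matrix, k: int, r: int) -> list[list[str]]:
    labels = column_labels(k, r)
    return [row_to_equation(row, labels) for row in matrix]


# Hypothetical toy matrix: columns are 1-1, 1-2, P-1, P-2, Q-1, Q-2.
toy = [
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 1],
]
equations = matrix_to_equations(toy, k=1, r=2)
```

Any one block named in an equation equals the XOR of the others, so the list form carries the same information as the matrix row it came from.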
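The storage controller, the data processing method, and the data processing chip provided by the present application can encode and decode using either of the check matrices shown in FIG. 6-1 or FIG. 6-2.
The check matrix provided in FIG. 5 is taken below as an example to introduce the encoding process of the check chunks. The check matrix shown in FIG. 5 is the standard check matrix of FIG. 6-1 for K = 3 and R = 8. The parameter R used in this check matrix breaks the traditional erasure-code constraint that R+1 must be prime.
In this check matrix, the column corresponding to data block P-1 (that is, the 25th column of the check matrix) has two coordinates equal to 1, which correspond to rows 1 and 13 of the check matrix. In the encoding phase, however, only the 3 data chunks are known, and the coordinate corresponding to data block Q-5 in row 13 of the check matrix is also 1, so only the XOR equation corresponding to row 1 of the check matrix can be used to generate data block P-1, namely:
Data block P-1 = data block 1-1 XOR data block 2-1 XOR data block 3-1.
Similarly, data block P-2, data block P-3, and data block P-4 are encoded as follows:
Data block P-2 = data block 1-2 XOR data block 2-2 XOR data block 3-2.
Data block P-3 = data block 1-3 XOR data block 2-3 XOR data block 3-3.
Data block P-4 = data block 1-4 XOR data block 2-4 XOR data block 3-4.
In this check matrix, the columns corresponding to data blocks P-5, P-6, P-7, and P-8 and to data blocks Q-1, Q-2, Q-3, Q-4, Q-5, Q-6, Q-7, and Q-8 each have only 1 coordinate equal to 1, so they are encoded as follows:
Data block P-5 = data block 1-5 XOR data block 2-5 XOR data block 3-5.
Data block P-6 = data block 1-6 XOR data block 2-6 XOR data block 3-6.
Data block P-7 = data block 1-7 XOR data block 2-7 XOR data block 3-7.
Data block P-8 = data block 1-8 XOR data block 2-8 XOR data block 3-8.
Data block Q-1 = data block 1-2 XOR data block 2-3 XOR data block 3-5.
Data block Q-2 = data block 1-1 XOR data block 2-4 XOR data block 3-6.
Data block Q-3 = data block 1-1 XOR data block 1-4 XOR data block 2-1 XOR data block 3-7.
Data block Q-4 = data block 1-2 XOR data block 1-3 XOR data block 2-2 XOR data block 3-8.
Data block Q-5 = data block 1-6 XOR data block 2-7 XOR data block P-1.
Data block Q-6 = data block 1-5 XOR data block 2-8 XOR data block P-2.
Data block Q-7 = data block 1-5 XOR data block 1-8 XOR data block 2-5 XOR data block P-3.
Data block Q-8 = data block 1-6 XOR data block 1-7 XOR data block 2-6 XOR data block P-4.
At this point, check chunk P and check chunk Q are fully encoded.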
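As a minimal sketch of the encoding just described for K = 3 and R = 8, the following Python code applies the 16 XOR equations listed above to a dictionary of data blocks. The dictionary layout, the names `xor_all`, `Q_EQUATIONS`, and `encode`, and the 4-byte sample blocks are assumptions of this sketch; only the equations themselves are taken from the text.

```python
from functools import reduce
from operator import xor


def xor_all(blocks):
    """XOR any number of equally sized data blocks, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Source blocks of each Q data block, copied from the equations in the text.
Q_EQUATIONS = {
    "Q-1": ["1-2", "2-3", "3-5"],
    "Q-2": ["1-1", "2-4", "3-6"],
    "Q-3": ["1-1", "1-4", "2-1", "3-7"],
    "Q-4": ["1-2", "1-3", "2-2", "3-8"],
    "Q-5": ["1-6", "2-7", "P-1"],
    "Q-6": ["1-5", "2-8", "P-2"],
    "Q-7": ["1-5", "1-8", "2-5", "P-3"],
    "Q-8": ["1-6", "1-7", "2-6", "P-4"],
}


def encode(data):
    """Return the chunk group: the 24 data blocks plus P-1..P-8 and Q-1..Q-8."""
    blocks = dict(data)
    for y in range(1, 9):  # P-Y = 1-Y XOR 2-Y XOR 3-Y
        blocks[f"P-{y}"] = xor_all([blocks[f"{x}-{y}"] for x in (1, 2, 3)])
    for target, sources in Q_EQUATIONS.items():  # P blocks already exist here
        blocks[target] = xor_all([blocks[s] for s in sources])
    return blocks


# Sample data: block X-Y is four copies of the byte 16*X + Y.
data = {f"{x}-{y}": bytes([16 * x + y] * 4) for x in (1, 2, 3) for y in range(1, 9)}
group = encode(data)
```

The P blocks are computed first because Q-5 through Q-8 take P-1 through P-4 as inputs, matching the order in which the equations are listed.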
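The following takes a chunk group encoded with the check matrix provided in FIG. 5 as an example to describe how the chunk stored on a damaged SSD is recovered when the SSD holding any chunk of the chunk group is damaged.
If the SSD holding data chunk 1 is damaged, data block 1-1 through data block 1-8 must be obtained to recover data chunk 1, since R = 8.
The column in which data block 1-1 is located has 3 coordinates equal to 1, which correspond to rows 1, 10, and 11 of the check matrix. The XOR equation corresponding to row 11 also involves data block 1-4, which is damaged as well, so data block 1-1 cannot be recovered with the XOR equation corresponding to row 11. The XOR equation corresponding to row 1 or row 10 can recover data block 1-1, so data block 1-1 has 2 alternative decoding methods. They are:
Data block 1-1 = data block 2-1 XOR data block 3-1 XOR data block P-1.
Data block 1-1 = data block 2-4 XOR data block 3-6 XOR data block Q-2.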
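The two alternative decoding methods for data block 1-1 can be illustrated with the sketch below, which builds only the blocks that appear in rows 1 and 10 and then recovers data block 1-1 both ways. The helper name `xor_all`, the variable names, and the 2-byte sample values are assumptions of this sketch.

```python
from functools import reduce
from operator import xor


def xor_all(*blocks: bytes) -> bytes:
    """XOR any number of equally sized data blocks, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Sample contents for the blocks that appear in rows 1 and 10.
block_1_1 = bytes([0xDE, 0xAD])
block_2_1, block_3_1 = bytes([0x01, 0x02]), bytes([0x10, 0x20])
block_2_4, block_3_6 = bytes([0x33, 0x44]), bytes([0x55, 0x66])

# Encoding: P-1 and Q-2 as given by the equations in the text.
block_p_1 = xor_all(block_1_1, block_2_1, block_3_1)
block_q_2 = xor_all(block_1_1, block_2_4, block_3_6)

# Decoding after data block 1-1 is lost.
via_row_1 = xor_all(block_2_1, block_3_1, block_p_1)
via_row_10 = xor_all(block_2_4, block_3_6, block_q_2)
```

Both results equal the original data block 1-1; the two methods differ only in which three blocks must be read from the undamaged SSDs.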
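The recovery of data block 1-2 is similar to that of data block 1-1; the XOR equation corresponding to row 2 or row 9 of the check matrix can be used.
Because data block 1-2 has already been recovered, the XOR equation corresponding to row 3 or row 12 of the check matrix can be used to recover data block 1-3.
Because data block 1-1 has already been recovered, the XOR equation corresponding to row 4 or row 11 of the check matrix can be used to recover data block 1-4.
The recovery of data block 1-5 is similar to that of data block 1-1; the XOR equation corresponding to row 5 or row 14 of the check matrix can be used.
The recovery of data block 1-6 is similar to that of data block 1-1; the XOR equation corresponding to row 6 or row 13 of the check matrix can be used.
Because data block 1-6 has already been recovered, the XOR equation corresponding to row 7 or row 16 of the check matrix can be used to recover data block 1-7.
Because data block 1-5 has already been recovered, the XOR equation corresponding to row 8 or row 15 of the check matrix can be used to recover data block 1-8.
As can be seen, there are 2^8 decoding methods in total for recovering data chunk 1. All 2^8 decoding methods can recover the damaged chunk, but recovering each data block of data chunk 1 requires reading the data blocks used for recovery from the SSDs into the storage controller, which then completes the recovery. Different decoding methods may therefore read different numbers of data blocks from the SSDs while recovering all 8 data blocks of data chunk 1. For a given check matrix, when any chunk is damaged, a decoding method that reads the fewest data blocks from the SSDs can be used to reduce the recovery overhead.
For example, if data chunk 1 is damaged, the XOR equations corresponding to rows 1, 2, 12, 11, 14, 13, 7, and 8 of the check matrix provided in FIG. 5 are used to recover data block 1-1 through data block 1-8.
If data chunk 2 is damaged, the XOR equations corresponding to rows 1, 2, 9, 10, 5, 6, 13, and 14 of the check matrix provided in FIG. 5 are used to recover data block 2-1 through data block 2-8.
If data chunk 3 is damaged, the XOR equations corresponding to rows 1, 2, 3, 4, 9, 10, 11, and 12 of the check matrix provided in FIG. 5 are used to recover data block 3-1 through data block 3-8.
If check chunk P is damaged, the XOR equations corresponding to rows 5, 6, 7, 8, 13, 14, 15, and 16 of the check matrix provided in FIG. 5 are used to recover data block P-1 through data block P-8.
If check chunk Q is damaged, the XOR equations corresponding to rows 9, 10, 11, 12, 13, 14, 15, and 16 of the check matrix provided in FIG. 5 are used to recover data block Q-1 through data block Q-8.
The above describes K+2 decoding methods that are designed in advance and stored in the storage controller, one for the damage of each of the K+2 chunks. With these K+2 decoding methods, the storage controller can recover a damaged chunk using a decoding method with a low recovery overhead no matter which chunk is damaged. For example, if data chunk 1 is damaged, the recovery overhead is 0.75.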
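Storage controller provided by the embodiments of the present application
FIG. 7 provides a storage controller 200, which can be used in the storage system shown in FIG. 1-1 or FIG. 1-2. The storage controller 200 includes a bus 202, a processor 204, a memory 208, and a communication interface 206. The processor 204, the memory 208, and the communication interface 206 communicate with one another through the bus 202.
The processor 204 may be a central processing unit (CPU). The memory 208 may include volatile memory, such as random access memory (RAM). The memory 208 may also include non-volatile memory, such as read-only memory (ROM), flash memory, an HDD, or an SSD.
The communication interface 206 includes a network interface and a storage medium read/write interface, which are used respectively to obtain the data to be written sent by clients and to write the chunk group obtained by encoding to the storage media.
As shown in FIG. 8, while the storage controller 200 performs encoding, the memory 208 stores the encoding program and the K data chunks.
When the storage controller 200 runs, the processor 204 reads the encoding program and the K data chunks from the memory 208, executes the foregoing encoding process to generate a chunk group, and stores each chunk of the chunk group on a different storage medium through the communication interface 206.
As shown in FIG. 9, while the storage controller 200 performs decoding, the memory 208 stores the decoding program and the data blocks required to recover the chunk stored on the damaged storage medium.
When a storage medium of the storage system in which the storage controller 200 is located is damaged, the processor 204 reads from the memory 208 the decoding program and the data blocks required to recover the chunk stored on the damaged storage medium, and executes the foregoing decoding method to recover that chunk.
The encoding program and the decoding program can be combined into one program.
The check matrix can be stored in the memory 208 in several ways. It can be stored as a matrix in the encoding program and the decoding program. It can also be stored in the memory 208 as 2*R XOR equations that are built into the encoding program and the decoding program.
When the check matrix is stored as a matrix, during encoding the processor 204 executes the encoding program and obtains the two check chunks according to the encoding process described above. The decoding process is similar to the encoding process.
For each check matrix, the encoding process and the decoding process are preset, so the memory 208 need not store the check matrix; instead, the 2*R XOR equations are stored directly in the encoding program and the decoding program. The memory 208 also stores, for the encoding or decoding of each chunk, which of the 2*R XOR equations are used and the order in which they are used.
For example, with the check matrix of FIG. 5, the encoding program directly instructs execution of the 16 XOR equations described above to obtain the 16 data blocks of check chunk P and check chunk Q. Similarly, if a chunk is damaged, the decoding program directly instructs execution of the 8 XOR equations described above for the damage of that chunk in order to recover its 8 data blocks.
The erasure code used by the storage controller provided above has fewer usage restrictions and is more compatible with the storage system.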
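FIG. 10 provides another storage controller 400, which can be used in the storage system shown in FIG. 1-1 or FIG. 1-2. The storage controller 400 includes a bus 402, a processor 404, a memory 408, a data processing chip 410, and a communication interface 406. The processor 404, the memory 408, and the communication interface 406 communicate with one another through the bus 402.
The processor 404 may be a CPU. The memory 408 may include volatile memory. The memory 408 may also include non-volatile memory.
The communication interface 406 includes a network interface and a storage medium read/write interface, which are used respectively to obtain the data to be written sent by clients and to store the chunk group obtained by encoding on the storage media.
The data processing chip 410 can be implemented by a circuit, which may be an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
As shown in FIG. 11, the data processing chip 410 may specifically include an address unit 4102, an operation unit 4104, a storage unit 4106, and a read/write interface 4108. The address unit 4102, the operation unit 4104, and the storage unit 4106 can in practice be integrated into one circuit.
The read/write interface 4108 is connected to the bus 402. When the data processing chip 410 performs encoding, the read/write interface 4108 obtains the data blocks stored in the memory 408 through the bus 402 and stores them in the storage unit 4106, and sends the data blocks obtained by encoding to the memory 208 through the bus 402, so that the storage controller 200 generates a chunk group and stores each chunk of the chunk group on a storage medium. When the data processing chip 410 performs decoding, the read/write interface 4108 obtains the data blocks required for recovery through the bus 402 and stores them in the storage unit 4106, and sends the recovered data blocks to the memory 208.
The function of the address unit 4102 is similar to that of the check matrix. The address unit 4102 indicates which data blocks in the storage unit 4106 should be XORed in each XOR operation performed by the operation unit 4104, so that the operation unit 4104 can fetch the corresponding data blocks from the storage unit 4106 and complete the XOR operation.
The operation unit 4104 fetches from the storage unit 4106 the data blocks to be XORed in one XOR operation, stores the resulting data block in the storage unit 4106 after the operation, and then performs the next XOR operation.
The erasure code used by the data processing chip provided above has fewer usage restrictions and is more compatible with the storage system.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
The methods described in connection with the disclosure of the present application can be implemented by a processor executing software instructions. The software instructions can be composed of corresponding software modules, which can be stored in RAM, flash memory, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), an HDD, an SSD, an optical disc, or any other form of storage medium known in the art.
Those skilled in the art should appreciate that, in one or more of the foregoing examples, the functions described in the present application can be implemented in hardware or software. When implemented in software, the functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. A storage medium can be any available medium that a general-purpose or special-purpose computer can access.
The specific implementations described above further explain the objectives, technical solutions, and beneficial effects of the present application in detail. It should be understood that the foregoing are merely specific implementations of the present application and are not intended to limit its scope of protection. Any modification or improvement made on the basis of the technical solutions of the present application shall fall within the scope of protection of the present application.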

Claims (18)

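  1. A storage controller, comprising a processor, a memory, and a communication interface, wherein:
    the processor is configured to obtain, through the communication interface, K data chunks to be encoded and to cache the K data chunks in the memory, each data chunk comprising R data blocks, where R = 2^Q and Q and K are positive integers;
    the processor is further configured to execute code in the memory to perform the following operations:
    reading the K data chunks stored in the memory, and generating a first check chunk and a second check chunk according to a check matrix and the K data chunks, each check chunk comprising R data blocks;
    wherein the check matrix has 2*R rows; columns (k-1)*R+1 through k*R of the check matrix are the chunk column set of the k-th data chunk of the K data chunks, K≥k≥1; columns K*R+1 through (K+1)*R of the check matrix are the chunk column set corresponding to the first check chunk; and columns (K+1)*R+1 through (K+2)*R of the check matrix are the chunk column set of the second check chunk;
    the check matrix is a standard check matrix or is obtained by performing N swap operations on a standard check matrix, N≥1, a swap operation being the exchange of any two chunk column sets;
    in the standard check matrix, the chunk column set of the k-th data chunk of the K data chunks consists of a main-diagonal matrix and M_k, the chunk column set of the first check chunk consists of a main-diagonal matrix and M_{K+1}, and the chunk column set of the second check chunk consists of a zero matrix and a main-diagonal matrix, where M_k and M_{K+1} are binary matrices corresponding to different elements of the Galois field GF(2^R).
  2. The storage controller according to claim 1, wherein the processor is further configured to store, through the communication interface, the K data chunks, the first check chunk, and the second check chunk respectively in K+2 storage media of the storage system in which the storage controller is located.
  3. The storage controller according to claim 2, wherein the processor is further configured to, when a storage medium among the K+2 storage media is damaged, recover the damaged storage medium according to the check matrix and the data chunks, the first check chunk, and the second check chunk stored on the undamaged storage media among the K+2 storage media.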
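  4. A data processing chip, comprising a circuit and a read/write interface, wherein:
    the circuit is configured to obtain, through the read/write interface, K data chunks to be encoded, each data chunk comprising R data blocks, where R = 2^Q and Q and K are positive integers;
    the circuit is further configured to generate a first check chunk and a second check chunk according to a check matrix and the K data chunks, each check chunk comprising R data blocks;
    wherein the check matrix has 2*R rows; columns (k-1)*R+1 through k*R of the check matrix are the chunk column set of the k-th data chunk of the K data chunks, K≥k≥1; columns K*R+1 through (K+1)*R of the check matrix are the chunk column set corresponding to the first check chunk; and columns (K+1)*R+1 through (K+2)*R of the check matrix are the chunk column set of the second check chunk;
    the check matrix is a standard check matrix or is obtained by performing N swap operations on a standard check matrix, N≥1, a swap operation being the exchange of any two chunk column sets;
    in the standard check matrix, the chunk column set of the k-th data chunk of the K data chunks consists of a main-diagonal matrix and M_k, the chunk column set of the first check chunk consists of a main-diagonal matrix and M_{K+1}, and the chunk column set of the second check chunk consists of a zero matrix and a main-diagonal matrix, where M_k and M_{K+1} are binary matrices corresponding to different elements of the Galois field GF(2^R).
  5. The data processing chip according to claim 4, wherein the data processing chip is used in a storage controller;
    the circuit is further configured to store, through the read/write interface, the K data chunks, the first check chunk, and the second check chunk in a memory of the storage controller, so that the storage controller stores the K data chunks, the first check chunk, and the second check chunk respectively in K+2 storage media of the storage system in which the storage controller is located.
  6. The data processing chip according to claim 5, wherein the circuit is further configured to, when a storage medium among the K+2 storage media is damaged, recover the damaged storage medium according to the check matrix and the data chunks, the first check chunk, and the second check chunk stored on the undamaged storage media among the K+2 storage media.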
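  7. A data processing method, applicable to a storage controller, the method comprising:
    obtaining K data chunks to be encoded and caching the K data chunks, each data chunk comprising R data blocks, where R = 2^Q and Q and K are positive integers;
    generating a first check chunk and a second check chunk according to a check matrix and the K data chunks, each check chunk comprising R data blocks;
    wherein the check matrix has 2*R rows; columns (k-1)*R+1 through k*R of the check matrix are the chunk column set of the k-th data chunk of the K data chunks, K≥k≥1; columns K*R+1 through (K+1)*R of the check matrix are the chunk column set corresponding to the first check chunk; and columns (K+1)*R+1 through (K+2)*R of the check matrix are the chunk column set of the second check chunk;
    the check matrix is a standard check matrix or is obtained by performing N swap operations on a standard check matrix, N≥1, a swap operation being the exchange of any two chunk column sets;
    in the standard check matrix, the chunk column set of the k-th data chunk of the K data chunks consists of a main-diagonal matrix and M_k, the chunk column set of the first check chunk consists of a main-diagonal matrix and M_{K+1}, and the chunk column set of the second check chunk consists of a zero matrix and a main-diagonal matrix, where M_k and M_{K+1} are binary matrices corresponding to different elements of the Galois field GF(2^R).
  8. The data processing method according to claim 7, further comprising:
    storing the K data chunks, the first check chunk, and the second check chunk respectively in K+2 storage media of the storage system in which the storage controller is located.
  9. The data processing method according to claim 8, further comprising:
    when a storage medium among the K+2 storage media is damaged, recovering the damaged storage medium according to the check matrix, the data chunks stored on the undamaged storage media among the K+2 storage media, and at least one of the first check chunk and the second check chunk.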
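  10. A storage controller, comprising a processor, a memory, and a communication interface, wherein:
    the processor is configured to obtain, through the communication interface, K data chunks to be encoded and to cache the K data chunks in the memory, each data chunk comprising R data blocks, where R = 2^Q and Q and K are positive integers;
    the processor is further configured to execute code in the memory to perform the following operations:
    reading the K data chunks stored in the memory, and generating a first check chunk and a second check chunk according to a check matrix and the K data chunks, each check chunk comprising R data blocks;
    wherein the check matrix has 2*R rows; columns (k-1)*R+1 through k*R of the check matrix are the chunk column set of the k-th data chunk of the K data chunks, K≥k≥1; columns K*R+1 through (K+1)*R of the check matrix are the chunk column set corresponding to the first check chunk; and columns (K+1)*R+1 through (K+2)*R of the check matrix are the chunk column set of the second check chunk;
    the check matrix is a standard check matrix or is obtained by performing N swap operations on a standard check matrix, N≥1, a swap operation being the exchange of any two chunk column sets;
    in the standard check matrix, the chunk column set of the h-th data chunk of the K data chunks consists of a main-diagonal matrix and M_h, where K≥h≥1 and h is odd; the chunk column set of the j-th data chunk of the K data chunks consists of an anti-diagonal matrix and M_j, where K≥j≥1 and j is even; the chunk column set of the first check chunk consists of a main-diagonal matrix and M_{K+1}; and the chunk column set of the second check chunk consists of a zero matrix and a main-diagonal matrix, where M_h, M_j, and M_{K+1} are binary matrices corresponding to different elements of the Galois field GF(2^R).
  11. The storage controller according to claim 10, wherein the processor is further configured to store, through the communication interface, the K data chunks, the first check chunk, and the second check chunk respectively in K+2 storage media of the storage system in which the storage controller is located.
  12. The storage controller according to claim 11, wherein the processor is further configured to, when a storage medium among the K+2 storage media is damaged, recover the damaged storage medium according to the check matrix and the data chunks, the first check chunk, and the second check chunk stored on the undamaged storage media among the K+2 storage media.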
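  13. A data processing chip, comprising a circuit and a read/write interface, wherein:
    the circuit is configured to obtain, through the read/write interface, K data chunks to be encoded, each data chunk comprising R data blocks, where R = 2^Q and Q and K are positive integers;
    the circuit is further configured to generate a first check chunk and a second check chunk according to a check matrix and the K data chunks, each check chunk comprising R data blocks;
    wherein the check matrix has 2*R rows; columns (k-1)*R+1 through k*R of the check matrix are the chunk column set of the k-th data chunk of the K data chunks, K≥k≥1; columns K*R+1 through (K+1)*R of the check matrix are the chunk column set corresponding to the first check chunk; and columns (K+1)*R+1 through (K+2)*R of the check matrix are the chunk column set of the second check chunk;
    the check matrix is a standard check matrix or is obtained by performing N swap operations on a standard check matrix, N≥1, a swap operation being the exchange of any two chunk column sets;
    in the standard check matrix, the chunk column set of the h-th data chunk of the K data chunks consists of a main-diagonal matrix and M_h, where K≥h≥1 and h is odd; the chunk column set of the j-th data chunk of the K data chunks consists of an anti-diagonal matrix and M_j, where K≥j≥1 and j is even; the chunk column set of the first check chunk consists of a main-diagonal matrix and M_{K+1}; and the chunk column set of the second check chunk consists of a zero matrix and a main-diagonal matrix, where M_h, M_j, and M_{K+1} are binary matrices corresponding to different elements of the Galois field GF(2^R).
  14. The data processing chip according to claim 13, wherein the data processing chip is used in a storage controller;
    the circuit is further configured to store, through the read/write interface, the K data chunks, the first check chunk, and the second check chunk in a memory of the storage controller, so that the storage controller stores the K data chunks, the first check chunk, and the second check chunk respectively in K+2 storage media of the storage system in which the storage controller is located.
  15. The data processing chip according to claim 14, wherein the circuit is further configured to, when a storage medium among the K+2 storage media is damaged, recover the damaged storage medium according to the check matrix and the data chunks, the first check chunk, and the second check chunk stored on the undamaged storage media among the K+2 storage media.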
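  16. A data processing method, applicable to a storage controller, the method comprising:
    obtaining K data chunks to be encoded and caching the K data chunks, each data chunk comprising R data blocks, where R = 2^Q and Q and K are positive integers;
    generating a first check chunk and a second check chunk according to a check matrix and the K data chunks, each check chunk comprising R data blocks;
    wherein the check matrix has 2*R rows; columns (k-1)*R+1 through k*R of the check matrix are the chunk column set of the k-th data chunk of the K data chunks, K≥k≥1; columns K*R+1 through (K+1)*R of the check matrix are the chunk column set corresponding to the first check chunk; and columns (K+1)*R+1 through (K+2)*R of the check matrix are the chunk column set of the second check chunk;
    the check matrix is a standard check matrix or is obtained by performing N swap operations on a standard check matrix, N≥1, a swap operation being the exchange of any two chunk column sets;
    in the standard check matrix, the chunk column set of the h-th data chunk of the K data chunks consists of a main-diagonal matrix and M_h, where K≥h≥1 and h is odd; the chunk column set of the j-th data chunk of the K data chunks consists of an anti-diagonal matrix and M_j, where K≥j≥1 and j is even; the chunk column set of the first check chunk consists of a main-diagonal matrix and M_{K+1}; and the chunk column set of the second check chunk consists of a zero matrix and a main-diagonal matrix, where M_h, M_j, and M_{K+1} are binary matrices corresponding to different elements of the Galois field GF(2^R).
  17. The data processing method according to claim 16, further comprising:
    storing the K data chunks, the first check chunk, and the second check chunk respectively in K+2 storage media of the storage system in which the storage controller is located.
  18. The data processing method according to claim 17, further comprising:
    when a storage medium among the K+2 storage media is damaged, recovering the damaged storage medium according to the check matrix and the data chunks, the first check chunk, and the second check chunk stored on the undamaged storage media among the K+2 storage media.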
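The column layout and the swap operation recited in the claims can be illustrated with a short sketch. It is only an illustration under assumptions: the function names are hypothetical, the example row is a labeled placeholder rather than a real check matrix row, and the claims' 1-based column numbering is mapped onto Python's 0-based lists.

```python
def chunk_column_range(k, R):
    """1-based inclusive column range of the k-th chunk column set.

    k = 1..K selects a data chunk, k = K+1 the first check chunk,
    and k = K+2 the second check chunk.
    """
    return ((k - 1) * R + 1, k * R)


def swap_chunk_column_sets(matrix, a, b, R):
    """Return a copy of `matrix` with the a-th and b-th chunk column sets swapped."""
    a_lo, a_hi = chunk_column_range(a, R)
    b_lo, b_hi = chunk_column_range(b, R)
    swapped = []
    for row in matrix:
        new_row = list(row)
        # Convert 1-based inclusive ranges to 0-based slices.
        new_row[a_lo - 1:a_hi] = row[b_lo - 1:b_hi]
        new_row[b_lo - 1:b_hi] = row[a_lo - 1:a_hi]
        swapped.append(new_row)
    return swapped


# Usage with R = 2 and K = 1, so each row has (K + 2) * R = 6 columns.
# The entries are column labels, not real matrix bits.
example_row = [[1, 2, 3, 4, 5, 6]]
second_check_range = chunk_column_range(3, 2)
swapped_rows = swap_chunk_column_sets(example_row, 1, 3, 2)
```

Because a swap moves whole chunk column sets, every set keeps its internal structure; only the assignment of sets to chunks changes.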
PCT/CN2017/076954 2017-03-16 2017-03-16 Storage controller, data processing chip, and data processing method WO2018165943A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780088333.6A CN110431531B (zh) 2017-03-16 2017-03-16 Storage controller, data processing chip, and data processing method
PCT/CN2017/076954 WO2018165943A1 (zh) 2017-03-16 2017-03-16 Storage controller, data processing chip, and data processing method

Publications (1)

Publication Number Publication Date
WO2018165943A1 (zh) 2018-09-20

Family

ID=63522702

Country Status (2)

Country Link
CN (1) CN110431531B (zh)
WO (1) WO2018165943A1 (zh)

Also Published As

Publication number Publication date
CN110431531B (zh) 2020-11-03
CN110431531A (zh) 2019-11-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17900551

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17900551

Country of ref document: EP

Kind code of ref document: A1