CN117478692A - Data storage method, device, equipment, system and computer readable storage medium - Google Patents


Publication number
CN117478692A
CN117478692A
Authority
CN
China
Prior art keywords
data
equal
row
linear combination
column
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210873344.8A
Other languages
Chinese (zh)
Inventor
李�杰
唐凯成
程柯云
李柏晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202210873344.8A
Publication of CN117478692A
Status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a data storage method, device, equipment, system and computer readable storage medium, and relates to the field of communications technologies. The method is applied to a distributed storage system comprising a first device and a plurality of storage nodes, and comprises: the first device acquires n groups of first data obtained from x codewords; the first device then performs data transformation on y groups of first data among the n groups of first data to obtain y groups of second data, such that the minimum repair bandwidth of any one of the storage nodes storing the y groups of second data is smaller than a threshold; the y groups of second data and the n-y groups of first data are then stored on the corresponding storage nodes respectively. When one of the y storage nodes fails, the lost data can be recovered with small communication overhead, because the minimum repair bandwidth of that storage node is small.

Description

Data storage method, device, equipment, system and computer readable storage medium
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a data storage method, apparatus, device, system, and computer readable storage medium.
Background
In a distributed storage system, the original data to be stored is encoded to obtain check data related to the original data, and the original data and the check data are then stored on a plurality of storage nodes to ensure the reliability of data storage.
For example, a pieces of original data to be stored are encoded according to a maximum distance separable (maximum distance separable, MDS) code pattern to obtain b pieces of check data related to the a pieces of original data; the a pieces of original data and the b pieces of check data are stored on M (M = a+b) storage nodes, with one storage node storing one piece of data. In the case where the data of one storage node is lost, the lost data can be recovered by downloading a or more pieces of data from the remaining storage nodes and recovering it based on the downloaded a pieces of data.
The ratio of the number of downloaded data to the total number of original data is referred to as the repair bandwidth of the storage node. Since a larger repair bandwidth means larger communication overhead, a data storage method is needed that makes the repair bandwidth of a storage node smaller, so that when the data of a certain storage node is lost, the lost data can be recovered with smaller communication overhead.
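As a concrete illustration of repair bandwidth, the following sketch (an illustrative construction, not one prescribed by the application; the Vandermonde generator and the field GF(257) are assumptions) encodes a = k = 4 original symbols into M = 6 symbols with an MDS code and repairs one lost node by downloading a = 4 symbols, for a repair bandwidth of 4/4 = 1.

```python
P = 257  # prime, so integers mod P form the finite field GF(257)

def vandermonde(n, k):
    # generator matrix rows: [(i+1)^0, (i+1)^1, ..., (i+1)^(k-1)] mod P
    return [[pow(i + 1, j, P) for j in range(k)] for i in range(n)]

def encode(data, n):
    # any k of the n encoded symbols determine the data (MDS property)
    k = len(data)
    G = vandermonde(n, k)
    return [sum(G[i][j] * data[j] for j in range(k)) % P for i in range(n)]

def solve(rows, vals):
    # Gaussian elimination mod P: recover the data from k surviving symbols
    k = len(vals)
    A = [row[:] + [v] for row, v in zip(rows, vals)]
    for col in range(k):
        piv = next(r for r in range(col, k) if A[r][col] % P)
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], P - 2, P)  # Fermat inverse mod P
        A[col] = [x * inv % P for x in A[col]]
        for r in range(k):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(a - f * b) % P for a, b in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]

data = [10, 20, 30, 40]                # a = k = 4 original data
stored = encode(data, 6)               # M = 6 nodes, one symbol each
# Node 0 fails: download a = 4 symbols from survivors 1..4, re-solve,
# then re-encode the lost symbol; repair bandwidth = 4/4 = 1.
G = vandermonde(6, 4)
survivors = (1, 2, 3, 4)
recovered = solve([G[i] for i in survivors], [stored[i] for i in survivors])
lost = sum(G[0][j] * recovered[j] for j in range(4)) % P
```

Downloading k symbols to repair a single one is exactly the overhead the data transformation of this application is designed to reduce.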
Disclosure of Invention
The application provides a data storage method, a device, equipment, a system and a computer readable storage medium, which are used for reducing the repair bandwidth of a storage node.
In a first aspect, a data storage method is provided, applied to a distributed storage system comprising a first device and a plurality of storage nodes, the method comprising: first, the first device acquires n groups of first data obtained from x codewords, where any one of the x codewords comprises n first data, the n first data are obtained by encoding k original data, one group of first data comprises one first data from each of the x codewords, x, n and k are positive integers, n ≥ k, and 2 ≤ x ≤ n-k; then, the first device performs data transformation on y groups of first data among the n groups of first data to obtain y groups of second data, such that the minimum repair bandwidth of any one of the storage nodes storing the y groups of second data is smaller than a threshold, where the minimum repair bandwidth of any storage node refers to the minimum bandwidth required to recover the data stored by that storage node, the minimum bandwidth is obtained based on the ratio of the number of downloaded data to the total number of original data, y is an integer, and x ≤ y ≤ n; the y groups of second data and the n-y groups of first data are then stored on the corresponding storage nodes respectively.
According to the method, the y groups of second data are obtained by performing data transformation on the y groups of first data, so that the minimum repair bandwidth of the y storage nodes storing the y groups of second data is smaller than a threshold. Therefore, when one of the y storage nodes fails, the lost data can be recovered with small communication overhead.
In addition, since 2 ≤ x ≤ n-k, the method is applicable to different amounts of original data and is flexible, and it requires few computing resources to perform the data transformation and few storage resources to store the data. Furthermore, since x ≤ y ≤ n, the method can be applied to any y of the n storage nodes, and thus has a wide application range.
In one possible implementation, performing data transformation on the y groups of first data to obtain the y groups of second data includes: performing data transformation on the y groups of first data according to a linear combination transformation, where the linear combination is used to correlate a plurality of first data that belong to different codewords and that correspond, in different groups, to the same reference matrix, so that the resulting linear combination can be used to restore the plurality of first data; one group of first data corresponds to one reference matrix, and the reference matrix is used for downloading the group of first data corresponding to it from the corresponding storage node.
By associating a plurality of first data that belong to different codewords in different groups and correspond to the same reference matrix, one second data obtained by the association can be used to recover the plurality of first data. The amount of data required by this method is smaller than in the related art, in which the plurality of first data must be recovered based on a plurality of other first data.
In one possible implementation manner, the y sets of first data are a matrix of y rows and x columns, one set of first data corresponds to one row of the matrix, the first data belonging to the same codeword corresponds to one column of the matrix, and the data transformation is performed on the y sets of first data according to a transformation mode of linear combination to obtain y sets of second data, where the data transformation includes:
dividing the matrix into a first number of first sub-matrices and a second number of second sub-matrices, the first number and the second number being determined based on the ratio of y to x, any one of the first sub-matrices corresponding to x sets of first data, any one of the second sub-matrices corresponding to s sets of first data, x < s < 2x;
for any first sub-matrix, performing a first linear combination and a second linear combination on the first data of the p-th row and q-th column and the first data of the q-th row and p-th column in the first sub-matrix, and replacing the first data of the p-th row and q-th column and the first data of the q-th row and p-th column with the results of the first linear combination and the second linear combination respectively, to obtain x groups of second data, where the first linear combination and the second linear combination are different, 0 ≤ p < x, 0 ≤ q < x, and p ≠ q;
for any second sub-matrix, performing a first linear combination and a second linear combination on the first data of the p-th row and q-th column and the first data of the q-th row and p-th column in the front x rows of the second sub-matrix, and replacing these two first data with the results of the first linear combination and the second linear combination respectively, to obtain x groups of second data, where 0 ≤ p < x, 0 ≤ q < x, and p ≠ q;
and performing a first linear combination and a second linear combination on the data of the (r+(s-x))-th row and (t-(s-x))-th column and the data of the t-th row and r-th column in the rear x rows of the second sub-matrix, and replacing the data of the (r+(s-x))-th row and (t-(s-x))-th column and the data of the t-th row and r-th column with the results of the first linear combination and the second linear combination respectively, to obtain x groups of second data, where 0 ≤ r < x, s-x ≤ t < s, and r ≠ t-(s-x); the data of the (r+(s-x))-th row and (t-(s-x))-th column and the data of the t-th row and r-th column are first data of the front x rows or first data of the rear s-x rows of the second sub-matrix.
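The division of the y-row matrix into sub-matrices described above can be sketched as follows; the split rule (all x-row first sub-matrices except, when x does not divide y, one s-row second sub-matrix absorbing the remainder rows) is an assumption for illustration.

```python
def split_rows(y, x):
    """Partition y rows into x-row first sub-matrices and, if needed,
    one s-row second sub-matrix with x < s < 2x (illustrative rule)."""
    assert x <= y
    rem = y % x
    if rem == 0:
        return [x] * (y // x)        # first sub-matrices only
    s = x + rem                       # x < s < 2x, since 0 < rem < x
    return [x] * (y // x - 1) + [s]   # one second sub-matrix at the end

assert split_rows(12, 4) == [4, 4, 4]   # y divisible by x
assert split_rows(14, 4) == [4, 4, 6]   # one second sub-matrix, s = 6
```

Whatever the exact rule, every block has either exactly x rows or s rows with x < s < 2x, which is the shape the pairwise linear combinations above rely on.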
By performing the first linear combination and the second linear combination on the first data of the p-th row and q-th column and the first data of the q-th row and p-th column in any first sub-matrix, these two first data can be correlated. Further, when the two first data are replaced by the results of the first linear combination and the second linear combination respectively, the two first data can be restored based on either one of the resulting second data.
Since the first data of the q-th row and p-th column and the first data of the p-th row and p-th column are two first data that belong to the same codeword but to different groups, the first data of the q-th row and p-th column can be used, in combination with the coding mode of the codeword, to obtain the first data of the p-th row and p-th column. The second data of the q-th row and p-th column can likewise be used to obtain the first data of the p-th row and p-th column. Thus, in the event that the storage node storing the data of the p-th row fails, the second data of the q-th row and p-th column can be used to obtain the first data of the p-th row and q-th column, the first data of the q-th row and p-th column, and the first data of the p-th row and p-th column. That is, for the storage node storing the data of the p-th row, the first data of the p-th row and p-th column and the second data of the p-th row and q-th column stored by that node can be restored based on the single second data of the q-th row and p-th column. The amount of data required to restore the data of the p-th row is therefore smaller in this method than in the related art, in which the first data of the p-th row and p-th column is restored based on a plurality of other first data.
In one possible implementation, the first linear combination is mD1 + vD2 and the second linear combination is gD1 + fD2, where m, v, g and f are non-zero elements of a finite field, the product of m and f is not equal to the product of v and g, and D1 and D2 represent the two data on which the first linear combination and the second linear combination are performed. Since m, v, g, f cover all cases in which each is a non-zero element of the finite field and the product of m and f is not equal to the product of v and g, the first linear combination and the second linear combination can take many forms. Correspondingly, the method is flexible and varied in how the linear combinations are performed.
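A minimal sketch of such a pair of linear combinations (the field GF(257) and the coefficient values m, v, g, f below are illustrative assumptions): since m·f ≠ v·g, the 2×2 coefficient matrix is invertible, so both original data can be restored from the two second data; and if D1 can be re-derived from other nodes via the codeword's relations, one stored combination already yields D2.

```python
P = 257  # illustrative prime field GF(257)

def combine(D1, D2, m, v, g, f):
    # first and second linear combinations; invertible iff m*f != v*g
    assert (m * f - v * g) % P != 0
    return (m * D1 + v * D2) % P, (g * D1 + f * D2) % P

def restore(S1, S2, m, v, g, f):
    # invert the 2x2 matrix [[m, v], [g, f]] over GF(P)
    det_inv = pow((m * f - v * g) % P, P - 2, P)
    D1 = (f * S1 - v * S2) * det_inv % P
    D2 = (m * S2 - g * S1) * det_inv % P
    return D1, D2

m, v, g, f = 1, 2, 3, 4        # m*f = 4 != 6 = v*g, so invertible
D1, D2 = 123, 45
S1, S2 = combine(D1, D2, m, v, g, f)
assert restore(S1, S2, m, v, g, f) == (D1, D2)
# With D1 known from elsewhere (e.g. recomputed from the codeword's
# parity relations), the single combination S1 alone yields D2:
assert (S1 - m * D1) * pow(v, P - 2, P) % P == D2
```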
In one possible implementation, one group of first data corresponds to one reference matrix, the reference matrix is used for downloading the group of first data corresponding to it from the corresponding storage node, and performing data transformation on the y groups of first data to obtain the y groups of second data includes: in response to the n reference matrices being different, performing position transformation on the y groups of first data, where the position transformation changes the positions of the x·y first data included in the y groups of first data; and performing data transformation on the position-transformed y groups of first data according to the linear combination transformation, where the plurality of first data on which a linear combination is performed correspond to the same reference matrix. The method can be flexibly applied whether the n reference matrices are the same or different, and has a wide application range.
In one possible implementation, the y groups of first data are a matrix of y rows and x columns, one group of first data corresponds to one row of the matrix, first data belonging to the same codeword correspond to one column of the matrix, and performing position transformation on the y groups of first data includes: dividing the matrix into a first number of first sub-matrices and a second number of second sub-matrices, where the first number and the second number are determined based on the ratio of y to x, any first sub-matrix corresponds to x groups of first data, any second sub-matrix corresponds to s groups of first data, and x < s < 2x; and for any sub-matrix, cyclically transforming the group in which the first data of each column of the sub-matrix is located, so that after the cyclic transformation the first data of the p-th row and q-th column in the front x rows of the sub-matrix corresponds to the same reference matrix as the first data of the q-th row and p-th column, where 0 ≤ p < x, 0 ≤ q < x, and p ≠ q, and the first data of the (r+(s-x))-th row and (t-(s-x))-th column in the rear x rows of the sub-matrix corresponds to the same reference matrix as the first data of the t-th row and r-th column, where 0 ≤ r < x, s-x ≤ t < s, and r ≠ t-(s-x); the sub-matrix is a first sub-matrix or a second sub-matrix. By determining the first sub-matrices and the second sub-matrices based on the ratio of y to x and performing the subsequent position transformation on them, the method can be flexibly adapted to different values of y.
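A minimal sketch of the per-column cyclic position transformation (the shift amounts and the labels are illustrative assumptions; entries are tagged 'a'/'b'/'c' for codewords 0/1/2 and digits for groups):

```python
def cyclic_shift_columns(block, shifts):
    # rotate column q downward by shifts[q] rows, leaving each column's
    # data intact; only the group (row) of each first data changes
    x = len(block)
    out = [row[:] for row in block]
    for q, sh in enumerate(shifts):
        for p in range(x):
            out[(p + sh) % x][q] = block[p][q]
    return out

# rows = groups, columns = codewords (x = 3)
block = [["a0", "b0", "c0"],
         ["a1", "b1", "c1"],
         ["a2", "b2", "c2"]]
shifted = cyclic_shift_columns(block, [0, 1, 2])  # shift column q by q rows
assert shifted == [["a0", "b2", "c1"],
                   ["a1", "b0", "c2"],
                   ["a2", "b1", "c0"]]
```

No data value is changed, only its group: after such a shift, the entries that are subsequently paired for the linear combination can be placed in rows whose reference matrices match.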
In one possible implementation, for any one of the storage nodes corresponding to the y groups of second data, in response to the second data stored by that storage node being obtained based on the first data included in any first sub-matrix, that storage node is the u-th storage node among the x storage nodes storing the x groups of second data, where 0 ≤ u < x, and the minimum repair bandwidth of the u-th storage node is as follows:
wherein the term in the formula represents the number of storage nodes, among the n-x storage nodes other than the x storage nodes, whose u-th data is not obtained based on a linear combination.
In one possible implementation, for any one of the storage nodes corresponding to the y groups of second data, in response to the second data stored by that storage node being obtained based on the first data included in any second sub-matrix, that storage node is the c-th storage node among the s storage nodes storing the s groups of second data. In the case of 0 ≤ c < s-x, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:
In the case of s-x ≤ c < x, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:
In the case of x ≤ c < s, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:
wherein, in the case of 0 ≤ c < s-x, the term in the formula represents the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose c-th data is not obtained based on a linear combination; in the case of s-x ≤ c < s, it represents the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose (c-(s-x))-th data is not obtained based on a linear combination.
In one possible implementation, the pattern of codewords includes any one of Reed-Solomon (RS) codes, partial repair codes, or piggybacked codes. The method can be suitable for different code patterns and has a wide application range.
In a second aspect, there is provided a data storage apparatus for use with a first device comprised in a distributed storage system, the distributed storage system further comprising a plurality of storage nodes, the apparatus comprising:
the acquisition module is used for acquiring n groups of first data obtained by x code words, wherein any code word in the x code words comprises n first data, the n first data are obtained by encoding k original data, one group of first data comprises one first data included in each code word in the x code words, x, n and k are positive integers, n is greater than or equal to k, and x is greater than or equal to 2 and less than or equal to n-k;
the transformation module is configured to perform data transformation on the y groups of first data to obtain y groups of second data, such that the minimum repair bandwidth of any one of the storage nodes storing the y groups of second data is smaller than a threshold, where the minimum repair bandwidth of any storage node is the minimum bandwidth required to recover the data stored by that storage node, the minimum bandwidth is obtained based on the ratio of the number of downloaded data to the total number of original data, y is an integer, and x ≤ y ≤ n;
and the storage module is used for respectively storing the y groups of second data and the n-y groups of first data on the corresponding storage nodes.
In one possible implementation, the transformation module is configured to perform data transformation on the y groups of first data according to a linear combination transformation to obtain the y groups of second data, where the linear combination is used to correlate a plurality of first data that belong to different codewords and that correspond, in different groups, to the same reference matrix, so that the resulting linear combination can be used to restore the plurality of first data; one group of first data corresponds to one reference matrix, and the reference matrix is used for downloading the group of first data corresponding to it from the corresponding storage node.
In one possible implementation, the y sets of first data are a matrix of y rows and x columns, one set of first data corresponds to one row of the matrix, and first data belonging to the same codeword corresponds to one column of the matrix;
The transformation module is used for dividing the matrix into a first number of first sub-matrices and a second number of second sub-matrices, the first number and the second number are determined based on the ratio of y to x, any one of the first sub-matrices corresponds to x groups of first data, any one of the second sub-matrices corresponds to s groups of first data, and x < s < 2x;
for any first sub-matrix, performing a first linear combination and a second linear combination on the first data of the p-th row and q-th column and the first data of the q-th row and p-th column in the first sub-matrix, and replacing the first data of the p-th row and q-th column and the first data of the q-th row and p-th column with the results of the first linear combination and the second linear combination respectively, to obtain x groups of second data, where the first linear combination and the second linear combination are different, 0 ≤ p < x, 0 ≤ q < x, and p ≠ q;
for any second sub-matrix, performing a first linear combination and a second linear combination on the first data of the p-th row and q-th column and the first data of the q-th row and p-th column in the front x rows of the second sub-matrix, and replacing these two first data with the results of the first linear combination and the second linear combination respectively, to obtain x groups of second data, where 0 ≤ p < x, 0 ≤ q < x, and p ≠ q;
and performing a first linear combination and a second linear combination on the data of the (r+(s-x))-th row and (t-(s-x))-th column and the data of the t-th row and r-th column in the rear x rows of the second sub-matrix, and replacing the data of the (r+(s-x))-th row and (t-(s-x))-th column and the data of the t-th row and r-th column with the results of the first linear combination and the second linear combination respectively, to obtain x groups of second data, where 0 ≤ r < x, s-x ≤ t < s, and r ≠ t-(s-x); the data of the (r+(s-x))-th row and (t-(s-x))-th column and the data of the t-th row and r-th column are first data of the front x rows or first data of the rear s-x rows of the second sub-matrix.
In one possible implementation, the first linear combination is mD1 + vD2 and the second linear combination is gD1 + fD2, where m, v, g and f are non-zero elements of a finite field, the product of m and f is not equal to the product of v and g, and D1 and D2 represent the two data on which the first linear combination and the second linear combination are performed.
In one possible implementation, a set of first data corresponds to a reference matrix, and the reference matrix is used for downloading the set of first data corresponding to the reference matrix from the corresponding storage node;
the transformation module is configured to perform position transformation on the y groups of first data in response to the n reference matrices being different, where the position transformation changes the positions of the x·y first data included in the y groups of first data; and to perform data transformation on the position-transformed y groups of first data according to the linear combination transformation, where the plurality of first data on which a linear combination is performed correspond to the same reference matrix.
In one possible implementation, the y sets of first data are a matrix of y rows and x columns, one set of first data corresponds to one row of the matrix, and first data belonging to the same codeword corresponds to one column of the matrix;
the transformation module is used for dividing the matrix into a first number of first sub-matrices and a second number of second sub-matrices, the first number and the second number are determined based on the ratio of y to x, any one of the first sub-matrices corresponds to x groups of first data, any one of the second sub-matrices corresponds to s groups of first data, and x < s < 2x;
and for any sub-matrix, cyclically transforming the group in which the first data of each column of the sub-matrix is located, so that after the cyclic transformation the first data of the p-th row and q-th column in the front x rows of the sub-matrix corresponds to the same reference matrix as the first data of the q-th row and p-th column, where 0 ≤ p < x, 0 ≤ q < x, and p ≠ q, and the first data of the (r+(s-x))-th row and (t-(s-x))-th column in the rear x rows of the sub-matrix corresponds to the same reference matrix as the first data of the t-th row and r-th column, where 0 ≤ r < x, s-x ≤ t < s, and r ≠ t-(s-x); the sub-matrix is a first sub-matrix or a second sub-matrix.
In one possible implementation, for any one of the storage nodes corresponding to the y groups of second data, in response to the second data stored by that storage node being obtained based on the first data included in any first sub-matrix, that storage node is the u-th storage node among the x storage nodes storing the x groups of second data, where 0 ≤ u < x, and the minimum repair bandwidth of the u-th storage node is as follows:
wherein the term in the formula represents the number of storage nodes, among the n-x storage nodes other than the x storage nodes, whose u-th data is not obtained based on a linear combination.
In one possible implementation, for any one of the storage nodes corresponding to the y groups of second data, in response to the second data stored by that storage node being obtained based on the first data included in any second sub-matrix, that storage node is the c-th storage node among the s storage nodes storing the s groups of second data. In the case of 0 ≤ c < s-x, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:
In the case of s-x ≤ c < x, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:
In the case of x ≤ c < s, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:
wherein, in the case of 0 ≤ c < s-x, the term in the formula represents the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose c-th data is not obtained based on a linear combination; in the case of s-x ≤ c < s, it represents the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose (c-(s-x))-th data is not obtained based on a linear combination.
In one possible implementation, the pattern of the codeword includes any of an RS code, a partial repair code, or a piggybacked code.
In a third aspect, there is provided a network device comprising a processor coupled to a memory, the memory having stored therein at least one program instruction or code that is loaded and executed by the processor to cause the network device to implement the data storage method of any of the first aspects.
In a fourth aspect, a distributed storage system is provided, the distributed storage system including a first device for performing the data storage method of any one of the first aspects, and a plurality of storage nodes for storing data obtained by the first device.
In a fifth aspect, there is provided a computer readable storage medium having stored therein at least one program instruction or code which when loaded and executed by a processor causes a computer to implement the data storage method of any of the first aspects.
In a sixth aspect, there is provided a communication apparatus comprising: a transceiver, a memory, and a processor. The transceiver, the memory and the processor communicate with each other through an internal connection path, the memory is used for storing instructions, the processor is used for executing the instructions stored by the memory to control the transceiver to receive signals and control the transceiver to transmit signals, and when the processor executes the instructions stored by the memory, the processor is caused to execute the data storage method of any one of the first aspect.
Illustratively, the processor is one or more and the memory is one or more.
The memory may be integrated with the processor or separate from the processor, for example.
In a specific implementation process, the memory may be a non-transient (non-transitory) memory, for example, a Read Only Memory (ROM), which may be integrated on the same chip as the processor, or may be separately disposed on different chips, where the type of the memory and the manner of disposing the memory and the processor are not limited in this application.
In a seventh aspect, there is provided a computer program product comprising: computer program code which, when run by a computer, causes the computer to perform the data storage method of any of the first aspects.
In an eighth aspect, there is provided a chip comprising a processor for calling instructions from a memory and executing the instructions stored in the memory, to cause a network device on which the chip is mounted to perform the data storage method of any of the first aspects.
Illustratively, the chip further comprises: the input interface, the output interface, the processor and the memory are connected through an internal connecting passage.
It should be appreciated that, for the technical effects of the second aspect to the eighth aspect and their corresponding possible implementations, reference may be made to the technical effects of the first aspect and its corresponding possible implementations, which are not repeated here.
Drawings
FIG. 1 is a schematic diagram of an implementation environment of a data storage method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an implementation environment of another data storage method according to an embodiment of the present application;
FIG. 3 is a flow chart of a method for storing data according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a data storage device according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a network device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of another network device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of another network device according to an embodiment of the present application.
Detailed Description
The terminology used in the description of the embodiments of the present application is for the purpose of describing the examples of the present application only and is not intended to be limiting of the present application. Embodiments of the present application are described below with reference to the accompanying drawings.
In a distributed storage system, data is stored on a plurality of storage nodes. Because a single storage node is prone to failure, which would lose the data it stores, redundant data needs to be stored in the system so that, when one storage node fails, the lost data can be recovered by downloading data from the non-failed storage nodes, thereby improving the reliability of data storage. For example, a pieces of original data to be stored are encoded according to an MDS code pattern to obtain b pieces of check data related to the a pieces of original data; the a pieces of original data and the b pieces of check data are stored on M (M = a + b) storage nodes, one piece of data per storage node. When one storage node fails, the lost data can be recovered by downloading a or more pieces of data from the remaining non-failed storage nodes.
In this case, the ratio of the number of downloaded data to the total number of original data is a/a = 1; that is, the repair bandwidth of the storage node is 1. Since a larger repair bandwidth means a larger communication overhead, a data storage method is needed that makes the repair bandwidth of storage nodes smaller, so that when a storage node fails, the lost data can be recovered with less communication overhead.
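As a concrete check of the ratio above, the following minimal sketch computes the repair bandwidth for the MDS example (the value a = 4 is an illustrative assumption, not from the source):

```python
def repair_bandwidth(num_downloaded: int, total_original: int) -> float:
    # Repair bandwidth = number of downloaded data / total number of original data.
    return num_downloaded / total_original

# In the MDS example above, recovering one lost datum requires downloading
# a data out of a original data, so the repair bandwidth is a/a = 1.
a = 4  # hypothetical number of original data
bw = repair_bandwidth(num_downloaded=a, total_original=a)
```

A smaller value of this ratio directly translates into less communication overhead during repair.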
In the related art, the repair bandwidth of storage nodes is reduced by storing a plurality of data at each storage node. For example, a×b original data are encoded according to an MDS code pattern to obtain b codewords, each codeword including a original data and b check data, a and b being positive integers. For any codeword, each original datum is stored on one system node and each check datum on one check node, where the system nodes and check nodes are both storage nodes. Each system node thus stores b original data, and each check node stores b check data. In the related art, the b×b check data stored on the b check nodes are subjected to data transformation to reduce the repair bandwidth of the b check nodes. This scheme can only reduce the repair bandwidth of the b check nodes in one data transformation, so its application range is limited and the number of storage nodes whose repair bandwidth is reduced is fixed. If the repair-bandwidth-reduction scheme is reused on the system nodes, the iteration needs to be performed on the basis of (a+b) data, the amount of data undergoing data transformation grows exponentially, and the computational resources required to perform the data transformation and the storage resources subsequently required to store the data are high.
The embodiment of the application provides a data storage method that can reduce the repair bandwidth of any storage node in one data transformation, has a wide application range, and reduces the computational resources required to perform the data transformation and the storage resources subsequently required. For ease of understanding, terms involved in the embodiments of the present application are explained first:
[n, k] MDS code: k original data are encoded to obtain a codeword comprising n data, the n data including the k original data and n-k check data, where n and k are integers and n > k. Any one of the n data can be recovered from any k of the remaining n-1 data.
Sub-packetization number: any one of the n data may include L sub-data and can be represented by a column vector of length L; L is referred to as the sub-packetization number and is a positive integer.
The data storage method provided by the embodiment of the application can be applied to the implementation environment shown in fig. 1. The implementation environment may be a distributed storage system, such as a distributed storage system in a large data center. As shown in fig. 1, the implementation environment includes a first device 101, a second device 102, and a plurality of storage nodes 103, where the first device 101 and the second device 102 are communicatively connected, and the first device 101 is communicatively connected to each of the plurality of storage nodes 103. The manner of communication connection includes, but is not limited to, connection through a wired network or a wireless network.
Illustratively, the second device 102 is configured to send the original data to be stored and the base code pattern to the first device 101. The first device 101 is configured to encode the original data to be stored according to the base code pattern, perform data transformation on the plurality of data obtained by encoding, and write the transformed data into the plurality of storage nodes 103. Each storage node 103 is used to store the data written by the first device 101. Illustratively, the second device 102 is also configured to send an instruction to download the original data to the first device 101. The first device 101 is further configured to download data from the plurality of storage nodes 103 based on the received instruction so as to restore the original data, and to send the restored original data to the second device 102. In one possible implementation, when one or several storage nodes 103 fail, the first device 101 is further configured to download data from the non-failed storage nodes 103 so as to recover the data on the failed storage nodes 103, and to write the recovered data to new storage nodes 103.
The first device 101 may be a controller, the second device 102 may be a host, and the plurality of storage nodes 103 may be a plurality of memories, which may be the same or different, and the embodiments of the present application are not limited thereto.
Illustratively, the first device 101, the second device 102, and the plurality of storage nodes 103 may each include at least one functional module, with the functions of each device being implemented by the functional modules. For example, fig. 2 illustrates an implementation environment of another data storage method provided in an embodiment of the present application. The first device 101 is a controller, the second device 102 is a host, and the plurality of storage nodes 103 includes storage nodes 0 to n-1, where n is a positive integer. The controller comprises an encoder, a reconstructor and a healer, and the functions of the controller are respectively realized by the encoder, the reconstructor and the healer.
Next, each functional module will be described in connection with a process of storing original data to be stored to a plurality of storage nodes, a process of downloading data from the plurality of storage nodes to acquire the original data, and a process of restoring data on any one of the storage nodes.
Illustratively, the host sends the k original data to be stored and the base code pattern to the encoder, a process corresponding to the write data between the host and the encoder shown in fig. 2. The encoder is used to encode the k original data according to the base code pattern to obtain n data, perform data transformation on the n data obtained by encoding, and write the transformed n data into storage nodes 0 to n-1, one datum per storage node. The n data are referred to as a stripe (stripe); this process corresponds to the write stripe shown in fig. 2.
Illustratively, the host sends an instruction to download the original data to the reconstructor, and receives the original data that the reconstructor sends to the host in response to the instruction; this corresponds to the read data shown in fig. 2. The reconstructor is configured to download data from storage nodes 0 to n-1 based on the received instruction and to restore the original data based on the downloaded data, a process corresponding to the read stripe shown in fig. 2. The reconstructor then sends the restored original data to the host.
Illustratively, when one or more of storage nodes 0 to n-1 fail, the healer is configured to download data from the non-failed storage nodes so as to recover the data stored by the failed storage node, and to write the recovered data into a new storage node, which may be a newly inserted non-failed storage node. This process corresponds to the download data shown in fig. 2 and the write data between the healer and the storage node. In one possible implementation, the reconstructor and the healer may communicate. For example, the reconstructor receives an instruction to download the original data and downloads data from storage nodes 0 to n-1 based on the instruction. If storage node 0 fails, the reconstructor may send a repair instruction to the healer, instructing the healer to recover the data stored by storage node 0. The healer recovers the data stored by storage node 0 based on the received repair instruction and, after recovering it, sends a completion instruction to the reconstructor indicating that the data stored by storage node 0 has been recovered; the reconstructor can then download the recovered data based on the completion instruction so as to restore the original data.
It should be understood that each of the foregoing embodiments may include a plurality of devices, and the number of devices and the types of devices shown in fig. 1 and 2 are only the number of devices and the types of devices illustrated in the embodiments of the present application, which are not limited thereto.
The data storage method provided in the embodiment of the present application may be shown in fig. 3, and next, the data storage method provided in the embodiment of the present application will be described with reference to the implementation scenarios shown in fig. 1 and fig. 2. As shown in fig. 3, the method includes, but is not limited to, S301 to S303.
S301, a first device acquires n groups of first data obtained from x codewords, where any codeword of the x codewords includes n first data obtained by encoding k original data, one group of first data includes one first datum from each of the x codewords, x, n and k are positive integers, n is greater than or equal to k, and x is greater than or equal to 2 and less than or equal to n-k.
In one possible implementation, the x codewords are obtained by the first device encoding x×k original data according to a base code pattern. For example, the first device receives a source file to be stored and the base code pattern sent by the second device, the source file comprising x×k original data. The first device may encode the x×k data according to the base code pattern to obtain the x codewords. Illustratively, the base code pattern is determined by the second device based on the size of the source file and the storage capability of the distributed storage system. The base code pattern may be an MDS code, such as an RS code. The base code pattern may also be another pattern, such as a partial repair code (locally repairable code) or a piggyback (piggyback) code.
Illustratively, the base code pattern is an [n, k] MDS pattern. The first device encoding the x×k original data according to the base code pattern to obtain x codewords includes: the first device encodes each group of k original data according to the [n, k] MDS pattern to obtain a codeword, the codeword comprising n first data, the n first data comprising the k original data and n-k check data. It should be noted that any one of the n first data, whether original data or check data, can be obtained based on any k of the remaining n-1 first data. By encoding every k of the x×k original data, the first device can obtain the x codewords to be stored. Illustratively, each first datum, whether original data or check data, may include L sub-data, L being a positive integer.
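As an illustration of the encoding step, the following sketch encodes k original data into a systematic codeword of n first data over GF(2^8). The field, the multiplication routine, and the evaluation points x_j = j + 1 are assumptions for illustration only; a production RS code would use a generator construction that guarantees the MDS property (e.g., a Cauchy or carefully chosen Vandermonde matrix).

```python
def gf_mul(a: int, b: int) -> int:
    # Carry-less multiplication in GF(2^8), reduced modulo the
    # primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d).
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return p

def gf_pow(a: int, e: int) -> int:
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def encode(original, n):
    # Systematic sketch: the k original data are followed by n-k check data,
    # check datum j being f_j(d) = sum_w (x_j)^w * d_w with x_j = j + 1.
    k = len(original)
    checks = []
    for j in range(n - k):
        x = j + 1  # hypothetical evaluation points 1, 2, ..., n-k
        acc = 0
        for w, d in enumerate(original):
            acc ^= gf_mul(gf_pow(x, w), d)  # addition in GF(2^8) is XOR
        checks.append(acc)
    return list(original) + checks

# k = 4 original data, n = 6: codeword = 4 original data + 2 check data
codeword = encode([0x12, 0x34, 0x56, 0x78], n=6)
```

Since x_0 = 1, the first check datum in this sketch is simply the XOR of the k original data.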
In one possible implementation, for an ith codeword of the x codewords, the ith codeword is expressed as:
[d_{i,0}, d_{i,1}, ..., d_{i,k-1}, f_0(d_i), f_1(d_i), ..., f_{n-k-1}(d_i)]   (0 ≤ i < x)
where d_{i,0}, d_{i,1}, ..., d_{i,k-1} denote the k original data of the ith codeword, collectively written d_i, and f_j(d_i) denotes the jth check datum obtained by encoding the k original data, with 0 ≤ j < n-k. For ease of representation, the n-k check data are written as:
[d_{i,k}, d_{i,k+1}, ..., d_{i,n-1}] = [f_0(d_i), f_1(d_i), ..., f_{n-k-1}(d_i)]   (0 ≤ i < x)
illustratively, for the W-th first data included in the ith codeword, the W-th first data is added to the W-th group, 0.ltoreq.W < n. And adding the W-th first data of each codeword into the W-th group to obtain n groups of first data. Illustratively, the case of each set of first data is shown in table 1.
TABLE 1
S302, the first device performs data transformation on y groups of first data to obtain y groups of second data, such that the minimum repair bandwidth of any storage node storing the y groups of second data is smaller than a threshold, where the minimum repair bandwidth of a storage node refers to the minimum bandwidth required to restore the data stored by that storage node, obtained based on the ratio of the number of downloaded data to the total number of original data; y is an integer, and x ≤ y ≤ n.
The y-set first data may be any y-set first data of the n-set first data, which is not limited in the embodiment of the present application. In one possible implementation, the first device performs data transformation on the y-group first data to obtain y-group second data, including but not limited to the following two ways.
In a first mode, data transformation is performed on the y-group first data according to a linear combination transformation mode, so as to obtain y-group second data.
The linear combination is used to associate a plurality of first data that belong to different codewords in different groups and correspond to the same reference matrix, so that the resulting linear combination can be used to restore those first data. One group of first data corresponds to one reference matrix, and the reference matrix is used to download the corresponding group of first data from the corresponding storage node. The n reference matrices corresponding to the n groups of first data may be determined based on the code pattern of the codewords. For example, if the n reference matrices corresponding to the code pattern of the codewords are identical, data transformation may be performed directly on the y groups of first data by linear combination to obtain the y groups of second data.
In one possible implementation manner, the y sets of first data are a matrix of y rows and x columns, one set of first data corresponds to one row of the matrix, and first data belonging to the same codeword corresponds to one column of the matrix, and then the y sets of first data are subjected to data transformation according to a linear combination transformation manner to obtain y sets of second data, including but not limited to S3021 to S3023.
S3021, the matrix is divided into a first number of first sub-matrices and a second number of second sub-matrices, the first number and the second number being determined based on the ratio of y to x; any first sub-matrix corresponds to x groups of first data, any second sub-matrix corresponds to s groups of first data, and x < s < 2x.
In one possible implementation, the ratio of y to x is an integer, i.e., y is an integer multiple of x. The first device can then divide the matrix equally into a first number of first sub-matrices, i.e., the second number is 0. In this case, the first device obtains x groups of second data by linearly combining the first data in each first sub-matrix, thereby obtaining y groups of second data. In another possible implementation, the ratio of y to x is not an integer, i.e., y is not an integer multiple of x. The first device can then divide the matrix into a first number of first sub-matrices and a second number of second sub-matrices, the second number being nonzero. In this case, the first device obtains x groups of second data by linearly combining the first data in each first sub-matrix, and s groups of second data by linearly combining the first data in each second sub-matrix. Of course, in case x < y < 2x, the first device may treat the whole matrix as one second sub-matrix, i.e., the first number is 0, and obtain y groups of second data by linearly combining the first data included in that second sub-matrix.
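The division of the y rows described in S3021 can be sketched as follows. The rule that a nonzero remainder r yields exactly one second sub-matrix of s = x + r rows (so that x < s < 2x) is an assumption consistent with the ranges stated above:

```python
def partition_rows(y: int, x: int):
    # Split row indices 0..y-1 into first sub-matrices of exactly x rows
    # and, when x does not divide y, one second sub-matrix of s = x + r rows.
    assert 2 <= x <= y
    q, r = divmod(y, x)
    sizes = [x] * (q if r == 0 else q - 1)
    if r:
        sizes.append(x + r)  # x < x + r < 2x, since 0 < r < x
    groups, start = [], 0
    for size in sizes:
        groups.append(list(range(start, start + size)))
        start += size
    return groups

# y = 5, x = 2: one first sub-matrix (rows 0-1) and one second (rows 2-4),
# matching the 5-row example discussed later for Table 6.
parts = partition_rows(5, 2)
```

When y is a multiple of x the second list stays empty, matching the "second number is 0" case above.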
Illustratively, S3022 is performed where the first number is greater than 0 and S3023 is performed where the second number is greater than 0.
S3022, for any first sub-matrix, a first linear combination and a second linear combination are performed on the first datum in row p, column q of the sub-matrix and the first datum in row q, column p, and the results of the first and second linear combinations respectively replace the first datum in row p, column q and the first datum in row q, column p, obtaining x groups of second data; the first linear combination and the second linear combination are different, 0 ≤ p < x, 0 ≤ q < x, and p ≠ q.
Illustratively, the first linear combination is mD_1 + vD_2 and the second linear combination is gD_1 + fD_2, where m, v, g, f are nonzero elements of a finite field, the product of m and f is not equal to the product of v and g, and D_1 and D_2 denote the two data on which the first and second linear combinations are performed. For example, D_1 denotes the first datum in row p, column q of the first sub-matrix, and D_2 denotes the first datum in row q, column p. The result of the first linear combination may replace D_1, and the result of the second linear combination may replace D_2. The finite field may be a Galois field (GF), e.g., GF(2^8); then m, v, g, f are nonzero elements of GF(2^8) whose products satisfy mf ≠ vg. For different sub-matrices, the m, v, g, f used in the first and second linear combinations may be the same or different, as long as they are nonzero elements of GF(2^8) with mf ≠ vg. The process of performing the first and second linear combinations once is referred to as one data transformation; in multiple data transformations performed on the same sub-matrix, the m, v, g, f used each time may likewise be the same or different, as long as each set consists of nonzero elements of GF(2^8) with mf ≠ vg.
By performing the first and second linear combinations on the first datum in row p, column q of any first sub-matrix and the first datum in row q, column p, the two first data can be associated. Further, when the results of the first and second linear combinations respectively replace them as two second data, the two first data can be restored based on either one of the two second data.
Since the first datum in row q, column p and the first datum in row p, column p are two first data of the same codeword in different groups, the first datum in row q, column p can be used, in combination with the coding of the codeword, to obtain the first datum in row p, column p. That is, the second datum in row q, column p can likewise be used to obtain the first datum in row p, column p. Thus, when the storage node storing the data of row p fails, the second datum in row q, column p can be used to obtain the first datum in row p, column q and the first datum in row q, column p, as well as the first datum in row p, column p. In other words, for the storage node storing the data of row p, the first datum in row p, column p and the second datum in row p, column q stored by that node can be restored based on the single second datum in row q, column p. Compared with the related art, in which the first datum in row p, column p is restored based on the first datum in row q, column p alone, the method provided by the embodiment of the application requires less data to restore the data of row p.
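The pairwise transform and its inversion can be sketched in GF(2^8) as follows. The condition mf ≠ vg makes the 2×2 coefficient matrix invertible, which is what allows both first data to be recovered; the concrete coefficients and the inversion-by-exponentiation are illustrative choices, not from the source:

```python
def gf_mul(a, b):
    # Multiplication in GF(2^8) modulo the primitive polynomial 0x11d.
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return p

def gf_inv(a):
    # a^254 = a^(-1), since the multiplicative group of GF(2^8) has order 255.
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def pair_transform(d1, d2, m, v, g, f):
    # Replace (D1, D2) with (m*D1 + v*D2, g*D1 + f*D2); addition is XOR.
    assert gf_mul(m, f) != gf_mul(v, g)  # invertibility condition mf != vg
    return gf_mul(m, d1) ^ gf_mul(v, d2), gf_mul(g, d1) ^ gf_mul(f, d2)

def pair_recover(e1, e2, m, v, g, f):
    # Invert via the adjugate; in characteristic 2, det = m*f + v*g (XOR).
    det_inv = gf_inv(gf_mul(m, f) ^ gf_mul(v, g))
    d1 = gf_mul(det_inv, gf_mul(f, e1) ^ gf_mul(v, e2))
    d2 = gf_mul(det_inv, gf_mul(g, e1) ^ gf_mul(m, e2))
    return d1, d2

e1, e2 = pair_transform(0x5A, 0xC3, m=3, v=1, g=1, f=2)
```

Recovering both data from the pair of second data is the round trip this sketch demonstrates; the patent's stronger claim, that one second datum suffices when combined with the codeword's redundancy, relies on the MDS property described above.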
For any second sub-matrix, since it corresponds to s groups of first data with x < s < 2x, an operation similar to that performed on a first sub-matrix can first be applied to the first x groups of its first data to obtain x groups of second data; an operation similar to that performed on a first sub-matrix can then be applied to the last x-(s-x) groups of the obtained second data together with the remaining s-x groups of first data, thereby obtaining s groups of second data. The process of data-transforming any second sub-matrix is described as S3023.
S3023, for any second sub-matrix: the first and second linear combinations are performed on the first datum in row p, column q of the first x rows of the sub-matrix and the first datum in row q, column p, and the results respectively replace those two data, obtaining x groups of second data, where 0 ≤ p < x, 0 ≤ q < x, and p ≠ q; then the first and second linear combinations are performed on the datum in row r+(s-x), column t-(s-x) and the datum in row t, column r of the last x rows of the sub-matrix, and the results respectively replace those two data, obtaining the s groups of second data, where 0 ≤ r < x, s-x ≤ t < s, and r ≠ t-(s-x); the datum in row r+(s-x), column t-(s-x) and the datum in row t, column r are second data of the first x rows or first data of the last s-x rows of the sub-matrix. For the first and second linear combinations, reference may be made to the description in S3022, which is not repeated here.
Illustratively, a second sub-matrix includes 3 rows and 2 columns, and the first data it includes are shown in Table 2.
TABLE 2
Group 0:  d_{0,0}   d_{1,0}
Group 1:  d_{0,1}   d_{1,1}
Group 2:  d_{0,2}   d_{1,2}
With the first linear combination mD_1 + vD_2 and the second linear combination gD_1 + fD_2, data transformation is performed on the first data of the first 2 rows of the second sub-matrix; the result of the data transformation is shown in Table 3.
TABLE 3
Group 0:  d_{0,0}                   g·d_{0,1} + f·d_{1,0}
Group 1:  m·d_{0,1} + v·d_{1,0}     d_{1,1}
Group 2:  d_{0,2}                   d_{1,2}
Data transformation is then performed on the second data of row 1 and the first data of row 2 of the second sub-matrix; the result of the data transformation is shown in Table 4.
TABLE 4
Group 0:  d_{0,0}                   g·d_{0,1} + f·d_{1,0}
Group 1:  m·d_{0,1} + v·d_{1,0}     g·d_{0,2} + f·d_{1,1}
Group 2:  m·d_{0,2} + v·d_{1,1}     d_{1,2}
In Table 4, the m, v, g, f used in the data transformation of the second data of row 1 and the first data of row 2 of the second sub-matrix are the same as the m, v, g, f used in the data transformation of the first data of its first 2 rows.
Alternatively, the m, v, g, f used in the data transformation of the second data of row 1 and the first data of row 2 of the second sub-matrix may differ from the m, v, g, f used in the data transformation of the first data of its first 2 rows. For example, when the first data of the first 2 rows are transformed, the coefficients used are denoted m_0, v_0, g_0, f_0, and when the second data of row 1 and the first data of row 2 are transformed, the coefficients used are denoted m_1, v_1, g_1, f_1. Here m_0, v_0, g_0, f_0 and m_1, v_1, g_1, f_1 are two different sets of nonzero elements of GF(2^8), each satisfying the condition that the product of m and f is not equal to the product of v and g. In this case, the result of the data transformation may be as shown in Table 5.
TABLE 5
Group 0:  d_{0,0}                       g_0·d_{0,1} + f_0·d_{1,0}
Group 1:  m_0·d_{0,1} + v_0·d_{1,0}     g_1·d_{0,2} + f_1·d_{1,1}
Group 2:  m_1·d_{0,2} + v_1·d_{1,1}     d_{1,2}
By linearly combining the first data in the first x rows and then linearly combining the resulting second data with the first data in the last s-x rows, any group of second data obtained from a second sub-matrix can be made to include data obtained by linear combination.
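The two-step transform of a 3-row, 2-column second sub-matrix illustrated in Tables 2 to 4 can be sketched as follows (x = 2, s = 3). The field choice, the coefficient values, and the orientation of each pair are illustrative assumptions:

```python
def gf_mul(a, b):
    # Multiplication in GF(2^8) modulo the primitive polynomial 0x11d.
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return p

def pair(d1, d2, m, v, g, f):
    # First and second linear combinations of the pair (D1, D2); "+" is XOR.
    return gf_mul(m, d1) ^ gf_mul(v, d2), gf_mul(g, d1) ^ gf_mul(f, d2)

def transform_second_submatrix(M, x, m, v, g, f):
    s = len(M)  # s rows, x < s < 2x
    # Step 1: within the first x rows, combine (row q, col p) with (row p, col q).
    for p in range(x):
        for q in range(p + 1, x):
            M[q][p], M[p][q] = pair(M[q][p], M[p][q], m, v, g, f)
    # Step 2: combine (row r+(s-x), col t-(s-x)) with (row t, col r),
    # visiting each unordered pair once (r > t-(s-x)).
    for r in range(x):
        for t in range(s - x, s):
            if r > t - (s - x):
                a, b = r + (s - x), t - (s - x)
                M[a][b], M[t][r] = pair(M[a][b], M[t][r], m, v, g, f)
    return M

# Column c holds codeword c's data; row w is group w (cf. Table 2).
M = transform_second_submatrix([[10, 20], [11, 21], [12, 22]],
                               x=2, m=1, v=1, g=1, f=2)
```

With these inputs, step 1 replaces the (row 1, col 0)/(row 0, col 1) pair and step 2 the (row 2, col 0)/(row 1, col 1) pair, mirroring Tables 3 and 4.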
In a second mode, in response to the n reference matrices being different, position transformation is performed on the y groups of first data, the position transformation being used to change the positions of the x·y first data included in the y groups of first data; data transformation is then performed on the position-transformed y groups of first data by linear combination, with the plurality of first data on which each linear combination is performed corresponding to the same reference matrix.
Illustratively, in a case where the n reference matrices corresponding to the code pattern of the codewords differ, the y groups of first data are position-transformed so that the reference matrices corresponding to the plurality of first data on which a linear combination is performed are the same. Thus, after the y groups of second data and the n-y groups of first data are stored on the corresponding storage nodes, a second datum obtained from a plurality of first data can be downloaded based on the reference matrix corresponding to those first data.
In one possible implementation, the y sets of first data are a matrix of y rows and x columns, one set of first data corresponding to one row of the matrix, and first data belonging to the same codeword corresponding to one column of the matrix. The y sets of first data are position transformed, including but not limited to S3024 and S3025.
S3024, the matrix is divided into a first number of first sub-matrices and a second number of second sub-matrices, the first number and the second number being determined based on the ratio of y to x; any first sub-matrix corresponds to x groups of first data, any second sub-matrix corresponds to s groups of first data, and x < s < 2x.
The principle of dividing the matrix into a first number of first sub-matrices and a second number of second sub-matrices in S3024 is the same as in S3021 and is not repeated here.
S3025, for any sub-matrix, the groups (rows) in which the first data of each column of the sub-matrix are located are cyclically transformed, so that after the cyclic transformation the first datum in row p, column q of the first x rows of the sub-matrix and the first datum in row q, column p correspond to the same reference matrix, where 0 ≤ p < x, 0 ≤ q < x, and p ≠ q, and the first datum in row r+(s-x), column t-(s-x) of the sub-matrix and the first datum in row t, column r correspond to the same reference matrix, where 0 ≤ r < x, s-x ≤ t < s, and r ≠ t-(s-x); the sub-matrix is a first sub-matrix or a second sub-matrix.
Illustratively, the matrix includes 5 rows and 2 columns, the first 2 rows being a first sub-matrix and the last 3 rows being a second sub-matrix; the first data included in the matrix are shown in Table 6.
TABLE 6
The first data after the position transformation is performed on the first sub-matrix and the second sub-matrix, respectively, may be as shown in Table 7.
TABLE 7
Group 0: d0,0  d1,1
Group 1: d0,1  d1,0
Group 2: d0,2  d1,3
Group 3: d0,3  d1,4
Group 4: d0,4  d1,2
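The cyclic transformation producing Table 7 can be sketched as follows. The specific rule used here, shifting column j of each sub-matrix upward by j positions, is an assumption inferred from the tables; the embodiment permits other transformations as long as the reference matrices match.

```python
# Sketch of the position transformation of S3024/S3025 (assumption:
# Table 7 is reproduced by cyclically shifting column c of each
# sub-matrix upward by c positions; the text does not fix one rule).
def position_transform(matrix, x):
    """matrix: y rows, x columns; here rows 0..x-1 form the first
    sub-matrix and the remaining rows the second sub-matrix."""
    subs = [matrix[:x], matrix[x:]]
    out = []
    for sub in subs:
        rows = len(sub)
        # entry (r, c) takes the value from original row (r + c) mod rows
        shifted = [[sub[(r + c) % rows][c] for c in range(x)]
                   for r in range(rows)]
        out.extend(shifted)
    return out

# Table 6: 5 groups of first data from 2 codewords
table6 = [[f"d{i},{j}" for i in range(2)] for j in range(5)]
table7 = position_transform(table6, 2)
for g, row in enumerate(table7):
    print(f"Group {g}: {row}")
```

After this shift, the entries at (p, q) and (q, p) of a first sub-matrix both come from original row (p + q) mod x, so they share the same reference matrix, which is exactly the condition S3025 requires.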
Table 7 above is merely one example of the first data after position transformation and is not intended to limit the manner of position transformation in the embodiments of the present application. Any transformation that makes the reference matrices corresponding to the first data on which linear combination is performed identical can be applied in the position transformation process. Still taking the matrix shown in Table 6 as an example, the first data after position transformation is performed on the first sub-matrix and the second sub-matrix, respectively, may also be as shown in Table 8.
TABLE 8
Group 0: d0,1  d1,0
Group 1: d0,0  d1,1
Group 2: d0,3  d1,4
Group 3: d0,4  d1,2
Group 4: d0,2  d1,3
By performing position transformation on the first number of first sub-matrices and the second number of second sub-matrices respectively, the reference matrices corresponding to the first data on which linear combination is performed can be made identical. The y groups of first data after position transformation can then be data-transformed in the linear-combination transformation manner; this process follows the same principle as the related content in scheme one and is not repeated here. Illustratively, when the n reference matrices are identical, the first data on which linear combination is performed already correspond to identical reference matrices, so the position transformation can be omitted. The method provided by the embodiments of the present application is thus flexibly applicable whether the n reference matrices are the same or different, giving the method a wide application range.
S303, the first device stores the y groups of second data and the n-y groups of first data on corresponding storage nodes respectively.
Illustratively, the y groups of second data and the n-y groups of first data have an order, and the first device stores the R-th group of data among them on the R-th storage node, 0 ≤ R < n. Therefore, when one storage node fails, the lost data can be recovered by downloading the data stored on the other, non-failed storage nodes.
Since there may be multiple data download schemes each capable of recovering the lost data, the data to be downloaded and the amount of data to be downloaded may differ between schemes. Illustratively, the ratio of the minimum number of downloaded data to the total number of original data is taken as the minimum repair bandwidth.
In one possible implementation, the threshold is (k·x)/(k·x), that is, the threshold is 1. Illustratively, for any one of the storage nodes corresponding to the y groups of second data, the minimum repair bandwidth of that storage node includes, but is not limited to, the following two cases.
In the first case, the second data stored on the storage node is obtained based on the first data included in one of the first sub-matrices, and the storage node is the u-th of the x storage nodes storing the x groups of second data obtained from that first sub-matrix, where 0 ≤ u < x.
The minimum repair bandwidth of the u-th storage node is shown in equation (1):

(k + x - 1 + max(0, k - w_u)) / (k·x)    (1)

where w_u refers to the number of storage nodes, among the n-x storage nodes other than the x storage nodes, whose u-th data is not obtained by linear combination, and ⌊y/x⌋ refers to rounding down the ratio of y to x. For example, when y is equal to 9 and x is equal to 2, ⌊y/x⌋ is equal to 4.
Illustratively, the second data stored on the u-th storage node is the u-th group of the x groups of second data obtained based on that first sub-matrix. When the u-th storage node fails, the u-th group of second data can be restored by downloading the three parts of data described in (11)-(13) below.
(11) Download the u-th data stored on k of the n storage nodes other than the x storage nodes storing the x groups of second data, where the n storage nodes are the storage nodes storing the y groups of second data and the n-y groups of first data. The number of data downloaded in this part is k. Illustratively, when the k storage nodes are determined, storage nodes whose u-th data has not undergone linear combination are preferentially selected.
(12) If one of the k data in (11) is obtained based on linear combination, download the other data obtained in the same linear combination process.
Illustratively, w_u represents the number of storage nodes, among the n-x storage nodes other than the x storage nodes, whose u-th data is not obtained by linear combination. Thus, in the case of k > w_u, the k data downloaded in (11) include data obtained based on linear combination, so the number of data downloaded in this part is equal to k - w_u. In the case of k ≤ w_u, the k data downloaded in (11) need not include data obtained by linear combination, so the number of data downloaded in this part can be 0.
(13) Download the u-th data in the remaining x-1 groups of second data, other than the u-th group, among the x groups of second data. The number of data downloaded in this part is x-1.
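The counting in (11)-(13) can be summarized in a small sketch. The closed form below (k + x - 1 + max(0, k - w) downloads in total, where w is the number of candidate nodes whose u-th data is not a linear combination) is assembled from the three steps rather than quoted from equation (1), and the function name is illustrative.

```python
# Download count for repairing the u-th node in the first case,
# following steps (11)-(13): k data from other nodes, at most
# max(0, k - w) partner data for linear combinations, and x - 1 data
# from the sibling nodes of the same first sub-matrix.
def min_repair_bandwidth_case1(k, x, w):
    downloaded = k + max(0, k - w) + (x - 1)
    return downloaded / (k * x)     # ratio to the k*x original data

# Example from the text: k=6, n=9, x=2; repairing storage node 0
# downloads 10 data (w = 3 non-combined candidates among nodes 2..8),
# giving a minimum repair bandwidth of 5/6.
print(min_repair_bandwidth_case1(6, 2, 3))
```

With k = 6, x = 2 and w = 3 this yields 10/12 = 5/6, matching the storage-node-0 repair example worked out later for Table 12.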
In the second case, the second data stored on the storage node is obtained based on the first data included in one of the second sub-matrices, and the storage node is the c-th of the s storage nodes storing the s groups of second data obtained from that second sub-matrix.
Depending on the value of c, the minimum repair bandwidth of the c-th storage node is less than or equal to the values shown in the following equations (2) to (4).

In the case of 0 ≤ c < s-x, the minimum repair bandwidth of the c-th storage node is less than or equal to the value shown in equation (2):

(k + 3x - s - 1 + max(0, k - w_c)) / (k·x)    (2)

In the case of s-x ≤ c < x, the minimum repair bandwidth of the c-th storage node is less than or equal to the value shown in equation (3):

(k + 3x - s + max(0, k - 1 - w_c)) / (k·x)    (3)

In the case of x ≤ c < s, the minimum repair bandwidth of the c-th storage node is less than or equal to the value shown in equation (4):

(k + 3x - s - 1 + max(0, k - 1 - w_c)) / (k·x)    (4)

where w_c is the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose relevant data (the c-th data when 0 ≤ c < s-x, or the c-(s-x)-th data when s-x ≤ c < s) is not obtained by linear combination.
Illustratively, the second data stored on the c-th storage node is the c-th group of the s groups of second data obtained based on that second sub-matrix, 0 ≤ c < s. When the c-th storage node fails, the c-th group of second data can be restored by downloading the three parts of data described in (21)-(23) below.
(21) If 0 ≤ c < s-x, download the c-th data stored on k of the n storage nodes other than the s storage nodes storing the s groups of second data, where the n storage nodes are the storage nodes storing the y groups of second data and the n-y groups of first data. Illustratively, when the k storage nodes are determined, storage nodes whose c-th data has not undergone linear combination are preferentially selected.
If s-x ≤ c < s, download the c-(s-x)-th data stored on k-1 of the n storage nodes other than the s storage nodes, and the c-(s-x)-th data stored on the 0th of the s storage nodes. Illustratively, when the k-1 storage nodes are determined, storage nodes whose c-(s-x)-th data has not undergone linear combination are preferentially selected.
The number of data downloaded in this part is k.
(22) If one of the k data in (21) is obtained based on linear combination, download the other data obtained in the same linear combination process.
Depending on the value of c, the number of data downloaded in this part includes, but is not limited to, the following cases A1 to A3.
In case A1, 0 ≤ c < s-x.
Illustratively, for case A1, w_c represents the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose c-th data is not obtained by linear combination. Thus, in the case of k > w_c, the number of data downloaded in this part is equal to k - w_c; in the case of k ≤ w_c, the number of data downloaded in this part is 0.
In case A2, s-x ≤ c < x.
Illustratively, in the case of s-x ≤ c < s, w_c represents the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose c-(s-x)-th data is not obtained by linear combination.
For case A2, if k-1 > w_c, the number of data downloaded in this part is equal to k-1-w_c; if k-1 ≤ w_c, the number of data downloaded in this part is 0.
In case A3, x ≤ c < s.
For case A3, if k-1 > w_c, the number of data downloaded in this part is equal to k-1-w_c; if k-1 ≤ w_c, the number of data downloaded in this part is 0.
(23) The c-th group of second data includes at least one second data obtained based on linear combination; download the second data corresponding to that at least one second data in the s groups of second data.
Illustratively, the linear combination performed in S3023 on the data of the front x rows of any second sub-matrix is referred to as the first data transformation process, that is, the process of performing the first and second linear combinations on the first data of the p-th row and q-th column and the first data of the q-th row and p-th column in the front x rows of the second sub-matrix, where 0 ≤ p < x, 0 ≤ q < x, and p is not equal to q. The linear combination performed on the data of the rear x rows of the second sub-matrix is referred to as the second data transformation process, that is, the process of performing the first and second linear combinations on the data of the r+(s-x)-th row and t-(s-x)-th column and the data of the t-th row and r-th column in the rear x rows of the second sub-matrix, where 0 ≤ r < x, s-x ≤ t < s, and r is not equal to t-(s-x).
Since x < s < 2x, the front x rows and the rear x rows overlap; that is, the (s-x)-th to (x-1)-th rows of the front x rows also belong to the rear x rows. Thus, a group of second data among the (s-x)-th to (x-1)-th groups, corresponding to those rows, is obtained by performing both the first data transformation process and the second data transformation process. That is, one or more second data in such a group may be obtained after performing linear combination multiple times. The second data corresponding to the at least one second data is therefore described based on the value of c. Correspondingly, the number of data downloaded in this part includes, but is not limited to, the following cases B1 to B3.
In case B1, 0 ≤ c < s-x.
For any one of the at least one second data, the second data indicated by the first rule is taken as one second data corresponding to it. The first rule corresponds to the first data transformation process and specifies on which two first data that process is performed. Illustratively, in the case where the second data in question is the q-th second data of the c-th group, the second data indicated by the first rule is the c-th second data of the q-th group, where 0 ≤ q < x and c is not equal to q. The second data indicated by the first rule for each of the at least one second data are downloaded respectively; the number of data downloaded here is x-1.
For any one of these x-1 second data, if it belongs to the rear x groups among the s groups of second data, it is data on which the second data transformation process has been performed. In this case, the second data indicated by the second rule is also downloaded. The second rule corresponds to the second data transformation process and specifies on which two data that process is performed. Illustratively, in the case where the second data in question is the t-(s-x)-th second data of the r+(s-x)-th group, the second data indicated by the second rule is the r-th second data of the t-th group, where 0 ≤ r < x, s-x ≤ t < s, and r is not equal to t-(s-x). The number of data downloaded here is less than or equal to 2x-s.
For case B1, the number of downloaded second data is less than or equal to (x-1) + (2x-s) = 3x-s-1, and the downloaded second data are all second data corresponding to the at least one second data.
In case B2, s-x ≤ c < x.
For any one of the at least one second data, download the second data indicated by the second rule; the number of data downloaded here is x-1. In addition, download the one second data indicated by the first rule that corresponds to the c-(s-x)-th second data of the c-th group; the number of data downloaded here is 1.
For any one of the x-1 second data indicated by the second rule, if it belongs to the front x groups among the s groups of second data, download the second data indicated by the first rule that corresponds to it. The number of data downloaded here is less than or equal to 2x-s.
For case B2, the number of downloaded second data is less than or equal to (x-1) + 1 + (2x-s) = 3x-s, and the downloaded second data are all second data corresponding to the at least one second data.
In case B3, x ≤ c < s.
For any one of the at least one second data, download the second data indicated by the second rule; the number of data downloaded here is x-1.
For any one of the x-1 second data, if it belongs to the front x groups among the s groups of second data, download the second data indicated by the first rule that corresponds to it. The number of data downloaded here is less than or equal to 2x-s.
For case B3, the number of downloaded second data is less than or equal to (x-1) + (2x-s) = 3x-s-1, and the downloaded second data are all second data corresponding to the at least one second data.
Based on the above cases B1 to B3, the number of data downloaded in (23) is less than or equal to 3x-s.
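The per-case totals of (21)-(23) can be collected into one sketch. The three branches below are assembled from the counts in cases A1-A3 and B1-B3 (k downloads, a max(0, ·) partner term, and a 3x-s or 3x-s-1 bound) rather than quoted from equations (2)-(4), so treat the closed forms as assumptions.

```python
# Upper bound on the number of data downloaded to repair the c-th node
# of the second case, assembled from (21)-(23): k data, at most
# max(0, k' - w) partner data (k' = k when 0 <= c < s-x, else k' = k-1),
# plus the bound from cases B1-B3. w is the number of candidate nodes
# whose relevant data is not a linear combination.
def case2_download_bound(k, x, s, c, w):
    if 0 <= c < s - x:                                  # cases A1 / B1
        return k + max(0, k - w) + (3 * x - s - 1)
    if s - x <= c < x:                                  # cases A2 / B2
        return k + max(0, k - 1 - w) + (3 * x - s)
    return k + max(0, k - 1 - w) + (3 * x - s - 1)      # cases A3 / B3

# Example: k=6, x=2, s=3; the bandwidth bound is downloads / (k*x)
for c in range(3):
    print(c, case2_download_bound(6, 2, 3, c, w=6) / (6 * 2))
```

Dividing each bound by k·x gives the repair-bandwidth bounds of the second case; all stay below 1 for these parameters.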
From the first and second cases, when x is kept unchanged, the minimum repair bandwidth of any one storage node increases as y increases; when y is kept unchanged, it decreases as x increases. Thus, when y is equal to n and x is equal to 2, the minimum repair bandwidth of any one storage node reaches its maximum. Therefore, when 2 ≤ x ≤ n-k and n ≥ 6, the minimum repair bandwidth of any one storage node is less than or equal to 1. That is, in the method provided by the embodiments of the present application, the repair bandwidth of the storage nodes storing second data is small.
Illustratively, for the n storage nodes storing the y groups of second data and the n-y groups of first data, the x·k original data can be recovered based on the data stored on any k of the n storage nodes. For example, when n-k of the n storage nodes fail, the x·k original data can be recovered by downloading the data stored on the remaining k non-failed storage nodes. Of course, the data stored on the failed n-k storage nodes can also be recovered based on the downloaded data stored on the k storage nodes.
In the method provided by the embodiments of the present application, the y groups of second data are obtained by performing data transformation on the y groups of first data, so that the minimum repair bandwidth of the y storage nodes storing the y groups of second data is smaller than the threshold. Therefore, when one of the y storage nodes fails, the lost data can be recovered with low communication overhead.
In addition, since 2 ≤ x ≤ n-k, the method provided by the embodiments of the present application can be adapted to different amounts of original data, is flexible, and requires fewer computing resources for performing data transformation and fewer storage resources for storing data. Furthermore, since x ≤ y ≤ n, the method can be applied to any y of the n storage nodes, giving the method a wide application range.
The method provided in the embodiments of the present application may be used iteratively; for example, the method is performed once to reduce the repair bandwidth of some storage nodes, and performed again to reduce the repair bandwidth of the remaining storage nodes. The application of the method is therefore flexible.
Next, taking the basic code pattern as the RS code, with k=6, n=9 and x=2 as an example, the data storage method provided in the embodiments of the present application will be described with reference to the implementation environment shown in fig. 1.
Illustratively, the first device encodes 2×6 = 12 original data to obtain 2 codewords, any one codeword including 9 first data. The first device obtains 9 groups of first data derived from the 2 codewords, one group of first data including one first data from each of the 2 codewords. The 9 groups of first data may be as shown in Table 9.
TABLE 9
Group 0: d0,0  d1,0
Group 1: d0,1  d1,1
Group 2: d0,2  d1,2
Group 3: d0,3  d1,3
Group 4: d0,4  d1,4
Group 5: d0,5  d1,5
Group 6: d0,6  d1,6
Group 7: d0,7  d1,7
Group 8: d0,8  d1,8
where d_{i,0}, d_{i,1}, ..., d_{i,8} denote the 9 first data included in the i-th codeword, i = 0 or 1; d_{i,0}, ..., d_{i,5} represent the 6 original data encoded to obtain the i-th codeword, and d_{i,6}, d_{i,7}, d_{i,8} represent the 3 parity data obtained by encoding. d_{i,6}, d_{i,7}, d_{i,8} and d_{i,0}, ..., d_{i,5} satisfy equation (5):

(d_{i,6}, d_{i,7}, d_{i,8})^T = [A] · (d_{i,0}, d_{i,1}, ..., d_{i,5})^T    (5)
where [A] is an (n-k)-row, k-column encoding matrix determined by the manner in which the k original data are encoded to obtain the n-k parity data; one code pattern corresponds to one encoding matrix. For example, for the RS code with k=6 and n=9, the encoding matrix is as shown in equation (6), where α is an element of GF(2^8), namely a root of the generator polynomial f(x) = x^8 + x^4 + x^3 + x^2 + 1.
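The finite-field arithmetic underlying this example can be sketched as follows. The field GF(2^8) with generator polynomial f(x) = x^8 + x^4 + x^3 + x^2 + 1 is as named in the text; the carry-less multiply routine is a standard implementation choice, not part of the embodiment.

```python
# GF(2^8) multiplication with generator polynomial
# f(x) = x^8 + x^4 + x^3 + x^2 + 1 (bit pattern 0x11D); alpha = 0x02
# is the element named in the text, satisfying f(alpha) = 0.
POLY = 0x11D

def gf_mul(a, b):
    """Carry-less (xor) multiply, reducing modulo f(x) on overflow."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

ALPHA = 0x02
p = ALPHA
for _ in range(7):
    p = gf_mul(p, ALPHA)
# alpha^8 reduces to x^4 + x^3 + x^2 + 1 = 0x1D, i.e. f(alpha) = 0
print(hex(p))  # 0x1d
```

Addition and subtraction in this field are both xor, which is why the linear combinations in the tables below can be written with plain "+".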
Illustratively, the first device performs data transformation on all of the n groups of first data to obtain n groups of second data. It should be noted that, since the n reference matrices corresponding to the RS code are the same, when the first device performs data transformation on the n groups of first data, it may transform them using only the linear-combination transformation manner. The n groups of second data after the data transformation are shown in Table 10.
Table 10
Group 0: d0,0              g·d0,1 + f·d1,0
Group 1: m·d0,1 + v·d1,0   d1,1
Group 2: d0,2              g·d0,3 + f·d1,2
Group 3: m·d0,3 + v·d1,2   d1,3
Group 4: d0,4              g·d0,5 + f·d1,4
Group 5: m·d0,5 + v·d1,4   d1,5
Group 6: d0,6              g·d0,7 + f·d1,6
Group 7: m·d0,7 + v·d1,6   g·d0,8 + f·d1,7
Group 8: m·d0,8 + v·d1,7   d1,8
In one possible implementation, m, v, g, f are determined from the elements included in GF(2^8), that is, from the field defined by the generator polynomial f(x) = x^8 + x^4 + x^3 + x^2 + 1. Illustratively, v, g and f are set to 1 and m is set to α.
The first device stores the n groups of second data on storage nodes 0 to n-1, respectively; the second data stored on each storage node are shown in Table 11.
TABLE 11
Storage node 0: Group 0: d0,0              d0,1 + d1,0
Storage node 1: Group 1: α·d0,1 + d1,0     d1,1
Storage node 2: Group 2: d0,2              d0,3 + d1,2
Storage node 3: Group 3: α·d0,3 + d1,2     d1,3
Storage node 4: Group 4: d0,4              d0,5 + d1,4
Storage node 5: Group 5: α·d0,5 + d1,4     d1,5
Storage node 6: Group 6: d0,6              d0,7 + d1,6
Storage node 7: Group 7: α·d0,7 + d1,6     d0,8 + d1,7
Storage node 8: Group 8: α·d0,8 + d1,7     d1,8
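A sketch of how one pair of groups in Table 11 is produced, with v = g = f = 1 and m = α over GF(2^8) as chosen above. The concrete data values are made up for illustration, and gf_mul is the same standard carry-less multiply sketched earlier.

```python
# Producing the Table 11 entries for storage nodes 0 and 1:
# node 0 keeps d[0][0] and d[0][1] + d[1][0];
# node 1 keeps alpha*d[0][1] + d[1][0] and d[1][1].
POLY, ALPHA = 0x11D, 0x02

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

def transform_pair(a, b):
    """First combination m*a + b (m = alpha), second combination a + b
    (v = g = f = 1); addition in GF(2^8) is xor."""
    return gf_mul(ALPHA, a) ^ b, a ^ b

d = [[0x11, 0x22, 0x33], [0x9A, 0x66, 0x77]]    # made-up codeword symbols
first, second = transform_pair(d[0][1], d[1][0])
node0 = (d[0][0], second)     # d0,0 and d0,1 + d1,0
node1 = (first, d[1][1])      # alpha*d0,1 + d1,0 and d1,1
```

The same pairing is applied to groups (2,3), (4,5) and, with the three-group pattern of the second sub-matrix, to groups 6-8.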
Illustratively, if storage node 0 fails, the lost data may be recovered by downloading the data shown in Table 12.
Table 12
Storage node 1: α·d0,1 + d1,0
Storage node 2: d0,2            d0,3 + d1,2
Storage node 3: α·d0,3 + d1,2
Storage node 4: d0,4            d0,5 + d1,4
Storage node 5: α·d0,5 + d1,4
Storage node 6: d0,6            d0,7 + d1,6
Storage node 7: α·d0,7 + d1,6
Here, d0,3 is obtained based on α·d0,3 + d1,2 and d0,3 + d1,2; d0,5 is obtained based on α·d0,5 + d1,4 and d0,5 + d1,4; and d0,7 is obtained based on α·d0,7 + d1,6 and d0,7 + d1,6. Based on d0,2 to d0,7 and equation (6), d0,0 and d0,1 are obtained, and based on α·d0,1 + d1,0 and d0,1, d1,0 can be obtained. The data d0,0 and d0,1 + d1,0 stored on storage node 0 can thereby be recovered. The number of downloaded data is 10, so the minimum repair bandwidth of storage node 0 is 10/12 = 5/6.
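The pair-solving step used above (recovering, say, d0,3 from α·d0,3 + d1,2 and d0,3 + d1,2) can be sketched as follows; gf_mul is the standard carry-less multiply for the field named in the text, and gf_inv is a brute-force inverse added for illustration.

```python
# Solving a combination pair over GF(2^8): from u = alpha*a + b and
# w = a + b, xor-ing gives (alpha + 1)*a, so dividing by
# alpha + 1 = 0x03 recovers a, and then b = w + a (addition is xor).
POLY, ALPHA = 0x11D, 0x02

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

def gf_inv(a):
    # brute force is fine in a 256-element field
    return next(b for b in range(1, 256) if gf_mul(a, b) == 1)

def solve_pair(u, w):
    a = gf_mul(u ^ w, gf_inv(ALPHA ^ 1))
    return a, w ^ a

a, b = 0x57, 0x83                          # made-up symbols
u, w = gf_mul(ALPHA, a) ^ b, a ^ b
assert solve_pair(u, w) == (a, b)          # both symbols recovered
```

This is why each downloaded pair in Table 12 yields both underlying first data with only two downloads.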
Illustratively, a procedure for recovering the x·k original data will be described. For example, storage node 0, storage node 2 and storage node 8 all fail; in this case the 2×6 = 12 original data can be recovered by downloading the data stored on the remaining 6 non-failed storage nodes, as shown in Table 13.
TABLE 13
Storage node 1: α·d0,1 + d1,0   d1,1
Storage node 3: α·d0,3 + d1,2   d1,3
Storage node 4: d0,4            d0,5 + d1,4
Storage node 5: α·d0,5 + d1,4   d1,5
Storage node 6: d0,6            d0,7 + d1,6
Storage node 7: α·d0,7 + d1,6   d0,8 + d1,7
The downloaded data include pairs of second data obtained from the same first and second linear combinations, for example α·d0,5 + d1,4 and d0,5 + d1,4, from which the underlying first data can be solved. Table 13 can therefore be simplified to Table 14.
TABLE 14
α·d0,1 + d1,0   d1,1
α·d0,3 + d1,2   d1,3
d0,4            d1,4
d0,5            d1,5
d0,6            d1,6
d0,7            d0,8 + d1,7
Based on equation (6), a set of equations can be obtained in which d0,0, d0,1, d0,2, d0,3, d1,0 and d1,2 are unknowns. Since the coefficient matrix of the equation set with respect to these unknowns has full rank, the equation set can be solved, so that the 12 original data can be recovered.
Fig. 4 is a schematic structural diagram of a data storage device according to an embodiment of the present application. The apparatus is applied, for example, to the first device shown in fig. 1 or the controller shown in fig. 2. The data storage device shown in fig. 4 is capable of performing all or part of the operations performed by the first device or the controller based on the following modules shown in fig. 4. It should be understood that the apparatus may include more modules than those shown, or omit some of the modules shown; this is not limited in the embodiments of the present application. As shown in fig. 4, the apparatus includes:
an obtaining module 401, configured to obtain n groups of first data derived from x codewords, where any one of the x codewords includes n first data obtained by encoding k original data, one group of first data includes one first data from each of the x codewords, x, n, k are positive integers, n > k, and 2 ≤ x ≤ n-k;
a transformation module 402, configured to perform data transformation on y groups of first data to obtain y groups of second data, so that the minimum repair bandwidth of any one of the storage nodes storing the y groups of second data is smaller than a threshold, where the minimum repair bandwidth of a storage node is the minimum bandwidth required to recover the data stored on that storage node, obtained based on the ratio of the number of downloaded data to the total number of original data, y is an integer, and x ≤ y ≤ n;
the storage module 403 is configured to store the y-group second data and the n-y-group first data on corresponding storage nodes respectively.
In one possible implementation, the transformation module 402 is configured to perform data transformation on the y groups of first data in a linear-combination transformation manner to obtain the y groups of second data, where the linear-combination transformation manner is used to correlate multiple first data that belong to different codewords and different groups but correspond to the same reference matrix, so that the obtained linear combination results can be used to restore those first data, one group of first data corresponds to one reference matrix, and the reference matrix is used to download the group of first data corresponding to it from the corresponding storage node.
In one possible implementation, the y sets of first data are a matrix of y rows and x columns, one set of first data corresponds to one row of the matrix, and first data belonging to the same codeword corresponds to one column of the matrix;
a transformation module 402, configured to divide the matrix into a first number of first sub-matrices and a second number of second sub-matrices, where the first number and the second number are determined based on the ratio of y to x, any one first sub-matrix corresponds to x groups of first data, any one second sub-matrix corresponds to s groups of first data, and x < s < 2x;
for any first sub-matrix, performing the first linear combination and the second linear combination on the first data of the p-th row and q-th column and the first data of the q-th row and p-th column of the first sub-matrix, and replacing the first data of the p-th row and q-th column and the first data of the q-th row and p-th column by the results of the first linear combination and the second linear combination respectively, to obtain x groups of second data, where the first linear combination and the second linear combination are different, 0 ≤ p < x, 0 ≤ q < x, and p is not equal to q;
for any second sub-matrix, performing the first linear combination and the second linear combination on the first data of the p-th row and q-th column and the first data of the q-th row and p-th column in the front x rows of the second sub-matrix, and replacing the first data of the p-th row and q-th column and the first data of the q-th row and p-th column by the results of the first linear combination and the second linear combination respectively, to obtain x groups of second data, where 0 ≤ p < x, 0 ≤ q < x, and p is not equal to q;
and performing the first linear combination and the second linear combination on the data of the r+(s-x)-th row and t-(s-x)-th column and the data of the t-th row and r-th column in the rear x rows of the second sub-matrix, and replacing the data of the r+(s-x)-th row and t-(s-x)-th column and the data of the t-th row and r-th column by the results of the first linear combination and the second linear combination respectively, to obtain x groups of second data, where 0 ≤ r < x, s-x ≤ t < s, and r is not equal to t-(s-x); the data of the r+(s-x)-th row and t-(s-x)-th column and the data of the t-th row and r-th column are the first data of the front x rows or the first data of the rear s-x rows of the second sub-matrix.
In one possible implementation, the first linear combination is m·D1 + v·D2 and the second linear combination is g·D1 + f·D2, where m, v, g, f are elements of a finite field other than 0, the product of m and f is not equal to the product of v and g, and D1 and D2 represent the two data on which the first and second linear combinations are performed.
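The condition that the product of m and f differs from the product of v and g is exactly invertibility of the 2×2 coefficient matrix of the pair. A minimal check, using the same GF(2^8) field as the RS example (an assumption here, since this implementation covers finite fields generally):

```python
# The pair (m*D1 + v*D2, g*D1 + f*D2) determines D1 and D2 exactly when
# the determinant of [[m, v], [g, f]] is nonzero; in characteristic 2,
# subtraction is xor, so the condition reads m*f != v*g.
POLY = 0x11D

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

def invertible(m, v, g, f):
    return gf_mul(m, f) != gf_mul(v, g)

print(invertible(0x02, 1, 1, 1))   # m = alpha, v = g = f = 1 -> True
print(invertible(1, 1, 1, 1))      # degenerate choice -> False
```

The example choice m = α, v = g = f = 1 used later in the text passes this check, which is why every combination pair in the tables can be solved back to its two first data.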
In one possible implementation, a set of first data corresponds to a reference matrix, and the reference matrix is used for downloading the set of first data corresponding to the reference matrix from the corresponding storage node;
a transformation module 402, configured to, in response to the n reference matrices being different, perform position transformation on the y groups of first data, where the position transformation is used to change the positions of the x·y first data included in the y groups of first data, and to perform data transformation on the y groups of first data after the position transformation in the linear-combination transformation manner, with the first data on which linear combination is performed corresponding to the same reference matrix.
In one possible implementation, the y sets of first data are a matrix of y rows and x columns, one set of first data corresponds to one row of the matrix, and first data belonging to the same codeword corresponds to one column of the matrix;
a transformation module 402, configured to divide the matrix into a first number of first sub-matrices and a second number of second sub-matrices, where the first number and the second number are determined based on the ratio of y to x, any one first sub-matrix corresponds to x groups of first data, any one second sub-matrix corresponds to s groups of first data, and x < s < 2x;
for any sub-matrix, cyclically transforming the groups in which the first data of each column of the sub-matrix are located, so that after the cyclic transformation the first data of the p-th row and q-th column in the front x rows of the sub-matrix has the same reference matrix as the first data of the q-th row and p-th column, where 0 ≤ p < x, 0 ≤ q < x, and p is not equal to q, and the first data of the r+(s-x)-th row and t-(s-x)-th column in the rear x rows of the sub-matrix has the same reference matrix as the first data of the t-th row and r-th column, where 0 ≤ r < x, s-x ≤ t < s, and r is not equal to t-(s-x); the sub-matrix may be a first sub-matrix or a second sub-matrix.
In one possible implementation, for any one of the storage nodes corresponding to the y groups of second data, when the second data stored on the storage node is obtained based on the first data included in one of the first sub-matrices, and the storage node is the u-th of the x storage nodes storing x groups of second data, where 0 ≤ u < x, the minimum repair bandwidth of the u-th storage node is as follows:

(k + x - 1 + max(0, k - w_u)) / (k·x)

where w_u represents the number of storage nodes, among the n-x storage nodes other than the x storage nodes, whose u-th data is not obtained by linear combination.
In one possible implementation, for any one of the storage nodes corresponding to the y groups of second data, when the second data stored on the storage node is obtained based on the first data included in one of the second sub-matrices, and the storage node is the c-th of the s storage nodes storing s groups of second data, then in the case of 0 ≤ c < s-x, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:

(k + 3x - s - 1 + max(0, k - w_c)) / (k·x)

in the case of s-x ≤ c < x, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:

(k + 3x - s + max(0, k - 1 - w_c)) / (k·x)

and in the case of x ≤ c < s, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:

(k + 3x - s - 1 + max(0, k - 1 - w_c)) / (k·x)

where, in the case of 0 ≤ c < s-x, w_c represents the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose c-th data is not obtained by linear combination; and in the case of s-x ≤ c < s, w_c represents the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose c-(s-x)-th data is not obtained by linear combination.
In one possible implementation, the code pattern of the codeword is any one of an RS code, a locally repairable code, or a piggybacking code.
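As background for how a piggybacking-style code embeds a linear combination of one instance's data into another instance's parity, the following toy sketch shows the mechanism on a systematic (3, 2) XOR code. This is an illustration, not the patent's transformation: the symbol layout (a node storing only one symbol), the names, and the tiny parameters are all assumptions, and the (3, 2) parameters are too small to show a real bandwidth saving, which appears at larger n and k.

```python
# A minimal piggybacking-style sketch over GF(2) (bitwise XOR). This shows
# the mechanism only; it is not the patent's data transformation.

def encode(a, b):
    """Systematic (3, 2) code over GF(2): two data symbols plus one XOR parity."""
    return [a, b, a ^ b]

# Two independent instances (sub-stripes) of the base code.
inst1 = encode(0b1010, 0b0110)
inst2 = encode(0b1100, 0b0011)

# Data transformation: XOR ("piggyback") instance 1's first data symbol
# onto instance 2's parity symbol.
inst2[2] ^= inst1[0]

# A node storing only inst1[0] can now be repaired entirely from instance-2
# symbols: XORing the clean instance-2 symbols into the piggybacked parity
# peels them off and leaves exactly the embedded symbol.
recovered = inst2[0] ^ inst2[1] ^ inst2[2]
assert recovered == inst1[0]
print(bin(recovered))
```

The point of the mechanism is that the repaired symbol is extracted from data the helper nodes already transmit for the other instance, which is how such transformations reduce the repair bandwidth of selected nodes.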
In the apparatus provided by the embodiments of the present application, the y groups of second data are obtained by performing data transformation on the y groups of first data, so that the minimum repair bandwidth of the y storage nodes storing the y groups of second data is smaller than the threshold. Therefore, when one of the y storage nodes fails, the lost data can be recovered with small communication overhead.
In addition, since 2 ≤ x ≤ n-k, the apparatus provided by the embodiments of the present application is applicable to different amounts of original data and is flexible, and it requires fewer computing resources for performing the data transformation and fewer storage resources for storing data. Furthermore, since x ≤ y ≤ n, the apparatus can be applied to any y storage nodes among the n storage nodes, and therefore has a wide application range.
The apparatus provided by the embodiments of the present application can also be used iteratively; for example, it is first used to reduce the repair bandwidth of some storage nodes, and then used again to reduce the repair bandwidth of other storage nodes. The manner of applying the apparatus is therefore flexible.
It should be understood that, when the apparatus provided in fig. 4 implements its functions, only the division into the functional modules described above is used as an example for illustration. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus provided in the foregoing embodiment belongs to the same concept as the corresponding method embodiment; for the specific implementation process of the apparatus, refer to the method embodiment, and details are not repeated herein.
The specific hardware structure of the apparatus in the above embodiment is illustrated as the network device 1500 shown in fig. 5, which includes a transceiver 1501, a processor 1502, and a memory 1503, connected by a bus 1504. The transceiver 1501 is configured to transmit and receive data, the memory 1503 is configured to store instructions or program code, and the processor 1502 is configured to invoke the instructions or program code in the memory 1503 to cause the device to perform the relevant processing steps of the first device in the above method embodiment. In a specific embodiment, the network device 1500 of the embodiments of the present application may correspond to the first device in the foregoing method embodiments; the processor 1502 in the network device 1500 reads the instructions or program code in the memory 1503, so that the network device 1500 shown in fig. 5 can perform all or part of the operations performed by the first device.
The network device 1500 may also correspond to the apparatus shown in fig. 4 described above, for example, the acquisition module 401, the transformation module 402, and the storage module 403 referred to in fig. 4 correspond to the processor 1502.
Referring to fig. 6, fig. 6 illustrates a schematic structure of a network device 2000 according to an exemplary embodiment of the present application. The network device 2000 shown in fig. 6 is configured to perform the operations related to the data storage method shown in fig. 3 described above. The network device 2000 is, for example, a switch, a router, or the like.
As shown in fig. 6, the network device 2000 includes at least one processor 2001, a memory 2003, and at least one communication interface 2004.
The processor 2001 is, for example, a CPU, a digital signal processor (digital signal processor, DSP), a network processor (network processor, NP), a graphics processing unit (graphics processing unit, GPU), a neural-network processing unit (neural-network processing unit, NPU), a data processing unit (data processing unit, DPU), a microprocessor, or one or more integrated circuits for implementing the solutions of the present application. For example, the processor 2001 includes an application-specific integrated circuit (application-specific integrated circuit, ASIC), a programmable logic device (programmable logic device, PLD), a transistor logic device, a hardware component, or any combination thereof. The PLD is, for example, a complex programmable logic device (complex programmable logic device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), generic array logic (generic array logic, GAL), or any combination thereof. The processor 2001 may implement or perform the various logical blocks, modules, and circuits described in connection with the disclosure of the embodiments of the present application. The processor may also be a combination that performs computing functions, for example, a combination including one or more microprocessors, or a combination of a DSP and a microprocessor.
Optionally, the network device 2000 further includes a bus. The bus is used to transfer information between the components of the network device 2000. The bus may be a peripheral component interconnect (peripheral component interconnect, PCI) bus, an extended industry standard architecture (extended industry standard architecture, EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 6, but this does not mean that there is only one bus or only one type of bus. In addition to the bus connection, the components of the network device 2000 in fig. 6 may alternatively be connected in other manners, which is not limited in the embodiments of the present application.
The memory 2003 is, for example, a read-only memory (read-only memory, ROM) or another type of static storage device that can store static information and instructions, a random access memory (random access memory, RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), a compact disc read-only memory (compact disc read-only memory, CD-ROM) or other optical disc storage (including a compact disc, a laser disc, an optical disc, a digital versatile disc, a blu-ray disc, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 2003 is, for example, independent and connected to the processor 2001 via the bus. Alternatively, the memory 2003 may be integrated with the processor 2001.
The communication interface 2004 uses any transceiver-like device for communicating with other devices or a communication network, such as an ethernet, a radio access network (radio access network, RAN), or a wireless local area network (wireless local area network, WLAN). The communication interface 2004 may include a wired communication interface, and may also include a wireless communication interface. Specifically, the communication interface 2004 may be an ethernet interface, a fast ethernet (fast ethernet, FE) interface, a gigabit ethernet (gigabit ethernet, GE) interface, an asynchronous transfer mode (asynchronous transfer mode, ATM) interface, a WLAN interface, a cellular network communication interface, or a combination thereof. The ethernet interface may be an optical interface, an electrical interface, or a combination thereof. In the embodiments of the present application, the communication interface 2004 may be used by the network device 2000 to communicate with other devices.
In a specific implementation, in an embodiment, the processor 2001 may include one or more CPUs, such as CPU0 and CPU1 shown in fig. 6. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In a particular implementation, as one embodiment, the network device 2000 may include multiple processors, such as processor 2001 and processor 2005 shown in fig. 6. Each of these processors may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In a specific implementation, in an embodiment, the network device 2000 may further include an output device and an input device. The output device communicates with the processor 2001 and can display information in a variety of ways. For example, the output device may be a liquid crystal display (liquid crystal display, LCD), a light emitting diode (light emitting diode, LED) display device, a cathode ray tube (cathode ray tube, CRT) display device, or a projector (projector). The input device communicates with the processor 2001 and can receive user input in a variety of ways. For example, the input device may be a mouse, a keyboard, a touch screen device, or a sensing device.
In some embodiments, the memory 2003 is used to store program code 2010 for executing the solutions of the present application, and the processor 2001 may execute the program code 2010 stored in the memory 2003. That is, the network device 2000 can implement the data storage method provided by the method embodiment through the processor 2001 and the program code 2010 in the memory 2003. The program code 2010 may include one or more software modules. Optionally, the processor 2001 itself may also store program code or instructions for executing the solutions of the present application.
In a specific embodiment, the network device 2000 of the embodiment of the present application may correspond to the first device in the above-described method embodiments, and the processor 2001 in the network device 2000 reads the program code 2010 in the memory 2003 or the program code or instructions stored by the processor 2001 itself, so that the network device 2000 shown in fig. 6 can perform all or part of the operations performed by the first device.
The network device 2000 may also correspond to the apparatus shown in fig. 4 described above, and each functional module in the apparatus shown in fig. 4 is implemented in software on the network device 2000. In other words, the apparatus shown in fig. 4 includes the functional modules generated after the processor 2001 of the network device 2000 reads the program code 2010 stored in the memory 2003. For example, the acquisition module 401, the transformation module 402, and the storage module 403 referred to in fig. 4 correspond to the processor 2001 and/or the processor 2005.
The steps of the method shown in fig. 3 are performed by an integrated logic circuit of hardware in the processor of the network device 2000, or by instructions in the form of software. The steps of the method disclosed in connection with the embodiments of the present application may be directly embodied as being performed by a hardware processor, or performed by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, details are not described here.
Referring to fig. 7, fig. 7 illustrates a schematic structural diagram of a network device 2100 provided in another exemplary embodiment of the present application. The network device 2100 shown in fig. 7 is configured to perform all or part of the operations involved in the data storage method shown in fig. 3 described above. The network device 2100 is, for example, a switch, router, etc., and the network device 2100 may be implemented by a general bus architecture. As shown in fig. 7, the network device 2100 includes: a main control board 2110 and an interface board 2130.
The main control board is also called a main processing unit (main processing unit, MPU) or a route processor card (route processor card). The main control board 2110 is used for controlling and managing the components in the network device 2100, including route computation, device management, device maintenance, and protocol processing. The main control board 2110 includes a central processor 2111 and a memory 2112.
The interface board 2130 is also referred to as a line processing unit (line processing unit, LPU), line card, or service board. The interface board 2130 is used to provide various service interfaces and to implement forwarding of data packets. The service interfaces include, but are not limited to, an ethernet interface and a POS (packet over SONET/SDH) interface; the ethernet interface is, for example, a flexible ethernet client (flexible ethernet client, FlexE Client) interface. The interface board 2130 includes a central processor 2131, a network processor 2132, a forwarding table entry memory 2134, and a physical interface card (physical interface card, PIC) 2133.
The central processor 2131 on the interface board 2130 is used to control and manage the interface board 2130 and communicate with the central processor 2111 on the main control board 2110.
The network processor 2132 is used to implement forwarding processing of packets. The network processor 2132 may be in the form of a forwarding chip. The forwarding chip may be a network processor (network processor, NP). In some embodiments, the forwarding chip may be implemented by an application-specific integrated circuit (application-specific integrated circuit, ASIC) or a field programmable gate array (field programmable gate array, FPGA). Specifically, the network processor 2132 is configured to forward a received packet based on the forwarding table stored in the forwarding table entry memory 2134: if the destination address of the packet is the address of the network device 2100, the packet is uploaded to a CPU (e.g., the central processor 2131) for processing; if the destination address of the packet is not the address of the network device 2100, the next hop and the egress interface corresponding to the destination address are found in the forwarding table according to the destination address, and the packet is forwarded to that egress interface. The processing of an uplink packet may include processing at the packet ingress interface and forwarding table lookup; the processing of a downlink packet may include forwarding table lookup and the like. In some embodiments, the central processor may also perform the functions of a forwarding chip, for example, implementing software forwarding based on a general-purpose CPU, so that no forwarding chip is needed on the interface board.
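The lookup behaviour described above (deliver to the CPU when the destination is the device's own address, otherwise find the next hop and egress interface in the forwarding table) can be sketched as follows. This is a hedged illustration, not the device's actual implementation; the addresses and table entries are hypothetical.

```python
# Hedged sketch of forwarding-table lookup: a table maps destination
# prefixes to a (next hop, egress interface) pair; packets addressed to
# the device itself are handed to the CPU instead of being forwarded.
import ipaddress

DEVICE_ADDRESS = ipaddress.ip_address("10.0.0.1")          # hypothetical
FORWARDING_TABLE = {                                        # hypothetical entries
    ipaddress.ip_network("192.168.1.0/24"): ("10.0.1.2", "eth1"),
    ipaddress.ip_network("0.0.0.0/0"): ("10.0.9.1", "eth9"),  # default route
}

def handle(dst):
    dst = ipaddress.ip_address(dst)
    if dst == DEVICE_ADDRESS:
        return ("cpu", None)  # upload to the CPU for local processing
    # Longest-prefix match over the forwarding table.
    matches = [net for net in FORWARDING_TABLE if dst in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]

print(handle("192.168.1.7"))   # -> ('10.0.1.2', 'eth1')
print(handle("10.0.0.1"))      # -> ('cpu', None)
```

A hardware forwarding chip performs the same logical lookup with specialized structures (e.g., TCAM) rather than a linear scan; the dictionary here only captures the decision logic.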
The physical interface card 2133 is used to implement the interconnection function of the physical layer: original traffic enters the interface board 2130 from the physical interface card 2133, and processed packets are sent out from the physical interface card 2133. The physical interface card 2133, also referred to as a daughter card, may be mounted on the interface board 2130 and is responsible for converting optical/electrical signals into packets, performing validity checks on the packets, and forwarding them to the network processor 2132 for processing. In some embodiments, the central processor 2131 may also perform the functions of the network processor 2132, for example, implementing software forwarding based on a general-purpose CPU, so that the network processor 2132 is not needed on the interface board 2130.
Illustratively, the network device 2100 includes a plurality of interface boards, e.g., the network device 2100 also includes an interface board 2140, the interface board 2140 including: central processor 2141, network processor 2142, forwarding table entry store 2144, and physical interface card 2143. The function and implementation of the various components in interface board 2140 are the same or similar to interface board 2130 and are not described in detail herein.
Illustratively, the network device 2100 further includes a switch fabric board 2120. The switch fabric board 2120 may also be referred to as a switch fabric unit (switch fabric unit, SFU). In the case of a network device having multiple interface boards, the switch fabric board 2120 is used to complete the data exchange between the interface boards. For example, the interface board 2130 and the interface board 2140 may communicate with each other via the switch fabric board 2120.
The main control board 2110 is coupled to the interface boards. For example, the main control board 2110, the interface board 2130, and the interface board 2140 are connected to the system backplane via a system bus to implement interworking with one another. In one possible implementation, an inter-process communication (inter-process communication, IPC) channel is established between the main control board 2110 and the interface boards 2130 and 2140, and communication between the main control board 2110 and the interface boards 2130 and 2140 is performed through the IPC channel.
Logically, the network device 2100 includes a control plane and a forwarding plane. The control plane includes the main control board 2110 and the central processor 2111; the forwarding plane includes the components that perform forwarding, such as the forwarding table entry memory 2134, the physical interface card 2133, and the network processor 2132. The control plane performs functions such as routing, generating the forwarding table, processing signaling and protocol packets, and configuring and maintaining the state of the network device. The control plane issues the generated forwarding table to the forwarding plane, where the network processor 2132 forwards packets received by the physical interface card 2133 by looking up the forwarding table issued by the control plane. The forwarding table issued by the control plane may be stored in the forwarding table entry memory 2134. In some embodiments, the control plane and the forwarding plane may be completely separated and not on the same network device.
It should be noted that there may be one or more main control boards; when there are multiple, they may include an active main control board and a standby main control board. There may be one or more interface boards; the stronger the data processing capability of the network device, the more interface boards it provides. There may also be one or more physical interface cards on an interface board. There may be no switch fabric board, or there may be one or more switch fabric boards; when there are multiple, load sharing and redundancy backup can be implemented jointly. Under a centralized forwarding architecture, the network device may not need a switch fabric board, and the interface board undertakes the processing of the service data of the entire system. Under a distributed forwarding architecture, the network device may have at least one switch fabric board, through which data exchange between multiple interface boards is implemented, providing high-capacity data exchange and processing capability. Therefore, the data access and processing capability of a network device of the distributed architecture is greater than that of a device of the centralized architecture. Optionally, the network device may also be in the form of a single board, that is, there is no switch fabric board, and the functions of the interface board and the main control board are integrated on the single board; in this case, the central processor on the interface board and the central processor on the main control board may be combined into one central processor on the single board to perform the functions of the two. The specific architecture employed depends on the specific networking deployment scenario and is not limited herein.
In a specific embodiment, the network device 2100 corresponds to the data storage device shown in fig. 4 described above. In some embodiments, the acquisition module 401, the transformation module 402, and the storage module 403 in the data storage device shown in fig. 4 correspond to the central processor 2111 or the network processor 2132 in the network device 2100.
Based on the network devices shown in fig. 5-7, the embodiment of the application further provides a distributed storage system, where the distributed storage system includes: a first device and a plurality of storage nodes. Of course, the distributed storage system may also include a second device. Illustratively, the first device is the network device 1500 shown in fig. 5 or the network device 2000 shown in fig. 6 or the network device 2100 shown in fig. 7, and the storage node is the network device 1500 shown in fig. 5 or the network device 2000 shown in fig. 6 or the network device 2100 shown in fig. 7. Illustratively, the second device is network device 1500 shown in fig. 5 or network device 2000 shown in fig. 6 or network device 2100 shown in fig. 7. The method performed by the first device, the second device and the storage node may be referred to the above description of the embodiment shown in fig. 3, and will not be repeated here.
It is to be appreciated that the processor described above may be a central processing unit (central processing unit, CPU), or may be another general-purpose processor, a digital signal processor (digital signal processor, DSP), an application-specific integrated circuit (application-specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. It is noted that the processor may be a processor supporting an advanced RISC machines (advanced RISC machines, ARM) architecture.
Further, in an alternative embodiment, the memory may include read only memory and random access memory, and provide instructions and data to the processor. The memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
The memory may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (read-only memory, ROM), a programmable ROM (programmable ROM, PROM), an erasable PROM (erasable PROM, EPROM), an electrically erasable PROM (electrically erasable PROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (random access memory, RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, for example, static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
There is also provided a computer-readable storage medium having stored therein at least one program instruction or code, which, when loaded and executed by a processor, causes a computer to implement the data storage method shown in fig. 3.
The present application provides a computer program (product) which, when executed by a computer, can cause a processor or computer to perform the respective steps and/or flows corresponding to the above-described method embodiments.
There is provided a chip, including a processor, configured to call and execute instructions stored in a memory, so that a network device on which the chip is mounted performs the methods of the above aspects.
Illustratively, the chip further comprises: the input interface, the output interface, the processor and the memory are connected through an internal connecting passage.
An apparatus including the chip is also provided. Optionally, the apparatus is a network device; illustratively, the apparatus is a router, a switch, or a server.
In the above embodiments, the implementation may be made in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the implementation may be in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions described herein are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disc (digital versatile disc, DVD)), a semiconductor medium (e.g., a solid state disk (solid state disk, SSD)), or the like.
The foregoing embodiments are merely intended to describe the technical solutions and beneficial effects of the present application in further detail. It should be understood that the foregoing are merely embodiments of the present application and are not intended to limit the protection scope of the present application; any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the present application shall fall within the protection scope of the present application.
Those of ordinary skill in the art will appreciate that the method steps and modules described in connection with the embodiments disclosed herein may be implemented by software, hardware, firmware, or any combination thereof. To clearly illustrate the interchangeability of hardware and software, the steps and components of the embodiments have been described above generally in terms of functionality. Whether the functionality is implemented as hardware or software depends on the particular application and the design constraints of the technical solution. Those of ordinary skill in the art may implement the described functionality in different ways for each particular application, but such implementation shall not be considered beyond the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be completed by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
When implemented in software, the above may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer program instructions. By way of example, the methods of the embodiments of the present application may be described in the context of machine-executable instructions, such as those included in program modules executed in a device on a real or virtual processor of a target. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, and the like that perform particular tasks or implement particular abstract data types. In various embodiments, the functionality of the program modules may be combined or split between the described program modules. Machine-executable instructions for program modules may be executed within a local or distributed device. In a distributed device, program modules may be located in both local and remote storage media.
Computer program code for carrying out the methods of the embodiments of the present application may be written in one or more programming languages. The computer program code may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data storage apparatus, so that, when executed by the computer or the other programmable data storage apparatus, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a computer, partly on a computer, as a stand-alone software package, partly on a computer and partly on a remote computer, or entirely on a remote computer or server.
In the context of embodiments of the present application, computer program code or related data may be carried by any suitable carrier to enable an apparatus, device or processor to perform the various processes and operations described above. Examples of carriers include signals, computer readable media, and the like.
Examples of signals may include electrical, optical, radio, acoustical or other form of propagated signals, such as carrier waves, infrared signals, etc.
A machine-readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination thereof. More detailed examples of a machine-readable storage medium include an electrical connection with one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical storage device, a magnetic storage device, or any suitable combination thereof.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system, apparatus and module may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division of the modules is merely a logical function division, and in actual implementation there may be other division manners. For example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections via some interfaces, devices, or modules, or may be electrical, mechanical, or other forms of connections.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purposes of the embodiments of the present application.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods in the various embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or various other media capable of storing program code.
The terms "first," "second," and the like in this application are used to distinguish between identical or similar items that have substantially the same function and effect. It should be understood that there is no logical or chronological dependency among "first," "second," and "nth," and that they impose no limitation on quantity or order of execution. It will be further understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another. For example, a first device may be referred to as a second device, and similarly, a second device may be referred to as a first device, without departing from the scope of the various described examples. The first device and the second device may both be any type of network device and, in some cases, may be separate and distinct network devices.
It should also be understood that, in the embodiments of the present application, the sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not constitute any limitation on the implementation processes of the embodiments of the present application.
The term "at least one" in this application means one or more, the term "plurality" in this application means two or more, for example, a plurality of second messages means two or more second messages. The terms "system" and "network" are often used interchangeably herein.
It is to be understood that the terminology used in the description of the various examples described herein is for the purpose of describing particular examples only and is not intended to be limiting. As used in the description of the various described examples and in the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "if" may be interpreted to mean "when" or "upon" or "in response to determining" or "in response to detecting." Similarly, the phrase "if it is determined" or "if [a stated condition or event] is detected" may be interpreted to mean "upon determining" or "in response to determining" or "upon detecting [the stated condition or event]" or "in response to detecting [the stated condition or event]," depending on the context.
It should be appreciated that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
It should be further understood that reference throughout this specification to "one embodiment," "an embodiment," "one possible implementation," means that a particular feature, structure, or characteristic described in connection with the embodiment or implementation is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment," "one possible implementation" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Claims (24)

1. A data storage method, the method being applied to a distributed storage system comprising a first device and a plurality of storage nodes, the method comprising:
the first device obtains n groups of first data obtained from x codewords, wherein any one of the x codewords comprises n first data, the n first data are obtained by encoding k original data, one group of first data comprises one first data from each codeword of the x codewords, x, n, and k are positive integers, n is greater than or equal to k, and x is greater than or equal to 2 and less than or equal to n-k;
the first device performs data transformation on y groups of first data to obtain y groups of second data, so that the minimum repair bandwidth of any one of the storage nodes storing the y groups of second data is smaller than a threshold, wherein the minimum repair bandwidth of any one storage node refers to the minimum bandwidth required to recover the data stored by that storage node, the minimum bandwidth is obtained based on the ratio of the number of downloaded data to the total number of original data, y is an integer, x is less than or equal to y, and y is less than or equal to n;
the first device stores the y groups of second data and the n-y groups of first data on corresponding storage nodes, respectively.
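As an illustrative sketch (not part of the claims), the obtaining step of claim 1 can be pictured as follows: x codewords of n encoded symbols each are regrouped so that group i holds the i-th symbol of every codeword, one group per storage node. The toy parity encoder, the field size P, and all names here are assumptions for illustration only, not the patent's actual code construction.

```python
P = 251  # small prime field for the toy example


def encode(original, n):
    """Encode k original symbols into n: systematic part plus linear parities."""
    parities = [sum(d * (i + 1) ** j for i, d in enumerate(original)) % P
                for j in range(n - len(original))]  # Vandermonde-style toy parities, not a real RS code
    return list(original) + parities


def group_codewords(codewords):
    """Group i collects the i-th encoded symbol of every codeword (one group per node)."""
    n = len(codewords[0])
    return [[cw[i] for cw in codewords] for i in range(n)]


x, k, n = 2, 2, 4
cws = [encode([1, 2], n), encode([3, 4], n)]  # x codewords of n symbols each
groups = group_codewords(cws)                 # n groups, each containing x first data
```

Each of the n groups is then a candidate unit of storage for one node, matching the claim's "one group of first data comprises one first data included by each codeword."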
2. The method of claim 1, wherein the performing data transformation on the y-set of first data to obtain the y-set of second data comprises:
performing data transformation on the y groups of first data according to a linear combination transformation manner to obtain the y groups of second data, wherein the linear combination transformation manner is used to correlate a plurality of first data that belong to different codewords in different groups and correspond to the same reference matrix, the obtained linear combination result is used to restore the plurality of first data, one group of first data corresponds to one reference matrix, and the reference matrix is used to download the group of first data corresponding to it from the corresponding storage node.
3. The method of claim 2, wherein the y groups of first data form a matrix of y rows and x columns, one group of first data corresponds to one row of the matrix, first data belonging to the same codeword correspond to one column of the matrix, and the performing data transformation on the y groups of first data according to a linear combination transformation manner to obtain the y groups of second data comprises:
dividing the matrix into a first number of first sub-matrices and a second number of second sub-matrices, the first number and the second number being determined based on the ratio of y to x, any one first sub-matrix corresponding to x groups of first data, any one second sub-matrix corresponding to s groups of first data, where x < s < 2x;
for any first sub-matrix, performing a first linear combination and a second linear combination on the first data of the p-th row and the q-th column and the first data of the q-th row and the p-th column in the first sub-matrix, and replacing the first data of the p-th row and the q-th column and the first data of the q-th row and the p-th column with the results of the first linear combination and the second linear combination, respectively, to obtain x groups of second data, wherein the first linear combination and the second linear combination are different, 0 ≤ p < x, 0 ≤ q < x, and p is not equal to q;
for any second sub-matrix, performing the first linear combination and the second linear combination on the first data of the p-th row and the q-th column and the first data of the q-th row and the p-th column in the first x rows of the second sub-matrix, and replacing the first data of the p-th row and the q-th column and the first data of the q-th row and the p-th column with the results of the first linear combination and the second linear combination, respectively, to obtain x groups of second data, wherein 0 ≤ p < x, 0 ≤ q < x, and p is not equal to q;
and performing the first linear combination and the second linear combination on the data of the (r+(s-x))-th row and the (t-(s-x))-th column in the last x rows of the second sub-matrix and the data of the t-th row and the r-th column, and replacing the data of the (r+(s-x))-th row and the (t-(s-x))-th column and the data of the t-th row and the r-th column with the results of the first linear combination and the second linear combination, respectively, to obtain x groups of second data, wherein 0 ≤ r < x, s-x ≤ t < s, and r is not equal to t-(s-x), and the data of the (r+(s-x))-th row and the (t-(s-x))-th column and the data of the t-th row and the r-th column are the second data of the first x rows or the first data of the last s-x rows of the second sub-matrix.
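A minimal sketch of claim 3's transform on one x-by-x first sub-matrix: each symmetric pair of entries (p, q) and (q, p) is replaced by two distinct linear combinations of the pair. The field size and the coefficients M, V, G, F are illustrative assumptions; diagonal entries are left unchanged, as the claim only pairs positions with p not equal to q.

```python
P = 257
M, V, G, F = 1, 1, 1, 2  # chosen so M*F != V*G (mod P); each pair stays recoverable


def transform_square(sub):
    """Replace the pair (p, q), (q, p) by two distinct linear combinations."""
    x = len(sub)
    out = [row[:] for row in sub]
    for p in range(x):
        for q in range(p + 1, x):  # visit each unordered pair once
            d1, d2 = sub[p][q], sub[q][p]
            out[p][q] = (M * d1 + V * d2) % P  # first linear combination
            out[q][p] = (G * d1 + F * d2) % P  # second linear combination
    return out


second = transform_square([[1, 2],
                           [3, 4]])  # diagonal entries 1 and 4 are untouched
```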
4. The method according to claim 3, wherein the first linear combination is mD1+vD2 and the second linear combination is gD1+fD2, where m, v, g, and f are non-zero elements of a finite field, the product of m and f is not equal to the product of v and g, and D1 and D2 represent the two data on which the first linear combination and the second linear combination are performed.
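Claim 4's condition mf ≠ vg means the 2x2 coefficient matrix [[m, v], [g, f]] is invertible over the finite field, so D1 and D2 can always be recovered from the two combinations. The sketch below works over a prime field GF(P); the field size and coefficient values are arbitrary illustrative choices, not values specified by the patent.

```python
P = 257


def combine(d1, d2, m, v, g, f):
    """Produce the two linear combinations of claim 4 over GF(P)."""
    assert (m * f - v * g) % P != 0, "claim 4 requires m*f != v*g"
    return (m * d1 + v * d2) % P, (g * d1 + f * d2) % P


def recover(c1, c2, m, v, g, f):
    """Invert the 2x2 system using the non-zero determinant m*f - v*g."""
    det_inv = pow((m * f - v * g) % P, -1, P)  # modular inverse (Python 3.8+)
    d1 = (f * c1 - v * c2) * det_inv % P
    d2 = (m * c2 - g * c1) * det_inv % P
    return d1, d2


c1, c2 = combine(10, 20, m=1, v=1, g=1, f=2)
restored = recover(c1, c2, m=1, v=1, g=1, f=2)
```

Because the determinant is non-zero, the replacement of the original pair by the two combinations loses no information, which is what lets the transformed layout remain decodable.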
5. The method of claim 1, wherein one group of first data corresponds to one reference matrix, the reference matrix is used to download the group of first data corresponding to it from the corresponding storage node, and the performing data transformation on the y groups of first data to obtain the y groups of second data comprises:
performing position transformation on the y groups of first data in response to the n reference matrices being different, wherein the position transformation is used to change the positions of the x·y first data included in the y groups of first data;
and performing data transformation on the position-transformed y groups of first data according to the linear combination transformation manner, wherein the plurality of first data on which the linear combination is performed correspond to the same reference matrix.
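Claims 5-6 first cyclically shift the entries of each column so that the positions later paired for linear combination correspond to the same reference matrix. The sketch below shows one assumed shift schedule (column q shifted down by q positions) on a small x-by-x sub-matrix; the actual shift amounts depend on the reference matrices and are not specified here.

```python
def cyclic_shift_columns(sub):
    """Cyclically shift column q down by q positions (one assumed schedule)."""
    x = len(sub)
    out = [row[:] for row in sub]
    for q in range(x):
        for p in range(x):
            out[(p + q) % x][q] = sub[p][q]  # entry moves down by q, wrapping
    return out


shifted = cyclic_shift_columns([[1, 2, 3],
                                [4, 5, 6],
                                [7, 8, 9]])
# column 0 is unchanged, column 1 moves down by 1, column 2 by 2
```

After such a shift, the subsequent pairwise linear combinations of claim 3 operate on entries that share a reference matrix, which is the alignment claim 6 requires.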
6. The method of claim 5, wherein the y sets of first data are a matrix of y rows and x columns, one set of first data corresponds to one row of the matrix, first data belonging to the same codeword corresponds to one column of the matrix, and the performing the position transformation on the y sets of first data comprises:
dividing the matrix into a first number of first sub-matrices and a second number of second sub-matrices, the first number and the second number being determined based on the ratio of y to x, any one first sub-matrix corresponding to x groups of first data, any one second sub-matrix corresponding to s groups of first data, where x < s < 2x;
for any sub-matrix, cyclically shifting the group in which the first data of each column of the sub-matrix is located, so that, after the cyclic shift, the first data of the p-th row and the q-th column in the first x rows of the sub-matrix corresponds to the same reference matrix as the first data of the q-th row and the p-th column, where 0 ≤ p < x, 0 ≤ q < x, and p is not equal to q, and the first data of the (r+(s-x))-th row and the (t-(s-x))-th column in the last x rows of the sub-matrix corresponds to the same reference matrix as the first data of the t-th row and the r-th column, where 0 ≤ r < x, s-x ≤ t < s, and r is not equal to t-(s-x), the sub-matrix being the first sub-matrix or the second sub-matrix.
7. The method of any of claims 1-6, wherein, for any one of the storage nodes corresponding to the y groups of second data, when the second data stored on that storage node is obtained based on the first data included in a first sub-matrix, and that storage node is the u-th storage node of the x storage nodes storing x groups of second data, where 0 ≤ u < x, the minimum repair bandwidth of the u-th storage node is as follows:
wherein the quantity in the above expression represents the number of storage nodes, among the n-x storage nodes other than the x storage nodes, whose u-th data is not obtained based on a linear combination.
8. The method according to any one of claims 1-6, wherein, for any one of the storage nodes corresponding to the y groups of second data, when the second data stored on that storage node is obtained based on the first data included in a second sub-matrix, and that storage node is the c-th storage node of the s storage nodes storing s groups of second data, then in the case that 0 ≤ c < s-x, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:
in the case that s-x ≤ c < x, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:
and in the case that x ≤ c < s, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:
wherein, in the case that 0 ≤ c < s-x, the quantity in the above expressions represents the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose c-th data is not based on a linear combination; and in the case that s-x ≤ c < s, it represents the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose (c-(s-x))-th data is not based on a linear combination.
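The claims define the minimum repair bandwidth as the ratio of the number of data units downloaded during repair to the total number of original data units, which is x·k across the x codewords. The toy comparison below uses assumed download counts to contrast a plain repair (re-downloading k symbols per codeword) with a transformed layout that reuses already-downloaded symbols; the specific counts are illustrative, not the bounds stated in claims 7-8.

```python
def repair_bandwidth(downloaded, x, k):
    """Ratio of downloaded data units to the x*k original data units."""
    return downloaded / (x * k)


x, k = 2, 4
plain = repair_bandwidth(downloaded=x * k, x=x, k=k)      # naive: re-decode every codeword
piggyback = repair_bandwidth(downloaded=k + x, x=x, k=k)  # assumed reduced download count
```

The transform of claims 2-6 aims to push this ratio below a threshold for every node storing transformed data, which is the condition stated in claim 1.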
9. The method of any of claims 1-8, wherein the code type of the codewords comprises any one of a Reed-Solomon (RS) code, a partial repair code, or a piggyback code.
10. A data storage apparatus for application to a first device comprised by a distributed storage system, the distributed storage system further comprising a plurality of storage nodes, the apparatus comprising:
the acquisition module is configured to obtain n groups of first data obtained from x codewords, wherein any one of the x codewords comprises n first data, the n first data are obtained by encoding k original data, one group of first data comprises one first data from each codeword of the x codewords, x, n, and k are positive integers, n is greater than or equal to k, and x is greater than or equal to 2 and less than or equal to n-k;
the transformation module is configured to perform data transformation on y groups of first data to obtain y groups of second data, so that the minimum repair bandwidth of any one of the storage nodes storing the y groups of second data is smaller than a threshold, wherein the minimum repair bandwidth of any one storage node refers to the minimum bandwidth required to recover the data stored by that storage node, the minimum bandwidth is obtained based on the ratio of the number of downloaded data to the total number of original data, y is an integer, x is less than or equal to y, and y is less than or equal to n;
and the storage module is configured to store the y groups of second data and the n-y groups of first data on corresponding storage nodes, respectively.
11. The apparatus of claim 10, wherein the transformation module is configured to perform data transformation on the y sets of first data according to a linearly combined transformation manner to obtain the y sets of second data, where the linearly combined transformation manner is configured to correlate a plurality of first data belonging to different codewords and corresponding to a same reference matrix in different sets, so that the obtained linearly combined result is used to restore the plurality of first data, where a set of first data corresponds to a reference matrix, and where the reference matrix is used to download a set of first data corresponding to the reference matrix from a corresponding storage node.
12. The apparatus of claim 11, wherein the y sets of first data are a matrix of y rows and x columns, one set of first data corresponding to a row of the matrix, and first data belonging to the same codeword corresponding to a column of the matrix;
the transformation module is configured to divide the matrix into a first number of first sub-matrices and a second number of second sub-matrices, where the first number and the second number are determined based on the ratio of y to x, any one first sub-matrix corresponds to x groups of first data, any one second sub-matrix corresponds to s groups of first data, and x < s < 2x;
for any first sub-matrix, to perform a first linear combination and a second linear combination on the first data of the p-th row and the q-th column and the first data of the q-th row and the p-th column in the first sub-matrix, and to replace the first data of the p-th row and the q-th column and the first data of the q-th row and the p-th column with the results of the first linear combination and the second linear combination, respectively, to obtain x groups of second data, wherein the first linear combination and the second linear combination are different, 0 ≤ p < x, 0 ≤ q < x, and p is not equal to q;
for any second sub-matrix, to perform the first linear combination and the second linear combination on the first data of the p-th row and the q-th column and the first data of the q-th row and the p-th column in the first x rows of the second sub-matrix, and to replace the first data of the p-th row and the q-th column and the first data of the q-th row and the p-th column with the results of the first linear combination and the second linear combination, respectively, to obtain x groups of second data, wherein 0 ≤ p < x, 0 ≤ q < x, and p is not equal to q;
and to perform the first linear combination and the second linear combination on the data of the (r+(s-x))-th row and the (t-(s-x))-th column in the last x rows of the second sub-matrix and the data of the t-th row and the r-th column, and to replace the data of the (r+(s-x))-th row and the (t-(s-x))-th column and the data of the t-th row and the r-th column with the results of the first linear combination and the second linear combination, respectively, to obtain x groups of second data, wherein 0 ≤ r < x, s-x ≤ t < s, and r is not equal to t-(s-x), and the data of the (r+(s-x))-th row and the (t-(s-x))-th column and the data of the t-th row and the r-th column are the second data of the first x rows or the first data of the last s-x rows of the second sub-matrix.
13. The apparatus of claim 12, wherein the first linear combination is mD1+vD2 and the second linear combination is gD1+fD2, where m, v, g, and f are non-zero elements of a finite field, the product of m and f is not equal to the product of v and g, and D1 and D2 represent the two data on which the first linear combination and the second linear combination are performed.
14. The apparatus of claim 10, wherein a set of first data corresponds to a reference matrix, the reference matrix being used to download the set of first data corresponding to the reference matrix from a corresponding storage node;
the transformation module is configured to perform position transformation on the y groups of first data in response to the n reference matrices being different, wherein the position transformation is used to change the positions of the x·y first data included in the y groups of first data; and to perform data transformation on the position-transformed y groups of first data according to the linear combination transformation manner, wherein the plurality of first data on which the linear combination is performed correspond to the same reference matrix.
15. The apparatus of claim 14, wherein the y sets of first data are a matrix of y rows and x columns, one set of first data corresponding to a row of the matrix, and first data belonging to the same codeword corresponding to a column of the matrix;
the transformation module is configured to divide the matrix into a first number of first sub-matrices and a second number of second sub-matrices, where the first number and the second number are determined based on the ratio of y to x, any one first sub-matrix corresponds to x groups of first data, any one second sub-matrix corresponds to s groups of first data, and x < s < 2x;
for any sub-matrix, to cyclically shift the group in which the first data of each column of the sub-matrix is located, so that, after the cyclic shift, the first data of the p-th row and the q-th column in the first x rows of the sub-matrix corresponds to the same reference matrix as the first data of the q-th row and the p-th column, where 0 ≤ p < x, 0 ≤ q < x, and p is not equal to q, and the first data of the (r+(s-x))-th row and the (t-(s-x))-th column in the last x rows of the sub-matrix corresponds to the same reference matrix as the first data of the t-th row and the r-th column, where 0 ≤ r < x, s-x ≤ t < s, and r is not equal to t-(s-x), the sub-matrix being the first sub-matrix or the second sub-matrix.
16. The apparatus of any of claims 10-15, wherein, for any one of the storage nodes corresponding to the y groups of second data, when the second data stored on that storage node is obtained based on the first data included in a first sub-matrix, and that storage node is the u-th storage node of the x storage nodes storing x groups of second data, where 0 ≤ u < x, the minimum repair bandwidth of the u-th storage node is as follows:
wherein the quantity in the above expression represents the number of storage nodes, among the n-x storage nodes other than the x storage nodes, whose u-th data is not obtained based on a linear combination.
17. The apparatus according to any one of claims 10-15, wherein, for any one of the storage nodes corresponding to the y groups of second data, when the second data stored on that storage node is obtained based on the first data included in a second sub-matrix, and that storage node is the c-th storage node of the s storage nodes storing s groups of second data, then in the case that 0 ≤ c < s-x, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:
in the case that s-x ≤ c < x, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:
and in the case that x ≤ c < s, the minimum repair bandwidth of the c-th storage node is less than or equal to the following value:
wherein, in the case that 0 ≤ c < s-x, the quantity in the above expressions represents the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose c-th data is not based on a linear combination; and in the case that s-x ≤ c < s, it represents the number of storage nodes, among the n-s storage nodes other than the s storage nodes, whose (c-(s-x))-th data is not based on a linear combination.
18. The apparatus according to any one of claims 10-17, wherein the code type of the codewords comprises any one of a Reed-Solomon (RS) code, a partial repair code, or a piggyback code.
19. A network device, the network device comprising: a processor coupled to a memory having stored therein at least one program instruction or code that is loaded and executed by the processor to cause the network device to implement the method of any of claims 1-9.
20. A distributed storage system comprising a first device for performing the method of any of claims 1-10 and a plurality of storage nodes for storing data obtained by the first device.
21. A computer readable storage medium having stored therein at least one program instruction or code which when loaded and executed by a processor causes a computer to implement the method of any of claims 1-9.
22. A computer program product, characterized in that the computer program product comprises computer program code which, when run by a computer, causes the computer to implement the method as claimed in any one of claims 1-9.
23. A chip, comprising a processor configured to call and execute instructions stored in a memory, to cause a network device on which the chip is installed to perform the method of any one of claims 1-9.
24. The chip of claim 23, further comprising an input interface and an output interface, wherein the input interface, the output interface, the processor, and the memory are connected through an internal connection path.
CN202210873344.8A 2022-07-21 2022-07-21 Data storage method, device, equipment, system and computer readable storage medium Pending CN117478692A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210873344.8A CN117478692A (en) 2022-07-21 2022-07-21 Data storage method, device, equipment, system and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210873344.8A CN117478692A (en) 2022-07-21 2022-07-21 Data storage method, device, equipment, system and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN117478692A true CN117478692A (en) 2024-01-30

Family

ID=89622606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210873344.8A Pending CN117478692A (en) 2022-07-21 2022-07-21 Data storage method, device, equipment, system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN117478692A (en)

Similar Documents

Publication Publication Date Title
US8103928B2 (en) Multiple device apparatus, systems, and methods
KR102548215B1 (en) Systems and methods for decoding data using compressed channel output information
US8447954B2 (en) Parallel pipelined vector reduction in a data processing system
CN104303166A (en) High performance interconnect link layer
CN104202057A (en) Information processing method and device
CN112332856A (en) Layer decoding method and device of quasi-cyclic LDPC code
Zorgui et al. Centralized multi-node repair for minimum storage regenerating codes
US10090863B2 (en) Coding and decoding methods and apparatus
US8429486B2 (en) Decoding device, data storage device, data communication system, and decoding method
US20160049962A1 (en) Method and apparatus of ldpc encoder in 10gbase-t system
CN117478692A (en) Data storage method, device, equipment, system and computer readable storage medium
CN113452475B (en) Data transmission method, device and related equipment
US9923669B2 (en) Distributed Reed-Solomon codes for simple multiple access networks
CN115858230A (en) Maximum distance separable code construction, repair method and related device
KR102115216B1 (en) Polar codes decoding device and method thereof
CN105356966A (en) Cyclic redundancy check (CRC) implementation method and device, and network equipment
KR101710138B1 (en) Data distribution processing system and data distribution processing method
EP3667964B1 (en) Data processing method and related apparatus
CN109818705B (en) Method, device and equipment for transmitting and receiving subrate signals
EP3493435A1 (en) Encoding method and apparatus
JP3879082B2 (en) Byte error correction / detection device
CN116032418A (en) Encoding method, decoding method, apparatus, device, and readable storage medium
CN116662063B (en) Error correction configuration method, error correction method, system, equipment and medium for flash memory
CN118041488A (en) Data transmission method, device, system and computer readable storage medium
Biswas et al. On m-spotty weight enumerators of Z 2 (Z 2+ u Z 2)-linear codes and Griesmer type bound

Legal Events

Date Code Title Description
PB01 Publication