CN114443350A - Data processing method based on erasure codes and related device - Google Patents

Data processing method based on erasure codes and related device

Info

Publication number
CN114443350A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111640791.0A
Other languages
Chinese (zh)
Inventor
曲秀超
陈孝伟
白杨
薛强
樊晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd filed Critical Tianyi Cloud Technology Co Ltd
Priority to CN202111640791.0A priority Critical patent/CN114443350A/en
Publication of CN114443350A publication Critical patent/CN114443350A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1004Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of computer technology and discloses an erasure-code-based data processing method and a related device. The method comprises: obtaining a plurality of original data blocks; dividing the plurality of original data blocks into two sequential groups and a plurality of cross-group data block groups, wherein the sequence numbers of the original data blocks within each sequential group are consecutive, each cross-group data block group contains the same number of original data blocks, and the sequence numbers of the data blocks within each cross-group data block group are non-consecutive; and generating a plurality of check blocks for the plurality of original data blocks, the check blocks comprising two global check blocks, sequential-group check blocks respectively corresponding to the sequential groups, and cross-group check blocks respectively corresponding to the cross-group data block groups. In this way, while occupying fewer hard disks, lost data blocks can be recovered within a local scope, global data recovery is minimized, reconstruction cost is effectively reduced, and the method offers higher fault tolerance under the same conditions.

Description

Data processing method based on erasure codes and related device
Technical Field
The present application relates to the field of computer technologies, and in particular, to an erasure code based data processing method and related apparatus.
Background
Erasure coding is a technique for improving data reliability in cloud storage systems. Current cloud storage reliability techniques fall into two categories: erasure coding and multi-copy replication. In the big data era, multi-copy replication occupies more disk space and yields low disk utilization, and disk arrays frequently produce uncorrectable errors, so erasure coding is needed to quickly reconstruct lost data.
Current coding techniques fall into two main categories: error correction codes based on Galois field operations, and erasure codes based on XOR operations. The main Galois field erasure code is the RS (Reed-Solomon) code, which can be extended without limit when storage system resources are sufficient and has the Maximum Distance Separable (MDS) property. However, in an RS code the redundant data is a linear combination of all the original data, so recovering any piece of data requires collecting information from all the remaining data units, which incurs additional overhead in time and space. RS erasure codes cannot confine data recovery to within a group as far as possible and rely heavily on global data recovery. Therefore, how to process data with erasure coding so as to reduce both data recovery time and disk occupancy is an urgent problem to be solved.
Disclosure of Invention
The application provides an erasure-code-based data processing method and a related device, which address the problem of how to process data with erasure coding so as to reduce data recovery time and disk occupancy.
In a first aspect, an embodiment of the present application provides a data processing method based on erasure codes, including:
acquiring a plurality of original data blocks;
dividing the plurality of original data blocks into two sequential groups and a plurality of cross-group data block groups, wherein the sequence numbers of the original data blocks within each sequential group are consecutive, each cross-group data block group contains the same number of original data blocks, and the sequence numbers of the data blocks within each cross-group data block group are non-consecutive;
and generating a plurality of check blocks for the plurality of original data blocks, wherein the plurality of check blocks comprise two global check blocks, sequential-group check blocks respectively corresponding to the sequential groups, and cross-group check blocks respectively corresponding to the cross-group data block groups.
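The grouping step above can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of the application: the cross-group layout is a hypothetical stride pattern (block i goes to cross-group i mod g), the group check blocks are plain XOR, and the function names `split_groups`, `xor_blocks` and `local_check_blocks` are invented for illustration. The two global check blocks are left out because they would need finite-field coefficients.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks (assumed stand-in for the real check-block arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def split_groups(k, cross_group_count):
    """Return (sequential_groups, cross_groups) as lists of block indices.

    Sequential groups: two runs of consecutive sequence numbers.
    Cross-group groups: hypothetical stride layout, so the sequence numbers in
    each group are non-consecutive and every group has the same size.
    """
    if k % 2 or k % cross_group_count:
        raise ValueError("k must be divisible by 2 and by cross_group_count")
    half = k // 2
    sequential = [list(range(0, half)), list(range(half, k))]
    cross = [list(range(j, k, cross_group_count)) for j in range(cross_group_count)]
    return sequential, cross


def local_check_blocks(data, groups):
    """Compute one XOR check block per group."""
    return [xor_blocks([data[i] for i in group]) for group in groups]


data = [bytes([i + 1] * 4) for i in range(6)]   # six toy 4-byte original data blocks
sequential, cross = split_groups(len(data), 3)
seq_checks = local_check_blocks(data, sequential)
cross_checks = local_check_blocks(data, cross)
```

With this layout each cross-group data block group takes one block from each sequential group, so a single lost block can be rebuilt from its cross-group partner and the matching cross-group check block.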
In a possible implementation, if the plurality of original data blocks admit several cross-group data block group division strategies, one of the strategies is selected and the plurality of original data blocks are divided into a plurality of cross-group data block groups accordingly.
In one possible embodiment, the method further comprises:
determining, based on a pre-constructed recovery condition indication table, whether failed data blocks can be recovered; the recovery condition indication table records distributions of failed data blocks and the corresponding recovery indication information, wherein the recovery indication information is a recovery condition if the failed data blocks can be recovered, and indicates that reconstruction is not possible if they cannot be recovered;
and if the failed data blocks can be recovered, reconstructing the failed data blocks using the plurality of check blocks of the original data blocks.
In a possible implementation manner, constructing the recovery condition indication table specifically includes:
constructing failed-data-block distribution samples;
for each failed-data-block distribution sample, establishing, based on the non-failed data blocks, a determinant coefficient matrix for reconstructing that sample;
solving for the coefficient relationship in the determinant coefficient matrix under the condition that the determinant of the matrix is non-zero;
and constructing a correspondence between the failed-data-block distribution samples and the coefficient relationships.
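The idea behind the table can be sketched numerically. The application solves the non-zero-determinant condition symbolically to obtain coefficient relationships; the sketch below instead checks, for one concrete choice of coefficients, whether the surviving rows of an encoding matrix still have full rank, which is the same solvability condition. Everything here is an assumption made for illustration: the prime field GF(257) stands in for the Galois field of a real codec, the toy code has 4 data blocks, 2 group check blocks and 1 global check block, and `rank_mod_p` and `build_recovery_table` are hypothetical names.

```python
from itertools import combinations

P = 257  # assumption: a small prime field stands in for the codec's Galois field


def rank_mod_p(rows, p=P):
    """Rank of a matrix over GF(p), computed by Gaussian elimination."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    cols = len(m[0]) if m else 0
    for col in range(cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)          # modular inverse (Python 3.8+)
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(a - factor * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def build_recovery_table(encoding_rows, k, failures):
    """Map each failure pattern (tuple of lost row indices) to True or False."""
    table = {}
    for lost in combinations(range(len(encoding_rows)), failures):
        survivors = [row for i, row in enumerate(encoding_rows) if i not in lost]
        table[lost] = rank_mod_p(survivors) == k
    return table


identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
# Rows 0-3: data blocks; rows 4-5: group check blocks; row 6: one global check block.
good = identity + [[1, 1, 0, 0], [0, 0, 1, 1], [1, 2, 3, 4]]
bad = identity + [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]]
good_table = build_recovery_table(good, 4, 2)
bad_table = build_recovery_table(bad, 4, 2)
```

Losing data blocks 0 and 1 is recoverable with the `good` coefficients but not with the `bad` ones, because the global row then repeats the group check row on those two columns. This is the kind of constraint ("the first coefficient is not equal to the second coefficient") that a coefficient relationship records.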
In one possible implementation, the two global check blocks include a first global check block and a second global check block. The first global check block corresponds to the n-th row of coefficients in the determinant coefficient matrix, and the second global check block corresponds to the (n+1)-th row of coefficients, the determinant coefficient matrix having n+1 rows of coefficients in total; each original data block corresponds to one coefficient in a row of coefficients;
if the number of failed data blocks is 4, the failed data blocks are, in order, a first, a second, a third and a fourth failed data block, and in the determinant coefficient matrix the first failed data block corresponds to a first coefficient, the second to a second coefficient, the third to a third coefficient and the fourth to a fourth coefficient, then:
if the failed data blocks are all original data blocks, the corresponding coefficient relationship is: the first coefficient is not equal to the third coefficient, the second coefficient is not equal to the fourth coefficient, and the sum of the first and third coefficients is not equal to the sum of the second and fourth coefficients;
if the failed data blocks comprise 1 cross-group check block and 3 original data blocks, and the failed original data blocks are in the same group as the failed cross-group check block, the failed data blocks cannot be reconstructed;
if the failed data blocks comprise 1 cross-group check block and 3 original data blocks, and the failed original data blocks and the failed cross-group check block are in different groups, the corresponding coefficient relationship is: the first coefficient is not zero, the second coefficient is not zero, and the difference between the third coefficient and the first coefficient is not zero;
if the failed data blocks comprise 1 global check block and 3 original data blocks, the corresponding coefficient relationship is: the difference between the third coefficient and the first coefficient is not zero, and the first coefficient is not equal to the third coefficient;
if the failed data blocks comprise 1 cross-group check block, 1 global check block and 2 original data blocks, and the 2 failed original data blocks are in the same group as the failed cross-group check block or the failed global check block, the failed data blocks cannot be reconstructed;
if the failed data blocks comprise 1 cross-group check block, 1 global check block and 2 original data blocks, and the 2 failed original data blocks are respectively in the same group as the failed cross-group check block and the failed global check block, the corresponding coefficient relationship is: the first coefficient is not zero;
if the failed data blocks comprise 2 cross-group check blocks and 2 original data blocks, the corresponding coefficient relationship is: the first coefficient is not equal to the second coefficient, and neither the first coefficient nor the second coefficient is zero;
if the failed data blocks comprise 2 global check blocks and 2 original data blocks, and the 2 failed original data blocks are in the same group as 1 non-failed cross-group check block, the failed data blocks cannot be reconstructed;
if the failed data blocks comprise 2 global check blocks and 2 original data blocks, and the 2 failed original data blocks are respectively in the same groups as the 2 non-failed cross-group check blocks, the corresponding coefficient relationship is: the coefficients are unconstrained;
if the failed data blocks comprise 2 cross-group check blocks, 1 global check block and 1 original data block, the corresponding coefficient relationship is: the first coefficient is not zero;
if the failed data blocks comprise 2 global check blocks, 1 cross-group check block and 1 original data block, and the failed original data block is in the same group as the failed cross-group check block, the failed data blocks cannot be reconstructed;
if the failed data blocks comprise 2 global check blocks, 1 cross-group check block and 1 original data block, and the failed original data block and the failed cross-group check block are in different groups, the corresponding coefficient relationship is: the coefficients are unconstrained.
In a possible implementation, the minimum number of data blocks required to reconstruct a single failed data block among the plurality of original data blocks is the number of data blocks contained in a single cross-group data block group.
In one possible implementation, the maximum number of failed data blocks that the plurality of original data blocks can tolerate is the total number of check blocks.
In a second aspect, an embodiment of the present application provides an erasure code-based data processing apparatus, where the apparatus includes:
the acquisition module is used for acquiring a plurality of original data blocks;
the grouping module is used for dividing the plurality of original data blocks into two sequential groups and a plurality of cross-group data block groups, wherein the sequence numbers of the original data blocks within each sequential group are consecutive, each cross-group data block group contains the same number of original data blocks, and the sequence numbers of the data blocks within each cross-group data block group are non-consecutive;
and the check block determining module is used for generating a plurality of check blocks for the plurality of original data blocks, wherein the plurality of check blocks comprise two global check blocks, sequential-group check blocks respectively corresponding to the sequential groups, and cross-group check blocks respectively corresponding to the cross-group data block groups.
In a possible implementation, the grouping module is further configured to, if the plurality of original data blocks admit several cross-group data block group division strategies, select one of the strategies and divide the plurality of original data blocks into a plurality of cross-group data block groups accordingly.
In a possible embodiment, the apparatus further comprises:
the recovery determining module is used for determining, based on a pre-constructed recovery condition indication table, whether failed data blocks can be recovered; the recovery condition indication table records distributions of failed data blocks and the corresponding recovery indication information, wherein the recovery indication information is a recovery condition if the failed data blocks can be recovered, and indicates that reconstruction is not possible if they cannot be recovered;
and the reconstruction module is used for reconstructing the failed data blocks using the plurality of check blocks of the original data blocks if the failed data blocks can be recovered.
In a possible embodiment, the apparatus further comprises:
the sample construction module is used for constructing failed-data-block distribution samples;
the matrix establishing module is used for establishing, for each failed-data-block distribution sample and based on the non-failed data blocks, a determinant coefficient matrix for reconstructing that sample;
the calculation module is used for solving for the coefficient relationship in the determinant coefficient matrix under the condition that the determinant of the matrix is non-zero;
and the correspondence construction module is used for constructing a correspondence between the failed-data-block distribution samples and the coefficient relationships.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any of the erasure code based data processing methods as provided in the first aspect above.
In a fourth aspect, the present application further provides a computer-readable storage medium, wherein instructions stored in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the steps of any one of the erasure code based data processing methods provided in the first aspect.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of any one of the erasure code based data processing methods provided in the first aspect.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
According to the erasure-code-based data processing method, a plurality of original data blocks are obtained; the plurality of original data blocks are divided into two sequential groups and a plurality of cross-group data block groups, wherein the sequence numbers of the original data blocks within each sequential group are consecutive, each cross-group data block group contains the same number of original data blocks, and the sequence numbers of the data blocks within each cross-group data block group are non-consecutive; and a plurality of check blocks are generated for the plurality of original data blocks, the check blocks comprising two global check blocks, sequential-group check blocks respectively corresponding to the sequential groups, and cross-group check blocks respectively corresponding to the cross-group data block groups. Because the original data blocks are divided into cross-group data block groups, the number of cross-group data block groups and the number of data blocks in each group can be adjusted flexibly, and data blocks can be shared between groups. Data recovery can therefore be carried out within a local scope, global data recovery is minimized, and reconstruction cost, network bandwidth and computation time are effectively reduced. The reliability and integrity of data during cloud storage are better ensured. At the same time, the method offers higher fault tolerance under the same conditions and occupies as little hard disk space as possible.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a structural diagram of an RS code according to an embodiment of the present application;
Fig. 2 is a structural diagram of LRC erasure coding according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of an erasure-code-based data processing method according to an embodiment of the present application;
Fig. 4 is a structural diagram of a cross-group erasure code for the tuple (6,2,2,2) according to an embodiment of the present application;
Fig. 5 is a structural diagram of a cross-group erasure code for the tuple (6,2,2,3) according to an embodiment of the present application;
Fig. 6 is a flowchart of a method for constructing a recovery condition indication table according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an erasure-code-based data processing apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
Hereinafter, some terms in the embodiments of the present application are explained to facilitate understanding by those skilled in the art.
(1) In the embodiments of the present application, the term "plurality" means two or more, and other terms are similar thereto.
(2) "and/or" describes an association between associated objects and covers three cases; for example, A and/or B may mean: A exists alone, A and B both exist, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
(3) Server: serves the terminal, for example by providing resources to the terminal and storing terminal data; the server corresponds to an application program installed on the terminal and runs in cooperation with that application program.
(4) Terminal: may refer to a software application (APP) or a client. It has a visual display interface and can interact with the user; it corresponds to the server and provides local services to the user. Apart from some applications that run only locally, software applications are generally installed on an ordinary client terminal and need to run in cooperation with a server. Since the rise of the Internet, common applications of this kind include e-mail clients for sending and receiving e-mail and instant messaging clients. Such applications require a corresponding server and service program in the network to provide services such as database services and configuration parameter services, so a specific communication connection must be established between the client terminal and the server to ensure the normal operation of the application.
(5) Data blocks (data fragments): the minimum coding units into which the system divides the original user data.
(6) Check blocks (parity fragments): data other than the data blocks; auxiliary data generated for recovering the data blocks.
(7) Local check block: linearly related only to the data blocks of its group.
(8) Global check block: linearly related to all data blocks.
(9) MDS code: maximum distance separable code, a linear code that meets the Singleton bound. Compared with other codes, it has the lowest storage overhead for the same fault tolerance capability.
(10) Stripe: a redundancy set formed by several data blocks and their corresponding check blocks. If a certain number of coded blocks are lost, they can be regenerated by operating on the remaining coded blocks in the stripe.
(11) Storage overhead: the sum of the number of original data blocks and the number of check blocks.
(12) Fault tolerance rate: when n of the coded blocks are lost, the n lost blocks may be any combination of data blocks, local check blocks and global check blocks. The fault tolerance rate is the ratio of the number of combinations that can theoretically be reconstructed to the total number of combinations.
(13) Fault tolerance capability: the maximum number of coded-block failures that a stripe can theoretically tolerate. If the fault tolerance capability of an erasure code is n, the erasure code can reconstruct the failed blocks if and only if no more than n coded blocks fail (theoretically repairable).
(14) Reconstruction overhead: the number of coded blocks read from the stripe during data reconstruction.
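The definitions of storage overhead, fault tolerance rate and fault tolerance capability can be written compactly as follows. The symbols k, m, N, f, R and t are notation introduced here for illustration only and do not appear in the application.

```latex
% Assumed symbols: k original data blocks, m check blocks, N = k + m coded blocks per stripe,
% f lost blocks, S a set of f lost blocks.
\[
  \text{storage overhead} = k + m = N
\]
\[
  R(f) = \frac{\bigl|\{\, S : |S| = f,\ S \text{ can be reconstructed} \,\}\bigr|}{\binom{N}{f}}
\]
\[
  t = \max \{\, f : R(f') = 1 \ \text{for all } f' \le f \,\}
\]
```

For example, an MDS code with k = 6 and m = 4 can rebuild any 4 lost blocks, so R(f) = 1 for every f up to 4 and t = 4.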
Any number of elements in the drawings are by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
Erasure coding is a technique for improving data reliability in cloud storage systems. Current cloud storage reliability techniques fall into two categories: erasure coding and multi-copy replication. In the big data era, multi-copy replication occupies more disk space and yields low disk utilization, and disk arrays frequently produce uncorrectable errors, so erasure coding is all the more needed to quickly reconstruct lost data. For this reason, many fields are paying attention to improving erasure coding, hoping to shorten data repair time while reducing disk occupancy.
Current coding techniques fall into two main categories: error correction codes based on Galois field operations, and erasure codes based on XOR operations.
The main Galois field erasure code is the RS (Reed-Solomon) code, which can be extended without limit when storage system resources are sufficient and has the Maximum Distance Separable (MDS) property. However, in an RS code the redundant data is a linear combination of all the original data, so recovering any piece of data requires collecting information from all the remaining data units, which incurs additional overhead in time and space. The LRC (Local Reconstruction Codes) erasure code does not linearly combine all of the original data with the redundant data; each piece of redundant data is a linear combination of only part of the data. LRC therefore reduces the amount of data that must be pulled and improves on RS in both time and space.
During decoding, the RS code must use all of the data to perform recovery, which often takes a long time under bandwidth and disk IO (Input-Output) limits. The LRC coding method was proposed on the basis of RS for this reason; it can recover data using only a small amount of associated data and redundant data. However, existing erasure codes cannot confine data recovery to within a group as far as possible and still rely heavily on global data recovery. Therefore, how to process data with erasure coding so as to reduce both data recovery time and disk occupancy is an urgent problem to be solved.
In view of the above, the present application provides an erasure code-based data processing method and a related apparatus, which are used to solve the problem of how to use an erasure code technology to perform data processing, thereby reducing data recovery time and reducing disk occupancy.
The inventive concept of the present application is as follows: a plurality of original data blocks are obtained; the plurality of original data blocks are divided into two sequential groups and a plurality of cross-group data block groups, wherein the sequence numbers of the original data blocks within each sequential group are consecutive, each cross-group data block group contains the same number of original data blocks, and the sequence numbers of the data blocks within each cross-group data block group are non-consecutive; and a plurality of check blocks are generated for the plurality of original data blocks, the check blocks comprising two global check blocks, sequential-group check blocks respectively corresponding to the sequential groups, and cross-group check blocks respectively corresponding to the cross-group data block groups. Because the original data blocks are divided into cross-group data block groups, the number of cross-group data block groups and the number of data blocks in each group can be adjusted flexibly, and data blocks can be shared between groups. Data recovery can therefore be carried out within a local scope, global data recovery is minimized, and reconstruction cost is effectively reduced. At the same time, the method offers higher fault tolerance under the same conditions and occupies as little hard disk space as possible.
Having introduced the inventive concept of the embodiments of the present application, the application scenarios to which the technical solutions can be applied are briefly described below. It should be noted that the application scenarios described below only serve to illustrate the embodiments of the present application and are not limiting. In specific implementations, the technical solutions provided by the embodiments of the present application can be applied flexibly according to actual needs.
The method provided by the embodiment of the application is suitable for access authority management in a scene that multiple clients share the cloud hard disk, and is also suitable for cloud hard disk services provided based on a distributed block storage cluster technology. In a common distributed block storage system at present, the method provided by the embodiment of the present application can be directly applied to manage the client access permission, so as to solve the potential data conflict problem when multiple clients share, ensure the consistency of the client permission in the cluster through the method provided by the embodiment of the present application, ensure that the service access of the client is not affected when a few nodes in the storage cluster fail, and improve the availability and reliability of the storage service.
The method provided by the embodiments of the present application is also suitable for cloud storage systems. It is mainly applied when the cloud storage system is offline: when a single-point device fails, data is quickly recovered using the cross-group MLRC. Such systems currently use an RS algorithm or three-copy replication; compared with the conventional RS and LRC algorithms and three-copy replication, the newly proposed MLRC (Multi-Step Local Reconstruction Codes) algorithm recovers data quickly while also reducing hard disk usage. This guarantees the reliability of data in the cloud storage system and provides a good user experience for the upper-layer Platform as a Service (PaaS) services.
Of course, the method provided in the embodiment of the present application is not limited to be used in the foregoing application scenarios, and may also be used in other possible application scenarios, and the embodiment of the present application is not limited. To further explain the technical solutions provided by the embodiments of the present application, the following detailed description is made with reference to the accompanying drawings and the specific embodiments. Although the embodiments of the present application provide method operation steps as shown in the following embodiments or figures, more or fewer operation steps may be included in the method based on conventional or non-inventive labor. In steps where no necessary causal relationship exists logically, the order of execution of the steps is not limited to that provided by the embodiments of the present application.
To facilitate those skilled in the art to understand the erasure code based data processing method provided in the embodiments of the present application, the RS code and the LRC code are first described below with reference to the accompanying drawings.
Referring to fig. 1, a structure diagram of an RS code provided in an embodiment of the present application is shown. As shown in fig. 1, assume that the original data of the RS code are divided into 6 original data blocks, from which 4 check blocks are generated, each linearly related to all 6 original data blocks. That is, if one original data block is lost, 6 surviving blocks (for example, the 5 remaining original data blocks and one check block) must be collected to recover it.
The purpose of the LRC erasure code is to reduce the resources required to recover data, so that the LRC needs only a portion of the original data blocks during recovery. As shown in fig. 2, the original data are divided into 6 original data blocks, from which 4 check blocks are generated. Two of the check blocks, Q0 and Q1, are global check blocks associated with all 6 original data blocks. The remaining two, P0 and P1, are sequential grouping check blocks: the original data blocks are divided into two groups of equal size, and each group corresponds to one sequential grouping check block. For convenience of description, the original data blocks are denoted (d0, d1, d2, d3, d4, d5); the sequential grouping check block P0 is calculated from (d0, d1, d2), and the sequential grouping check block P1 is calculated from (d3, d4, d5).
For ease of understanding, an LRC tuple (k, l, r) may be defined, where k is the number of original data blocks, l is the number of sequential grouping check blocks (each covering k/l original data blocks), and r is the number of global check blocks. There are n = k + l + r blocks in total, so the code requires (k + l + r)/k times the storage space of the original data.
The check blocks should then be chosen so that the erasure code has the MR (Maximally Recoverable) property, which is explained as follows. Take the LRC tuple (6,2,2), in which each sequential grouping check block covers 3 original data blocks and at most 4 blocks may be damaged. Recovery is not guaranteed for every such pattern, because the 4 damaged blocks cannot be chosen arbitrarily. For example, if d0, d1, d2 and P0 are all damaged, the system cannot recover the original data: three unknowns cannot be solved with only two global check blocks. A system with the MR property can lose any blocks up to the upper limit of r + 1 and still recover. Constructing equations for a particular set of failed original data blocks is not difficult. The difficulty lies in constructing equations with the MR property, which can recover the missing original data blocks for any pattern of up to r + 1 damaged blocks.
Thus the tuple (6,2,2) can recover from the loss of any 3 blocks, and has an 86% chance of handling the loss of 4 blocks. When blocks are missing, it must first be checked whether recovery is possible. The check proceeds as follows. For each sequential group, if its sequential grouping check block is available and at least one original data block in the group has been erased, the check block is swapped with one erased original data block: after the swap, the original data block is treated as available and the sequential grouping check block as unavailable. After every group has been processed in this way, the remaining erased original data blocks are compared with the available global check blocks. If the number of erased original data blocks does not exceed the number of available global check blocks, the erasure can be recovered.
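The check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the method of the application: the block names (`d0`..`d5`, `p0`, `p1`, `q0`, `q1`) and the function names are hypothetical, and the test is a pure counting argument that an MR code is assumed to achieve.

```python
from itertools import combinations

# Illustrative block layout for the LRC tuple (6, 2, 2); names are assumptions.
GROUPS = [
    (["d0", "d1", "d2"], "p0"),  # first sequential group and its check block
    (["d3", "d4", "d5"], "p1"),  # second sequential group and its check block
]
GLOBALS = ["q0", "q1"]
ALL_BLOCKS = [b for data, p in GROUPS for b in data + [p]] + GLOBALS


def is_recoverable(failed):
    """Counting test from the text: swap each available group check block
    for one erased data block of its group, then compare what is left
    against the available global check blocks."""
    failed = set(failed)
    remaining_erased = 0
    for data, parity in GROUPS:
        erased = sum(1 for d in data if d in failed)
        if parity not in failed and erased > 0:
            erased -= 1  # the group check block repairs one data block
        remaining_erased += erased
    available_globals = sum(1 for q in GLOBALS if q not in failed)
    return remaining_erased <= available_globals


def recoverable_fraction(num_failed):
    """Return (recoverable patterns, total patterns) for num_failed failures."""
    patterns = list(combinations(ALL_BLOCKS, num_failed))
    ok = sum(1 for p in patterns if is_recoverable(p))
    return ok, len(patterns)


if __name__ == "__main__":
    for f in (3, 4):
        ok, total = recoverable_fraction(f)
        print(f"{f} failures: {ok}/{total} recoverable ({ok / total:.0%})")
```

Under these assumptions the enumeration accepts every 3-block pattern and 180 of the 210 possible 4-block patterns, which is consistent with the 86% figure quoted above.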
From the above example of the tuple (6,2,2), some general properties can be noted. Using (k, l, r) to represent them:
(1) A single erased original data block can be recovered from the k/l blocks of its sequential group.
(2) If at most r + 1 blocks are erased, in any pattern, a system with the MR property can recover the data.
From these two points, a lower bound on the number of check blocks can be further derived:
For any (n, k) linear code, where k is the number of original data blocks and n - k is the number of check blocks, n - k ≥ l + r must hold if the code satisfies the following properties:
(1) Any r + 1 erased data blocks can be recovered by the system.
(2) A single erased data block can be recovered from k/l data blocks.
The lower bound on the number of check blocks of the LRC is therefore l + r.
Therefore, the relevant knowledge and some properties of the RS code and the LRC code can be known, and the erasure code-based data processing method provided by the embodiment of the application can be conveniently understood based on the knowledge.
Referring to fig. 3, a schematic flowchart of a data processing method based on erasure codes according to an embodiment of the present application is shown. As shown in fig. 3, the method comprises the steps of:
In step 301, a plurality of original data blocks is obtained.
In step 302, the plurality of original data blocks is divided into two sequential groups and a plurality of cross-group data block groups; the sequence numbers of the original data blocks in each sequential group are continuous, each cross-group data block group contains the same number of original data blocks, and the sequence numbers of the data blocks in each cross-group data block group are discontinuous.
In step 303, a plurality of check blocks of the plurality of original data blocks is generated, where the plurality of check blocks includes two global check blocks, a sequential grouping check block corresponding to each sequential group, and a cross-group check block corresponding to each cross-group data block group.
For example, as shown in fig. 4, a further dimension can be added on top of the LRC grouping, in which data blocks taken from different sequential groups are combined. Take the LRC tuple (6,2,2), which represents 6 original data blocks, 2 global check blocks and 2 sequential grouping check blocks. Two cross-group check blocks P2 and P3 are introduced between the groups, where P2 is generated from d0, d2, d4 and P3 is generated from d1, d3, d5. The tuple then becomes (6,2,2,2), which represents 6 original data blocks d0, d1, d2, d3, d4, d5; 2 global check blocks Q0, Q1; 2 sequential grouping check blocks P0, P1; and 2 cross-group check blocks P2, P3.
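To make the grouping of the tuple (6,2,2,2) concrete, the following sketch generates the sequential grouping and cross-group check blocks and repairs one lost block from a single group. It assumes that these check blocks are plain byte-wise XOR parities, which the text does not state; the global check blocks Q0 and Q1 are omitted because their coefficients are not given here. All function names are hypothetical.

```python
from functools import reduce

# Grouping for the tuple (6, 2, 2, 2), taken from the example in the text.
SEQUENTIAL_GROUPS = {"P0": [0, 1, 2], "P1": [3, 4, 5]}
CROSS_GROUPS = {"P2": [0, 2, 4], "P3": [1, 3, 5]}
ALL_GROUPS = {**SEQUENTIAL_GROUPS, **CROSS_GROUPS}


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode_local(data):
    """Return the sequential and cross-group check blocks (XOR is assumed)."""
    return {
        name: xor_blocks([data[i] for i in members])
        for name, members in ALL_GROUPS.items()
    }


def repair_single(data, parity, lost, group_name):
    """Rebuild data[lost] from one check block and the other group members.

    data[lost] itself is never read, so it may hold stale or missing content.
    """
    members = ALL_GROUPS[group_name]
    if lost not in members:
        raise ValueError(f"d{lost} is not covered by {group_name}")
    survivors = [data[i] for i in members if i != lost]
    return xor_blocks(survivors + [parity[group_name]])


if __name__ == "__main__":
    data = [bytes([i] * 4) for i in range(1, 7)]  # six toy 4-byte blocks
    parity = encode_local(data)
    # d2 can be rebuilt either through its sequential group or its cross group.
    assert repair_single(data, parity, 2, "P0") == data[2]
    assert repair_single(data, parity, 2, "P2") == data[2]
```

Each original data block is covered by one sequential group and one cross group, so a single failure has two independent local repair paths, each reading three blocks in this example.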
In a possible implementation manner, if the plurality of original data blocks corresponds to a plurality of cross-group data block group division strategies, one of the strategies is selected, and the plurality of original data blocks is divided into a plurality of cross-group data block groups accordingly.
Illustratively, besides the grouping shown in fig. 4, another strategy may be used. As shown in fig. 5, each cross-group data block group contains two original data blocks. The tuple (6,2,2,3) then represents 6 original data blocks d0, d1, d2, d3, d4, d5; 2 global check blocks Q0, Q1; 2 sequential grouping check blocks P0, P1; and 3 cross-group check blocks P2, P3, P4, where P2 is generated from d0, d3; P3 is generated from d1, d4; and P4 is generated from d2, d5.
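Both division strategies in the examples (fig. 4 and fig. 5) follow the same pattern: cross-group data block group j collects the blocks whose index is congruent to j modulo the number of cross-group check blocks. The sketch below expresses that observation; the function names are hypothetical, and other division strategies are of course possible.

```python
def sequential_groups(k, num_groups=2):
    """Split block indices 0..k-1 into contiguous groups of equal size."""
    if k % num_groups != 0:
        raise ValueError("k must be divisible by the number of sequential groups")
    size = k // num_groups
    return [list(range(g * size, (g + 1) * size)) for g in range(num_groups)]


def cross_groups(k, m):
    """Split block indices 0..k-1 into m strided (non-contiguous) groups."""
    if k % m != 0:
        raise ValueError("k must be divisible by m so that groups are equal in size")
    return [list(range(j, k, m)) for j in range(m)]


if __name__ == "__main__":
    print(sequential_groups(6))  # grouping behind P0, P1
    print(cross_groups(6, 2))    # grouping behind P2, P3 in the (6,2,2,2) example
    print(cross_groups(6, 3))    # grouping behind P2, P3, P4 in the (6,2,2,3) example
```

Choosing a larger m gives smaller cross-group data block groups, so a single block can be rebuilt from fewer reads at the cost of more check blocks.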
In a possible implementation manner, the erasure code-based data processing method provided in the embodiments of the present application may determine whether failed data blocks are recoverable based on a pre-constructed recovery condition indication table. The table records each distribution of failed data blocks together with the corresponding recovery indication information: if the failed data blocks can be recovered, the recovery indication information is the recovery condition; if they cannot be recovered, it indicates that they are not reconstructable. If the failed data blocks can be recovered, they are reconstructed using the check blocks of the original data blocks.
In one possible embodiment, the recovery condition indication table may be constructed by the steps shown in fig. 6:
In step 601, failed data block distribution samples are constructed;
In step 602, for each failed data block distribution sample, a determinant coefficient matrix is established for reconstructing the failed data blocks of that sample from the non-failed data blocks;
In step 603, on the premise that the determinant of the determinant coefficient matrix is not zero, the coefficient relationship in the determinant coefficient matrix is solved;
In step 604, the correspondence between the failed data block distribution sample and the coefficient relationship is constructed.
Illustratively, taking 6 original data blocks and the tuple (6,2,2,2) as an example, the following describes how the recovery condition indication table is constructed when the number of failed data blocks is 4. Here d0, d1, d2, d3, d4, d5 denote the 6 original data blocks, Q0, Q1 the 2 global check blocks, P0, P1 the 2 sequential grouping check blocks, and P2, P3 the 2 cross-group check blocks. When no original data block has failed, the corresponding determinant coefficient matrix (1) is:
[Matrix (1), shown as an image in the original publication]
The cases in which 4 data blocks fail are analyzed as follows:
(1) The 4 failed data blocks are all original data blocks. Assuming that the four original data blocks d0, d1, d2, d3 fail, the determinant coefficient matrix (2) can be expressed as:
[Matrix (2), shown as an image in the original publication]
For the failed data blocks to be theoretically reconstructable in this 4-block failure mode, the column vector of failed data blocks must have a solution, that is, the determinant of the determinant coefficient matrix must not be 0, which can be expressed as formula (3):
[Formula (3), shown as an image in the original publication]
Simplifying gives formula (4):
(a0-a2)(a1-a3)(a0+a2-(a1+a3))≠0 (4)
This yields the coefficient relationship in the determinant coefficient matrix: a0≠a2, a1≠a3, a0+a2≠a1+a3. The correspondence between the distribution "4 failed original data blocks" and this coefficient relationship can therefore be constructed.
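Matrix (2) and formula (3) are only available as images, so the following is a plausibility check rather than a reproduction. It assumes, as a guess that happens to fit formula (4), that the four equations used for the unknowns (d0, d1, d2, d3) are the cross-group check blocks P2 and P3 with all-ones coefficients and two global check blocks with coefficients a_i and a_i squared. Under that assumption the determinant equals the product in formula (4) up to sign. Integer arithmetic is used for illustration; a real implementation would work in a finite field.

```python
from itertools import permutations


def det(matrix):
    """Exact integer determinant via the Leibniz formula (fine for 4x4)."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        # Sign of the permutation from its inversion count.
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total


def failure_matrix(a0, a1, a2, a3):
    """ASSUMED coefficient rows for the unknowns (d0, d1, d2, d3):
    P2 covers d0 and d2, P3 covers d1 and d3,
    first global row uses a_i, second global row uses a_i squared."""
    a = (a0, a1, a2, a3)
    return [
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        list(a),
        [x * x for x in a],
    ]


def formula_4(a0, a1, a2, a3):
    """Left-hand side of formula (4) in the text."""
    return (a0 - a2) * (a1 - a3) * (a0 + a2 - (a1 + a3))


if __name__ == "__main__":
    for coeffs in [(1, 2, 3, 4), (1, 2, 1, 5), (1, 2, 4, 3), (2, 7, 5, 11)]:
        d = det(failure_matrix(*coeffs))
        # Same zero set as formula (4); only the overall sign differs.
        assert d == -formula_4(*coeffs)
        print(coeffs, "reconstructable" if d != 0 else "not reconstructable")
```

The three factors correspond directly to the three conditions listed above: the determinant vanishes exactly when a0 = a2, a1 = a3, or a0 + a2 = a1 + a3.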
(2) The 4 failed data blocks are 3 original data blocks and 1 check block.
1. If the failed check block is a cross-group check block and the 3 failed original data blocks are in the same group as the failed cross-group check block, 2 equations in 3 unknowns are obtained. Three unknowns cannot be solved from 2 equations, so reconstruction is theoretically impossible.
2. If the failed check block is a cross-group check block and the 3 failed original data blocks are not all in the same group as the failed cross-group check block, assume that the failed data blocks are d0, d1, d2, P2. The determinant coefficient matrix (5) can then be expressed as:
[Matrix (5), shown as an image in the original publication]
For the failed data blocks to be theoretically reconstructable in this 4-block failure mode, the column vector of failed data blocks must have a solution, that is, the determinant of the determinant coefficient matrix must not be 0, which can be expressed as formula (6):
[Formula (6), shown as an image in the original publication]
Simplifying gives formula (7):
a0a1(a2-a0)≠0 (7)
This yields the coefficient relationship in the determinant coefficient matrix: a0≠0, a1≠0, a2-a0≠0. The correspondence between the distribution "3 failed original data blocks and 1 failed check block" and this coefficient relationship can therefore be constructed.
3. If the failed check block is a global check block, assume that the failed data blocks are d0, d1, d2, Q0. The determinant coefficient matrix (8) can then be expressed as:
[Matrix (8), shown as an image in the original publication]
For the failed data blocks to be theoretically reconstructable in this 4-block failure mode, the column vector of failed data blocks must have a solution, that is, the determinant of the determinant coefficient matrix must not be 0, which can be expressed as formula (9):
[Formula (9), shown as an image in the original publication]
Simplifying gives formula (10):
a2-a0≠0, that is, a0≠a2 (10)
This yields the coefficient relationship in the determinant coefficient matrix: a2-a0≠0, that is, a0≠a2. The correspondence between the distribution "3 failed original data blocks and 1 failed check block" and this coefficient relationship can therefore be constructed.
(3) The 4 failed data blocks are 2 original data blocks and 2 check blocks.
1. If the failed check blocks are 1 cross-group check block and 1 global check block, and the 2 failed original data blocks are in the same group as one of the failed check blocks, assume that the failed data blocks are d0, d2, P2, Q1. For the failed data blocks to be theoretically reconstructable in this 4-block failure mode, the column vector of failed data blocks must have a solution, that is, the determinant of the determinant coefficient matrix must not be 0, which can be expressed as formula (11):
[Formula (11), shown as an image in the original publication]
As a result, the failed data blocks are theoretically not reconstructable.
2. If the failed check blocks are 1 cross-group check block and 1 global check block, and the 2 failed original data blocks come from the groups of the 2 cross-group check blocks respectively, assume that the failed data blocks are d0, d1, P2, Q1. For the failed data blocks to be theoretically reconstructable in this 4-block failure mode, the column vector of failed data blocks must have a solution, that is, the determinant of the determinant coefficient matrix must not be 0, which gives -a0≠0, that is, a0≠0. The correspondence between the distribution "2 failed original data blocks and 2 failed check blocks" and this coefficient relationship can therefore be constructed.
3. If the failed check blocks are the 2 cross-group check blocks, assume that the failed data blocks are d0, d1, P2, P3. For the failed data blocks to be theoretically reconstructable in this 4-block failure mode, the column vector of failed data blocks must have a solution, that is, the determinant of the determinant coefficient matrix must not be 0, which can be expressed as formula (12):
a0a1(a1-a0)≠0 (12)
This yields the coefficient relationship in the determinant coefficient matrix: a0≠a1, a0≠0, a1≠0. The correspondence between the distribution "2 failed original data blocks and 2 failed check blocks" and this coefficient relationship can therefore be constructed.
4. If the failed check blocks are the 2 global check blocks, and the 2 failed original data blocks are in the same group as one of the non-failed cross-group check blocks, assume that the failed data blocks are d0, d2, Q0, Q1. For the failed data blocks to be theoretically reconstructable in this 4-block failure mode, the column vector of failed data blocks must have a solution, that is, the determinant of the determinant coefficient matrix must not be 0, which can be expressed as formula (13):
[Formula (13), shown as an image in the original publication]
So theoretically the failed data blocks are not reconstructable.
5. If the failed check blocks are the 2 global check blocks and the 2 failed original data blocks come from the groups of the 2 cross-group check blocks respectively, assume that the failed data blocks are d0, d1, Q0, Q1. The determinant is 1, so the data can be recovered in this case, independently of the selected coefficients.
(4) The 4 failed data blocks are 1 original data block and 3 check blocks.
1. If the 3 failed check blocks include 2 cross-group check blocks and 1 global check block, assume that the failed data blocks are d0, P2, P3, Q0. This gives a0≠0, so the correspondence between the distribution "1 failed original data block and 3 failed check blocks" and this coefficient relationship can be constructed.
2. If the 3 failed check blocks include 2 global check blocks and 1 cross-group check block, and the failed original data block belongs to the group of the failed cross-group check block, assume that the failed data blocks are d0, P2, Q0, Q1. The determinant is identically 0, so reconstruction is theoretically impossible.
3. If the 3 failed check blocks include 2 global check blocks and 1 cross-group check block, and the failed original data block does not belong to the group of the failed cross-group check block, assume that the failed data blocks are d0, P3, Q0, Q1. The determinant is identically -1, so reconstruction is theoretically possible, independently of the selected coefficients.
(5) The 4 failed data blocks are 4 check blocks.
All check blocks can be obtained by re-encoding the original data blocks, regardless of the selected coefficients.
In this way, the correspondence between each failed data block distribution sample and its coefficient relationship can be constructed, and the recovery condition indication table is obtained. The above example covers only the scheme with 6 original data blocks and 2 cross-group check blocks. The remaining failure patterns are analyzed in the same way: when evaluating tolerance of 4 failures, other combinations of failed check blocks can be selected, and the final result is obtained by taking all check blocks into account.
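The case analysis above can be cross-checked numerically for one concrete choice of coefficients. The sketch below is an assumption-laden stand-in for the symbolic derivation: it replaces the determinant conditions with a rank test modulo a prime, uses hypothetical coefficients a_i = 1..6, assumes global rows of the form a_i and a_i squared (the matrix figures are not available), and, like the worked cases, uses only the cross-group and global check blocks as equations.

```python
from itertools import combinations

P = 257  # prime modulus; an illustrative stand-in for the code's finite field
K = 6
COEFFS = [1, 2, 3, 4, 5, 6]  # hypothetical a_0..a_5

# ASSUMED coefficient rows of the check blocks over (d0..d5).
CHECK_ROWS = {
    "P2": [1, 0, 1, 0, 1, 0],
    "P3": [0, 1, 0, 1, 0, 1],
    "Q0": [a % P for a in COEFFS],
    "Q1": [a * a % P for a in COEFFS],
}
DATA = [f"d{i}" for i in range(K)]


def rank_mod_p(rows):
    """Rank by Gaussian elimination over the integers modulo P."""
    rows = [r[:] for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % P), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, P)  # modular inverse (Python 3.8+)
        rows[rank] = [x * inv % P for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % P:
                f = rows[i][col]
                rows[i] = [(x - f * y) % P for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def reconstructable(failed):
    """True if the lost data blocks are determined by the surviving check rows."""
    failed = set(failed)
    lost = [i for i in range(K) if f"d{i}" in failed]
    if not lost:
        return True  # only check blocks failed: re-encode from the data
    rows = [
        [row[i] for i in lost]
        for name, row in CHECK_ROWS.items()
        if name not in failed
    ]
    return bool(rows) and rank_mod_p(rows) == len(lost)


def build_table(num_failed=4):
    """Map every failure pattern of the given size to a recoverability flag."""
    blocks = DATA + list(CHECK_ROWS)
    return {p: reconstructable(p) for p in combinations(blocks, num_failed)}
```

With these assumed coefficients the sketch agrees with the worked cases, for example `reconstructable(["d0", "d2", "Q0", "Q1"])` is false while `reconstructable(["d0", "d1", "Q0", "Q1"])` is true. Adding rows for P0 and P1 to `CHECK_ROWS` would model the full tuple (6,2,2,2).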
In one possible implementation, the two global check blocks include a first global check block and a second global check block. The first global check block corresponds to the n-th row of coefficients in the determinant coefficient matrix, and the second global check block corresponds to the (n + 1)-th row, where the determinant coefficient matrix has n + 1 rows of coefficients in total; each original data block corresponds to one coefficient in a row of coefficients;
if the number of failed data blocks is 4, the failed data blocks are, in order, a first, a second, a third and a fourth failed data block, and in the determinant coefficient matrix they correspond to a first, a second, a third and a fourth coefficient respectively, then:
if the failed data blocks are all original data blocks, the corresponding coefficient relationship is: the first coefficient is not equal to the third coefficient, the second coefficient is not equal to the fourth coefficient, and the sum of the first and third coefficients is not equal to the sum of the second and fourth coefficients;
if the failed data blocks comprise 1 cross-group check block and 3 original data blocks, and the failed original data blocks are in the same group as the failed cross-group check block, the failed data blocks cannot be reconstructed;
if the failed data blocks comprise 1 cross-group check block and 3 original data blocks, and the failed original data blocks are not all in the same group as the failed cross-group check block, the corresponding coefficient relationship is: the first coefficient is not zero, the second coefficient is not zero, and the difference between the third coefficient and the first coefficient is not zero;
if the failed data blocks comprise 1 global check block and 3 original data blocks, the corresponding coefficient relationship is: the difference between the third coefficient and the first coefficient is not zero, that is, the first coefficient is not equal to the third coefficient;
if the failed data blocks comprise 1 cross-group check block, 1 global check block and 2 original data blocks, and the 2 failed original data blocks are in the same group as the failed cross-group check block or the failed global check block, the failed data blocks cannot be reconstructed;
if the failed data blocks comprise 1 cross-group check block, 1 global check block and 2 original data blocks, and the 2 failed original data blocks are in the same group as the failed cross-group check block and the failed global check block respectively, the corresponding coefficient relationship is: the first coefficient is not zero;
if the failed data blocks comprise 2 cross-group check blocks and 2 original data blocks, the corresponding coefficient relationship is: the first coefficient is not equal to the second coefficient, and neither the first nor the second coefficient is zero;
if the failed data blocks comprise 2 global check blocks and 2 original data blocks, and the 2 failed original data blocks are in the same group as 1 non-failed cross-group check block, the failed data blocks cannot be reconstructed;
if the failed data blocks comprise 2 global check blocks and 2 original data blocks, and the 2 failed original data blocks are in the same groups as the 2 non-failed cross-group check blocks respectively, the corresponding coefficient relationship is: the coefficients are unconstrained;
if the failed data blocks comprise 2 cross-group check blocks, 1 global check block and 1 original data block, the corresponding coefficient relationship is: the first coefficient is not zero;
if the failed data blocks comprise 2 global check blocks, 1 cross-group check block and 1 original data block, and the failed original data block is in the same group as the failed cross-group check block, the failed data blocks cannot be reconstructed;
if the failed data blocks comprise 2 global check blocks, 1 cross-group check block and 1 original data block, and the failed original data block is in a different group from the failed cross-group check block, the corresponding coefficient relationship is: the coefficients are unconstrained.
Illustratively, according to the foregoing, when the number of original data blocks is 6 and the number of cross-group check blocks is 2, the distributions of four failed data blocks and the corresponding recovery indication information are as shown in table 1:
TABLE 1
[Table 1, shown as images in the original publication]
The other schemes are analyzed in the same way, and the final distributions of failed data blocks and the corresponding recovery indication information can be obtained by taking all check blocks into account. Therefore, provided that all coefficient relationships are satisfied, the MLRC-based data processing method is well suited to tolerating 4 failures.
In a possible implementation, the computational analysis can be summarized with a tuple (k, l, r, m), where k is the number of original data blocks, l is the number of global check blocks, r is the number of sequential grouping check blocks, and m is the number of cross-group check blocks. To compare different numbers of cross-group check blocks, the cases of 10 and 12 original data blocks may be selected. With 10 original data blocks, there may be 2 or 5 cross-group check blocks. With 12 original data blocks, there may be 2, 3, 4 or 6 cross-group check blocks. The storage overhead and the fault tolerance in each case are calculated and analyzed with the same method as above, as shown in table 2:
TABLE 2
[Table 2, shown as images in the original publication]
Therefore, for the same number of original data blocks, the more cross-group check blocks there are, that is, the more cross-group data block groups the original data blocks are divided into, the higher the fault tolerance rate and the higher the maximum fault tolerance.
In one possible implementation, the minimum number of data blocks required to reconstruct a single failed data block among the plurality of original data blocks is the number of data blocks contained in a single cross-group data block group. In table 2, the single-block minimum reconstruction is this minimum number, that is, the number of data blocks covered by one cross-group check block.
In one possible embodiment, the maximum number of failed data blocks allowed for the plurality of original data blocks is the total number of check blocks. The maximum fault tolerance in table 2 is this maximum number of failed data blocks, and equals the corresponding total number of check blocks in table 2.
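The two statements above, together with the block count, can be turned into a small calculator for a tuple (k, l, r, m). This is a sketch that simply restates the text as arithmetic; the function name and the dictionary keys are hypothetical.

```python
def mlrc_metrics(k, l, r, m):
    """Derive headline figures for a tuple (k, l, r, m) as described in the text.

    k: original data blocks, l: global check blocks,
    r: sequential grouping check blocks, m: cross-group check blocks.
    """
    if k % m != 0:
        raise ValueError("k must be divisible by m (equal-sized cross groups)")
    total_checks = l + r + m
    return {
        "total_blocks": k + total_checks,
        # Upper limit stated in the text: the total number of check blocks.
        "max_fault_tolerance": total_checks,
        # Blocks contained in a single cross-group data block group.
        "single_block_min_reconstruction": k // m,
    }


if __name__ == "__main__":
    for tup in [(6, 2, 2, 2), (10, 2, 2, 2), (10, 2, 2, 5), (12, 2, 2, 6)]:
        print(tup, mlrc_metrics(*tup))
```

For the tuple (10,2,2,2) this gives 16 blocks in total, a maximum fault tolerance of 6, and a single-block minimum reconstruction of 5.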
In a possible implementation manner, the data processing method based on the MLRC erasure code provided in this application can also be compared side by side with data processing methods based on other erasure codes. Using MLRC(10,2,2,2) for the comparison, the results are shown in table 3:
TABLE 3
Coding scheme | Storage overhead | 4-failure tolerance rate | Maximum fault tolerance | Single-block minimum reconstruction
RS(10,4) | 14 | 100% | 4 | 10
LRC(10,2,2) | 14 | 86% | 4 | 5
ESRC(10,2,2) | 14 | 93% | 4 | 6
SHEC(10,5,4) | 14 | 83% | 5 | 4
MLRC(10,2,2,2) | 16 | 100% | 6 | 5
As can be seen from table 3, although the storage overhead of the MLRC-based data processing method provided by the present application is higher, its 4-failure tolerance rate matches that of RS and exceeds those of LRC, ESRC and SHEC, its maximum fault tolerance is the highest of all the schemes, and its single-block minimum reconstruction is lower than that of RS and ESRC.
Based on the foregoing description, the present application discloses an erasure code based data processing method, which includes: obtaining a plurality of original data blocks; dividing the plurality of original data blocks into two sequential groups and a plurality of cross-group data block groups, where the sequence numbers of the original data blocks in each sequential group are continuous, each cross-group data block group contains the same number of original data blocks, and the sequence numbers of the data blocks in each cross-group data block group are discontinuous; and generating a plurality of check blocks of the plurality of original data blocks, where the plurality of check blocks includes two global check blocks, a sequential grouping check block corresponding to each sequential group, and a cross-group check block corresponding to each cross-group data block group. By dividing the original data blocks into cross-group data block groups, the number of cross-group data block groups and the number of data blocks in each group can be adjusted flexibly, and data blocks can be shared among the cross-group data block groups. During recovery, data can be recovered within a local scope, which minimizes global data recovery and effectively reduces reconstruction cost, network bandwidth and computation time. The reliability and integrity of data during cloud storage are thereby better guaranteed. Under the same conditions, the method also offers higher fault tolerance while occupying as little hard disk space as possible.
As shown in fig. 7, based on the same inventive concept as the erasure code-based data processing method, an embodiment of the present application further provides an erasure code-based data processing apparatus, including: an obtaining module 701, a grouping module 702, and a check block determining module 703, wherein:
an obtaining module 701, configured to obtain multiple original data blocks;
a grouping module 702, configured to divide the plurality of original data blocks into two sequential groups and a plurality of cross-group data block groups; the sequence numbers of the original data blocks in each sequential group are continuous, each cross-group data block group contains the same number of original data blocks, and the sequence numbers of the data blocks in each cross-group data block group are discontinuous;
a check block determining module 703 is configured to generate a plurality of check blocks of the plurality of original data blocks, where the plurality of check blocks include two global check blocks, a sequential grouping check block corresponding to each sequential grouping, and a cross-group check block corresponding to each cross-group data block group.
In a possible implementation manner, the grouping module 702 is further configured to select one of the cross-group data block group division policies if the multiple original data blocks correspond to multiple cross-group data block group division policies, and divide the multiple original data blocks into multiple cross-group data block groups.
In a possible embodiment, the apparatus further comprises:
a recovery determining module, configured to determine whether failed data blocks can be recovered based on a pre-constructed recovery condition indication table; the recovery condition indication table records each distribution of failed data blocks and the corresponding recovery indication information, where the recovery indication information is the recovery condition if the failed data blocks can be recovered, and indicates that they are not reconstructable if the failed data blocks cannot be recovered;
a reconstruction module, configured to reconstruct the failed data blocks using the plurality of check blocks of the original data blocks if the failed data blocks can be recovered.
In a possible implementation, to construct the recovery condition indication table, the apparatus further includes:
a sample construction module, configured to construct failed data block distribution samples;
a matrix establishing module, configured to establish, for each failed data block distribution sample, a determinant coefficient matrix for reconstructing the failed data blocks of that sample from the non-failed data blocks;
a calculation module, configured to solve the coefficient relationship in the determinant coefficient matrix on the premise that the determinant of the determinant coefficient matrix is not zero;
a correspondence construction module, configured to construct the correspondence between the failed data block distribution sample and the coefficient relationship.
The erasure code-based data processing apparatus provided in the embodiment of the present application and the erasure code-based data processing method adopt the same inventive concept, and can obtain the same beneficial effects, which are not described herein again.
Based on the same inventive concept as the erasure code-based data processing method, the embodiment of the application also provides an electronic device. An electronic device 800 according to this embodiment of the application is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 8, the electronic device 800 is represented in the form of a general electronic device. The components of the electronic device 800 may include, but are not limited to: at least one processor 801, at least one memory 802, and a bus 803 that couples various system components including the memory 802 and the processor 801.
Bus 803 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a processor or local bus using any of a variety of bus architectures.
The memory 802 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 8021 and/or cache memory 8022, and may further include Read Only Memory (ROM) 8023.
Memory 802 may also include a program/utility 8025 having a set (at least one) of program modules 8024, such program modules 8024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 800 may also communicate with one or more external devices 804 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 800 to communicate with one or more other electronic devices. Such communication may be through input/output (I/O) interfaces 805. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 806. As shown, the network adapter 806 communicates with other modules for the electronic device 800 over the bus 803. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In an exemplary embodiment, a computer-readable storage medium comprising instructions is also provided, such as the memory 802 comprising instructions that are executable by the processor 801 to perform the above-described erasure code-based data processing method. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, comprising a computer program which, when executed by the processor 801, implements any of the erasure code based data processing methods as provided herein.
In an exemplary embodiment, various aspects of an erasure code based data processing method provided by the present application can also be implemented in the form of a program product, which includes program code for causing a computer device to perform the steps in the erasure code based data processing method according to various exemplary embodiments of the present application described above in this specification when the program product is run on the computer device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for the erasure code-based data processing method of the embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program codes, and may be run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's electronic device, partly on the user's electronic device, as a stand-alone software package, partly on the user's electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of a remote electronic device, the remote electronic device may be connected to the user's electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external electronic device (e.g., through the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided so as to be embodied by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A data processing method based on erasure codes is characterized by comprising the following steps:
acquiring a plurality of original data blocks;
dividing the plurality of original data blocks into two sequential groups and a plurality of cross-group data block groups; the sequence numbers of the original data blocks in each sequential group are consecutive, each cross-group data block group comprises the same number of original data blocks, and the sequence numbers of the original data blocks in each cross-group data block group are non-consecutive;
and generating a plurality of check blocks of the plurality of original data blocks, wherein the plurality of check blocks comprise two global check blocks, sequential group check blocks respectively corresponding to the sequential groups, and cross-group check blocks respectively corresponding to the cross-group data block groups.
2. The method of claim 1, wherein if the plurality of original data blocks correspond to a plurality of cross-group data block grouping strategies, one of the cross-group data block grouping strategies is selected to divide the plurality of original data blocks into the plurality of cross-group data block groups.
3. The method of claim 1, further comprising:
determining whether the failed data block can be recovered based on a pre-constructed recovery condition indication table; the recovery condition indication table records the distribution conditions of failed data blocks and the corresponding recovery indication information, wherein the recovery indication information is a recovery condition if the failed data blocks can be recovered, and indicates that the failed data blocks cannot be reconstructed if they cannot be recovered;
and if the failed data block can be recovered, reconstructing the failed data block using the plurality of check blocks of the original data blocks.
4. The method according to claim 3, wherein constructing the recovery situation indication table specifically includes:
constructing failed data block distribution condition samples;
for each failed data block distribution condition sample, establishing a determinant coefficient matrix for reconstructing the failed data blocks of that sample based on the non-failed data blocks;
on the premise that the determinant of the determinant coefficient matrix is not zero, solving the coefficient relation in the determinant coefficient matrix;
and constructing a corresponding relation between the failed data block distribution condition samples and the coefficient relation.
5. The method according to claim 4, wherein the two global check blocks comprise a first global check block and a second global check block; in the determinant coefficient matrix, the first global check block corresponds to the n-th row of coefficients and the second global check block corresponds to the (n+1)-th row of coefficients, wherein the determinant coefficient matrix comprises n+1 rows of coefficients; each original data block corresponds to one coefficient in a row of coefficients;
if the number of the failure data blocks is 4, each failure data block is a first failure data block, a second failure data block, a third failure data block and a fourth failure data block in sequence, and the first failure data block corresponds to a first coefficient, the second failure data block corresponds to a second coefficient, the third failure data block corresponds to a third coefficient and the fourth failure data block corresponds to a fourth coefficient in the determinant coefficient matrix, then:
if the failure data blocks are all original data blocks, the corresponding coefficient relationship is as follows: the first coefficient is not equal to the third coefficient, the second coefficient is not equal to the fourth coefficient, and the sum of the first coefficient and the third coefficient is not equal to the sum of the second coefficient and the fourth coefficient;
if the failure data block comprises 1 cross-group check block and 3 original data blocks, and the failed original data blocks and the 1 failed cross-group check block are in the same group, the failure data block cannot be reconstructed;
if the failure data block comprises 1 cross-group check block and 3 original data blocks, and the failed original data blocks and the 1 failed cross-group check block are in different groups, the corresponding coefficient relationship is as follows: the first coefficient is not equal to zero, the second coefficient is not equal to zero, and the difference between the third coefficient and the first coefficient is not zero;
if the failure data block comprises 1 global check block and 3 original data blocks, the corresponding coefficient relationship is as follows: the difference between the third coefficient and the first coefficient is not zero, and the first coefficient is not equal to the third coefficient;
if the failure data block comprises 1 cross-group check block, 1 global check block and 2 original data blocks, and the 2 failed original data blocks are in the same group as the 1 failed cross-group check block or the 1 failed global check block, the failure data block cannot be reconstructed;
if the failure data block comprises 1 cross-group check block, 1 global check block and 2 original data blocks, and the 2 failed original data blocks are respectively in the same group as the 1 failed cross-group check block and the 1 failed global check block, the corresponding coefficient relationship is as follows: the first coefficient is not zero;
if the failure data block comprises 2 cross-group check blocks and 2 original data blocks, the corresponding coefficient relationship is as follows: the first coefficient is not equal to the second coefficient, and the first coefficient and the second coefficient are not zero;
if the failure data block comprises 2 global check blocks and 2 original data blocks, and the 2 failed original data blocks and the 1 non-failed cross-group check block are in the same group, the failure data block can not be reconstructed;
if the failure data block comprises 2 global check blocks and 2 original data blocks, and the 2 failed original data blocks are respectively in the same group with the 2 non-failed cross-group check blocks, the corresponding coefficient relationship is as follows: the coefficients are unconstrained;
if the failure data block comprises 2 cross-group check blocks, 1 global check block and 1 original data block, the corresponding coefficient relationship is as follows: the first coefficient is not zero;
if the failure data block comprises 2 global check blocks, 1 cross-group check block and 1 original data block, and the failed 1 original data block and the failed 1 cross-group check block are in the same group, the failure data block can not be reconstructed;
if the failure data block comprises 2 global check blocks, 1 cross-group check block and 1 original data block, and the 1 failed original data block and the 1 failed cross-group check block are in different groups, the corresponding coefficient relationship is as follows: the coefficients are unconstrained.
6. The method according to any one of claims 1 to 5, wherein the minimum number of data blocks required for reconstructing a single failed data block of the plurality of original data blocks is the number of data blocks included in a single cross-group data block group.
7. The method of any of claims 1-5, wherein the maximum number of failed data blocks allowed for the plurality of original data blocks is the number of total parity blocks.
8. An erasure code-based data processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring a plurality of original data blocks;
the grouping module is used for dividing the plurality of original data blocks into two sequential groups and a plurality of cross-group data block groups; the sequence numbers of the original data blocks in each sequential group are consecutive, each cross-group data block group comprises the same number of original data blocks, and the sequence numbers of the original data blocks in each cross-group data block group are non-consecutive;
and the check block determining module is used for generating a plurality of check blocks of the plurality of original data blocks, wherein the plurality of check blocks comprise two global check blocks, sequential group check blocks respectively corresponding to the sequential groups, and cross-group check blocks respectively corresponding to the cross-group data block groups.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the steps of the erasure code based data processing method according to any one of claims 1-7.
10. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the steps of the erasure code based data processing method of any one of claims 1-7.
CN202111640791.0A 2021-12-29 2021-12-29 Data processing method based on erasure codes and related device Pending CN114443350A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111640791.0A CN114443350A (en) 2021-12-29 2021-12-29 Data processing method based on erasure codes and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111640791.0A CN114443350A (en) 2021-12-29 2021-12-29 Data processing method based on erasure codes and related device

Publications (1)

Publication Number Publication Date
CN114443350A true CN114443350A (en) 2022-05-06

Family

ID=81365497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111640791.0A Pending CN114443350A (en) 2021-12-29 2021-12-29 Data processing method based on erasure codes and related device

Country Status (1)

Country Link
CN (1) CN114443350A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination