CN114116297B

CN114116297B - Data encoding method, device, equipment and medium

Info

Publication number: CN114116297B
Application number: CN202210097075.0A
Authority: CN
Inventors: 吴睿振; 张旭; 陈静静; 张永兴; 王凛
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2022-01-27
Filing date: 2022-01-27
Publication date: 2022-04-22
Anticipated expiration: 2042-01-27
Also published as: CN114116297A

Abstract

The application discloses a data encoding method, apparatus, device and medium. The method comprises: determining an encoding matrix and a storage erasure structure; reading original data and performing encoding calculation on it with the encoding matrix to obtain original data blocks and check data blocks; and grouping stripes based on a preset number of stripes to obtain a plurality of stripe groups, then, within each stripe group, updating the check data blocks by using the original data blocks to obtain updated check data blocks. In this way, starting from the original data blocks and check data blocks produced by the initial encoding, the check data blocks are updated with the original data blocks of the same stripe group to complete the encoding.

Description

Data encoding method, device, equipment and medium

Technical Field

The present invention relates to the field of computers, and in particular, to a data encoding method, apparatus, device, and medium.

Background

With the rapid development of communication and network technology, the volume of digital information is growing explosively, which poses a great challenge to data storage technology. The reliability of data in storage systems and the power consumption of storage systems are of increasing concern. At such a data scale, the reliability of data in a storage system is inversely proportional to the number of components it contains: the more components a storage system has, the lower the reliability of its data. According to related research, in an internet data center consisting of 600 disks, about 30 disks fail each month. The loss of data reliability caused by disk failure is a serious problem in large-scale storage systems, and it has motivated research on fault-tolerance technologies.

Large-stripe erasure coding is a relatively clear application requirement. "Large stripe" means that the number of data and check units making up a stripe is relatively large, which can greatly improve data security and reduce how often hard disks need to be inspected. However, with large stripes, whenever an error is found and data must be recovered, existing erasure code algorithms need to read a large number of data blocks. Because the speed of current storage is limited mainly by the IOPS of the hard disks, reading such a large amount of data is very slow, and recovery is therefore very slow as well.

In summary, in the large-stripe erasure coding scenario, how to reduce the number of data blocks that must be read during decoding recovery, and thereby speed up recovery, is a problem that remains to be solved.

Disclosure of Invention

In view of the above, an object of the present invention is to provide a data encoding method, which can reduce the number of data blocks that need to be read during data decoding recovery, so as to speed up the data decoding recovery. The specific scheme is as follows:

In a first aspect, the present application discloses a data encoding method, comprising:

determining an encoding matrix based on a preset number of original data blocks, a preset number of check blocks and a historical erasure code algorithm, and determining a storage erasure structure based on the preset number of original data blocks, the preset number of check blocks and a preset number of stripes; the storage erasure structure comprises data disks, check disks and stripes;

reading original data, and performing coding calculation by using the coding matrix based on the original data to obtain an original data block and a check data block;

and performing stripe grouping based on the preset stripe number to obtain a plurality of stripe groups, and updating the check data block by using the original data block in the same stripe group to obtain an updated check data block.

Optionally, the updating the check data block by using the original data block in the same stripe group to obtain an updated check data block includes:

sorting the preset number of stripes to obtain stripe sorting sequence numbers;

performing stripe grouping based on the preset stripe number and the stripe sorting sequence number to obtain a plurality of stripe groups comprising an even-numbered stripe and an odd-numbered stripe;

one of the even-numbered stripes and the odd-numbered stripes of the same stripe group is used as a basic stripe, and the other is used as an operation stripe;

and updating the check data block by using the original data block in the basic stripe and the operation stripe of the same stripe group.

Optionally, before updating the check data block by using the original data block in the basic stripe and the operation stripe of the same stripe group, the method further includes:

determining the number to be updated, namely the number of to-be-updated check data blocks that participate in the updating step in the basic stripe and the operation stripe of the same stripe group;

and dividing the number of the preset original data blocks by the number to be updated to obtain a first numerical value, and performing rounding-up operation on the first numerical value to obtain a second numerical value.

Optionally, the updating the check data block by using the original data block to obtain an updated check data block includes:

sequencing the data disks based on a preset sequencing rule to obtain a target data disk serial number, and sequencing the check disks based on the target data disk serial number to obtain a target check disk serial number;

determining a first target data disk sequence number corresponding to the original data block of the basic stripe in the storage erasure structure and a second target data disk sequence number corresponding to the original data block of the operation stripe;

in the same stripe group, taking the original data blocks of the basic stripe as one group, repeating this group until the number of groups equals the number to be updated, and sorting the groups to obtain first sorting sequence numbers;

in order of the second target data disk sequence numbers, successively taking data disks numbering the second numerical value as one group, and sorting the groups to obtain second sorting sequence numbers, the number of groups being the number to be updated;

when the first target data disk sequence number, the first sorting sequence number and the second sorting sequence number are the same, determining the group of original data blocks corresponding to the first sorting sequence number, and determining, from that group, the first target original data block corresponding to the first target data disk sequence number;

determining the second target original data blocks in the operation stripe that correspond to the second sorting sequence number and number the second numerical value, and performing an exclusive-or operation on the second target original data blocks and the first target original data block of that group, to obtain multiple groups of calculated original data blocks, the number of groups being the number to be updated and the first sorting sequence numbers being unchanged;

and updating the to-be-updated check data blocks, the number of which is the number to be updated, by using the respective groups of calculated original data blocks to obtain updated check data blocks.

Optionally, the updating of the to-be-updated check data blocks by using the multiple groups of calculated original data blocks to obtain updated check data blocks includes:

dividing the number to be updated by 2 to obtain a target value; the target value is the number of to-be-updated check data blocks in one stripe;

mapping, in order of increasing first sorting sequence number, the groups of calculated original data blocks whose first sorting sequence numbers are not larger than the target value to the to-be-updated check data blocks in the basic stripe, arranged in order of increasing target check disk sequence number, and determining the correspondence;

mapping, in order of increasing first sorting sequence number, the groups of calculated original data blocks whose first sorting sequence numbers are larger than the target value to the to-be-updated check data blocks in the operation stripe, arranged in order of increasing target check disk sequence number, and determining the correspondence;

and determining a group of calculated original data blocks and one to-be-updated check data block that correspond to each other, and updating the to-be-updated check data block by using the group of calculated original data blocks to obtain an updated check data block.

Optionally, the determining of a group of calculated original data blocks and one to-be-updated check data block that correspond to each other, and the updating of the to-be-updated check data block by using the group of calculated original data blocks to obtain an updated check data block, includes:

determining a group of calculated original data blocks and one to-be-updated check data block that correspond to each other;

determining, from the encoding matrix, the row of parameters corresponding to the to-be-updated check data block as a row matrix, and taking the group of calculated original data blocks as a column matrix;

and multiplying the row matrix and the column matrix to obtain an updated check data block.

Optionally, the performing stripe grouping based on the preset number of stripes to obtain a plurality of stripe groups, and updating the check data block in the same stripe group by using the original data block to obtain an updated check data block includes:

when the preset stripe number is an even number, performing stripe grouping based on the preset stripe number to obtain a plurality of stripe groups, and updating the check data block by using the original data block in the same stripe group to obtain an updated check data block;

or when the preset number of stripes is an odd number, performing stripe grouping based on the preset number of stripes to obtain a plurality of stripe groups, and leaving a single stripe, then updating the check data block in the same stripe group by using the original data block to obtain an updated check data block, and forbidding the single stripe to participate in the updating step.

In a second aspect, the present application discloses a data encoding apparatus comprising:

a matrix determining module, configured to determine an encoding matrix based on the number of preset original data blocks, the number of preset check blocks, and a historical erasure code algorithm;

a structure determining module, configured to determine a storage erasure structure based on the preset number of original data blocks, the preset number of check blocks and a preset number of stripes; the storage erasure structure comprises data disks, check disks and stripes;

the encoding module is used for reading original data and carrying out encoding calculation by utilizing the encoding matrix based on the original data to obtain an original data block and a check data block;

and an updating module, configured to group the stripes based on the preset number of stripes to obtain a plurality of stripe groups, and to update the check data block by using the original data block in the same stripe group to obtain an updated check data block.

In a third aspect, the present application discloses an electronic device comprising a processor and a memory; wherein the processor implements the data encoding method disclosed above when executing the computer program stored in the memory.

In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program; wherein the computer program realizes the data encoding method disclosed in the foregoing when executed by a processor.

As can be seen, in the present application, an encoding matrix is determined based on the preset number of original data blocks, the preset number of check blocks and a historical erasure code algorithm, and a storage erasure structure is determined based on the preset number of original data blocks, the preset number of check blocks and the preset number of stripes; the storage erasure structure comprises data disks, check disks and stripes. Original data is read and encoded with the encoding matrix to obtain original data blocks and check data blocks. The stripes are then grouped based on the preset number of stripes to obtain a plurality of stripe groups, and within each stripe group the check data blocks are updated by using the original data blocks to obtain updated check data blocks. By recombining the already-encoded original data blocks and having them participate in encoding again, the method raises the encoding complexity but reduces the number of data blocks required for decoding, so that an erroneous original data block can be decoded and recovered from fewer data blocks and decoding recovery is accelerated.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flowchart of a data encoding method provided in the present application;

FIG. 2 is a flowchart of a specific data encoding method provided in the present application;

FIG. 3 is a schematic diagram of an erasure code algorithm provided in the present application;

FIG. 4 is a schematic diagram of a storage erasure structure provided in the present application;

FIG. 5 is a schematic diagram of a storage erasure structure after stripe grouping provided in the present application;

FIG. 6 is a schematic structural diagram of a data encoding apparatus provided in the present application;

FIG. 7 is a block diagram of an electronic device provided in the present application.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

When decoding recovery is currently performed, too many data blocks need to be read, and the reading speed is very slow due to too large data amount, so that the decoding recovery speed is very slow.

In order to overcome the above problems, the present application provides a data encoding scheme, which can reduce the number of data blocks that need to be read during data decoding recovery, so as to speed up the data decoding recovery.

Referring to FIG. 1, an embodiment of the present application discloses a data encoding method, including:

Step S11: determining an encoding matrix based on a preset number of original data blocks, a preset number of check blocks and a historical erasure code algorithm, and determining a storage erasure structure based on the preset number of original data blocks, the preset number of check blocks and a preset number of stripes; the storage erasure structure includes data disks, check disks and stripes.

In the embodiment of the present application, historical erasure coding (EC) is a data protection method that divides data into fragments, expands and encodes them with redundant data, and stores them in different locations, such as disks, storage nodes, or other geographic locations.

In the embodiment of the application, the number of the preset original data blocks and the number of the preset check blocks may be any values; the preset stripe number may be any value, and may be even or odd.

In the embodiment of the application, the encoding matrix is determined based on the preset number of original data blocks, the preset number of check blocks and the historical erasure code algorithm: when the preset number of original data blocks is k and the preset number of check blocks is r, an encoding matrix of size k × (k + r) is obtained.

In the embodiment of the application, the storage erasure structure is determined based on the preset number of original data blocks, the preset number of check blocks and the preset number of stripes. When the preset number of original data blocks is k, the preset number of check blocks is r and the preset number of stripes is s, the resulting storage erasure structure has s stripes in total (stripe 1 to stripe s), k data disks in total (disk 1 to disk k), and r check disks in total (disk k + 1 to disk k + r).

Step S12: and reading original data, and performing coding calculation by using the coding matrix based on the original data to obtain an original data block and a check data block.

In the embodiment of the application, after the original data is obtained, it is divided into original data blocks, the encoding relation and the decoding relation are set, and the encoding matrix is used to perform encoding calculation on the original data blocks to obtain the original data blocks and check data blocks of each stripe. Since the preset number of original data blocks is k, the preset number of check blocks is r and the preset number of stripes is s, the calculation yields k × s original data blocks and r × s check data blocks across all stripes.
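As a minimal sketch of the structure and block counts just described, the layout for given k, r and s can be enumerated as follows. The class name `ErasureLayout` and its method names are illustrative assumptions and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureLayout:
    """Hypothetical container for the storage erasure structure."""
    k: int  # preset number of original data blocks (one per data disk)
    r: int  # preset number of check blocks (one per check disk)
    s: int  # preset number of stripes

    def data_disks(self):
        # Data disks are numbered 1 .. k.
        return list(range(1, self.k + 1))

    def check_disks(self):
        # Check disks are numbered k+1 .. k+r.
        return list(range(self.k + 1, self.k + self.r + 1))

    def stripes(self):
        # Stripes are numbered 1 .. s.
        return list(range(1, self.s + 1))

    def block_counts(self):
        # Every stripe holds k original data blocks and r check data blocks.
        return self.k * self.s, self.r * self.s


layout = ErasureLayout(k=5, r=3, s=4)
print(layout.data_disks())    # [1, 2, 3, 4, 5]
print(layout.check_disks())   # [6, 7, 8]
print(layout.block_counts())  # (20, 12)
```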

Step S13: and performing stripe grouping based on the preset stripe number to obtain a plurality of stripe groups, and updating the check data block by using the original data block in the same stripe group to obtain an updated check data block.

In the embodiment of the application, the preset number of stripes is s. When s is even, the stripes are grouped based on the preset number of stripes to obtain a plurality of stripe groups, and within each stripe group the check data blocks are updated by using the original data blocks to obtain updated check data blocks. When s is odd, the stripes are grouped to obtain a plurality of stripe groups with a single stripe left over, and the check data blocks of that single stripe are forbidden from participating in the updating step.

It should be noted that updating takes place only within a stripe group; blocks of different stripe groups are never used to update one another.

As can be seen, in the present application, an encoding matrix is determined based on the preset number of original data blocks, the preset number of check blocks and a historical erasure code algorithm, and a storage erasure structure is determined based on the preset number of original data blocks, the preset number of check blocks and the preset number of stripes; the storage erasure structure comprises data disks, check disks and stripes. Original data is read and encoded with the encoding matrix to obtain original data blocks and check data blocks. The stripes are then grouped based on the preset number of stripes to obtain a plurality of stripe groups, and within each stripe group the check data blocks are updated by using the original data blocks to obtain updated check data blocks. By recombining the already-encoded original data blocks and having them participate in encoding again, the method raises the encoding complexity but reduces the number of data blocks required for decoding, so that erroneous data can be decoded and recovered from fewer data blocks and decoding recovery is accelerated.

Referring to FIG. 2, an embodiment of the present application discloses a specific data encoding method, which includes:

Step S21: determining an encoding matrix based on a preset number of original data blocks, a preset number of check blocks and a historical erasure code algorithm, and determining a storage erasure structure based on the preset number of original data blocks, the preset number of check blocks and a preset number of stripes; the storage erasure structure includes data disks, check disks and stripes.

For a more specific processing procedure of step S21, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.

Step S22: and reading original data, and performing coding calculation by using the coding matrix based on the original data to obtain an original data block and a check data block.

For a more specific processing procedure of step S22, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.

Step S23: sorting the preset number of stripes to obtain stripe sorting sequence numbers; and grouping the stripes based on the preset number of stripes and the stripe sorting sequence numbers to obtain a plurality of stripe groups, each comprising an even-numbered stripe and an odd-numbered stripe.

In the embodiment of the application, the preset number of stripes are sorted to obtain stripe sorting sequence numbers. For example, when the preset number of stripes is 4, the sequence numbers of the 4 stripes are stripe 1, stripe 2, stripe 3 and stripe 4. The stripes are then grouped based on the preset number of stripes and the stripe sorting sequence numbers to obtain a plurality of stripe groups, each comprising an even-numbered stripe and an odd-numbered stripe. For example, stripe 1 and stripe 2 form one group, namely group 1, and stripe 3 and stripe 4 form another group, namely group 2; stripes 1 and 3 are the odd-numbered stripes, and stripes 2 and 4 are the even-numbered stripes.

Step S24: one of the even-numbered stripes and the odd-numbered stripes of the same stripe group is used as a basic stripe, and the other is used as an operation stripe; and updating the check data block by using the original data block in the basic stripe and the operation stripe of the same stripe group.

In the embodiment of the present application, one of the even-numbered stripe and the odd-numbered stripe in the same stripe group is used as the basic stripe, and the other is used as the operation stripe. Specifically, when the odd-numbered stripes are used as basic stripes, the even-numbered stripes are the operation stripes; when the even-numbered stripes are used as basic stripes, the odd-numbered stripes are the operation stripes. For example, with the odd-numbered stripes as basic stripes, stripe 1 is the basic stripe and stripe 2 is the operation stripe in group 1, and stripe 3 is the basic stripe and stripe 4 is the operation stripe in the other stripe group, namely group 2.
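The pairing and role assignment can be sketched as below. The function name `group_stripes`, the `base` parameter and the dictionary keys are illustrative assumptions; the handling of an odd stripe count follows the rule stated for step S13, where the single leftover stripe is excluded from updating.

```python
def group_stripes(s, base="odd"):
    """Pair stripes (1, 2), (3, 4), ... and label basic/operation roles.

    Returns (groups, leftover). `leftover` is the unpaired stripe number
    when s is odd, otherwise None; that stripe takes no part in updating.
    """
    if base not in ("odd", "even"):
        raise ValueError("base must be 'odd' or 'even'")
    groups = []
    for odd in range(1, s, 2):
        even = odd + 1
        if base == "odd":
            groups.append({"basic": odd, "operation": even})
        else:
            groups.append({"basic": even, "operation": odd})
    leftover = s if s % 2 == 1 else None
    return groups, leftover


print(group_stripes(4))
# ([{'basic': 1, 'operation': 2}, {'basic': 3, 'operation': 4}], None)
print(group_stripes(5)[1])
# 5
```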

In this embodiment of the present application, the check data blocks are updated by using the original data blocks in the basic stripe and the operation stripe of the same stripe group. The specific process is as follows. First, in the storage erasure structure, the first target data disk sequence numbers corresponding to the original data blocks of the basic stripe and the second target data disk sequence numbers corresponding to the original data blocks of the operation stripe are determined. Within the same stripe group, the original data blocks of the basic stripe are taken as one group, this group is repeated until the number of groups equals the number to be updated, and the groups are sorted to obtain first sorting sequence numbers. In order of the second target data disk sequence numbers, data disks numbering the second numerical value are successively taken as one group, and the groups are sorted to obtain second sorting sequence numbers, the number of groups again being the number to be updated. When the first target data disk sequence number, the first sorting sequence number and the second sorting sequence number are the same, the group of original data blocks corresponding to that first sorting sequence number is determined, and the first target original data block corresponding to that first target data disk sequence number is determined from the group. The second target original data blocks in the operation stripe that correspond to that second sorting sequence number, numbering the second numerical value, are then determined and XORed with the first target original data block of the group. This yields multiple groups of calculated original data blocks, the number of groups being the number to be updated and their first sorting sequence numbers unchanged. Finally, the to-be-updated check data blocks are updated with the respective groups of calculated original data blocks to obtain updated check data blocks. It should be noted that the second numerical value is calculated as follows: the number to be updated, namely the number of to-be-updated check data blocks that participate in the updating step in the basic stripe and the operation stripe of the same stripe group, is determined; the preset number of original data blocks is divided by the number to be updated to obtain a first numerical value; and the first numerical value is rounded up to obtain the second numerical value.
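The exclusive-or step can be sketched as follows, under one reading of the passage above: copy i of the basic stripe's data blocks has its block on data disk i XORed with every operation-stripe block in disk group i. The function names and the `op_groups` argument (the grouping of operation-stripe data disks, supplied by the caller) are illustrative assumptions.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def calculated_groups(basic_blocks, op_blocks, op_groups):
    """Build the groups of calculated original data blocks.

    basic_blocks / op_blocks: lists of k equal-length byte blocks,
        index 0 holding the block on data disk 1.
    op_groups: one list of 1-based data disk numbers per group, in
        group order (assumed to be prepared by the caller).
    """
    k = len(basic_blocks)
    result = []
    for i, disks in enumerate(op_groups, start=1):
        group = list(basic_blocks)  # copy i of the basic stripe's blocks
        if i <= k:  # safety guard: no data disk i exists beyond k
            for d in disks:
                group[i - 1] = xor_bytes(group[i - 1], op_blocks[d - 1])
        result.append(group)
    return result


basic = [b"\x01", b"\x02", b"\x03"]
op = [b"\x10", b"\x20", b"\x30"]
groups = calculated_groups(basic, op, [[1, 2], [3]])
print(groups[0][0].hex())  # 31  (0x01 ^ 0x10 ^ 0x20)
print(groups[1][1].hex())  # 32  (0x02 ^ 0x30)
```

Only one block per copy changes, so each updated check data block later carries the XOR of a small set of operation-stripe blocks on top of its usual content.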

It should be noted that, when data disks numbering the second numerical value are successively taken as one group in order of the second target data disk sequence numbers, the operation stripe may not contain enough data disks to form as many groups as the number to be updated with the second numerical value of data disks in each. In that case the data disks are still grouped in the above manner; once the data disks run short, a group may contain fewer data disks than the second numerical value, or none at all, and if the preceding group contains fewer data disks than the second numerical value, the current group contains 0 data disks.
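A short sketch of this grouping rule, including the short and empty trailing groups, is given below. The function name `partition_operation_disks` and the symbol `u` for the number to be updated are illustrative assumptions.

```python
def partition_operation_disks(k, u):
    """Split data disks 1..k of the operation stripe into u groups.

    Each group takes ceil(k / u) consecutive disks (the "second numerical
    value"); when the disks run out, later groups are short or empty.
    Returns (second_value, groups).
    """
    if k < 1 or u < 1:
        raise ValueError("k and u must be positive")
    second_value = -(-k // u)  # integer ceiling of k / u
    disks = list(range(1, k + 1))
    groups = [disks[i * second_value:(i + 1) * second_value] for i in range(u)]
    return second_value, groups


print(partition_operation_disks(5, 6))
# (1, [[1], [2], [3], [4], [5], []])
print(partition_operation_disks(5, 4))
# (2, [[1, 2], [3, 4], [5], []])
```

In the second call the third group is short (one disk instead of two), so the fourth group is empty, matching the rule that a short group is always followed by groups of 0 disks.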

It should be noted that updating the to-be-updated check data blocks with the groups of calculated original data blocks to obtain updated check data blocks specifically includes the following. The number to be updated is divided by 2 to obtain a target value, which is the number of to-be-updated check data blocks in one stripe. The groups of calculated original data blocks whose first sorting sequence numbers are not larger than the target value are mapped, in order of increasing first sorting sequence number, to the to-be-updated check data blocks of the basic stripe arranged in order of increasing target check disk sequence number, and the correspondence is determined. The groups whose first sorting sequence numbers are larger than the target value are mapped in the same manner to the to-be-updated check data blocks of the operation stripe, and the correspondence is determined. Then, for each group of calculated original data blocks and the to-be-updated check data block corresponding to it, the check data block is updated by using that group to obtain an updated check data block.
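The correspondence between groups and check data blocks can be sketched as below. Two assumptions are made here and are not stated in the application: both stripes update check blocks on the same check disks, and groups numbered above the target value are the ones assigned to the operation stripe. The function name is illustrative.

```python
def map_groups_to_check_blocks(check_disks):
    """Map group numbers 1..u to (stripe role, check disk number).

    check_disks: check disk numbers to be updated in each stripe
        (assumed identical for the basic and the operation stripe).
    The number to be updated is u = 2 * len(check_disks); the target
    value is u / 2.
    """
    ordered = sorted(check_disks)
    target = len(ordered)
    mapping = {}
    for g in range(1, 2 * target + 1):
        if g <= target:
            # Groups 1..target go to the basic stripe, in disk order.
            mapping[g] = ("basic", ordered[g - 1])
        else:
            # Remaining groups go to the operation stripe (assumption).
            mapping[g] = ("operation", ordered[g - target - 1])
    return mapping


print(map_groups_to_check_blocks([6, 7, 8]))
# {1: ('basic', 6), 2: ('basic', 7), 3: ('basic', 8),
#  4: ('operation', 6), 5: ('operation', 7), 6: ('operation', 8)}
```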

Specifically, updating a to-be-updated check data block by using its corresponding group of calculated original data blocks includes: determining a group of calculated original data blocks and the to-be-updated check data block that corresponds to it; determining, from the encoding matrix, the row of parameters corresponding to that check data block as a row matrix, and taking the group of calculated original data blocks as a column matrix; and multiplying the row matrix by the column matrix to obtain the updated check data block.
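The row-by-column product can be sketched as follows. The application does not name the finite field; this sketch assumes GF(2^8) with the reduction polynomial 0x11D, a common choice for Reed-Solomon style codes, and the function names are illustrative.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8) (assumed field, polynomial 0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def update_check_block(row, group):
    """Multiply one encoding-matrix row by a column of data blocks.

    row: k coefficients (ints 0..255) for the check block being updated.
    group: k equal-length byte blocks (a group of calculated original
        data blocks).
    """
    out = bytearray(len(group[0]))
    for coeff, block in zip(row, group):
        for j, byte in enumerate(block):
            out[j] ^= gf_mul(coeff, byte)  # addition in GF(2^8) is XOR
    return bytes(out)


print(update_check_block([1, 2], [b"\x05", b"\x80"]).hex())  # 18
```

Because the product is linear, a check block computed from a group in which one block was XORed with extra data equals the ordinary check block XORed with the coefficient times that extra data.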

In the embodiment of the present application, when the preset number of stripes is even, the stripes are grouped based on the preset number of stripes to obtain a plurality of stripe groups, each containing a basic stripe and an operation stripe. When the preset number of stripes is odd, the stripes are grouped to obtain a plurality of stripe groups, each containing a basic stripe and an operation stripe, with a single stripe left over, and the check data blocks of that single stripe are forbidden from participating in the updating step.

In summary, in this embodiment an encoding matrix is determined based on the preset number of original data blocks, the preset number of check blocks and a historical erasure code algorithm, and a storage erasure structure is determined based on the preset number of original data blocks, the preset number of check blocks and the preset number of stripes; the storage erasure structure comprises data disks, check disks and stripes. Original data is read and encoded with the encoding matrix to obtain original data blocks and check data blocks. One of the even-numbered stripe and the odd-numbered stripe of the same stripe group is used as the basic stripe and the other as the operation stripe, and, in the basic stripe and the operation stripe of the same stripe group, the check data blocks are updated by using the original data blocks. The encoding method thus raises the encoding complexity without increasing the amount of data read for encoding, reduces the number of data blocks required for decoding, allows erroneous data to be decoded and recovered from fewer data blocks, and accelerates decoding recovery.

Historical (that is, conventional) erasure coding (EC) is a data protection method that splits data into fragments, expands and encodes them, and stores the resulting redundant data in different locations, such as disks, storage nodes or other geographic locations. The original data is divided into k original data blocks, r check data blocks are generated according to a coding matrix, and the n blocks (n = k + r) are distributed to different servers. Only k blocks are needed to restore the original data. Here k is both the number of blocks into which the original data is divided and the minimum number of blocks from which it can be restored. The smaller k is, the higher the cost of data reconstruction when a fault occurs; the larger k is, the more data must be copied, which increases the load on the network and on input/output devices. r is the number of check data blocks and affects both the reliability and the cost of storage: the larger r is, the more faults can be tolerated, but the more redundant data there is and the higher the storage cost. n is the total number of generated blocks, that is, the sum of the numbers of original data blocks and check data blocks; the effective storage ratio is k/n.

In addition, historical erasure codes typically use a Vandermonde or Cauchy matrix; the encoding is shown in fig. 3. In the figure, the preset number of original data blocks is k = 5 and the preset number of check data blocks, that is, the coding requirement, is r = 3; the generated code blocks are the D + C part, k + r = 8 blocks in total, and the effective storage ratio is k/n = 5/8. In the figure, B denotes the parameters in the coding matrix that correspond to the check data blocks, D denotes the original data blocks, and D + C denotes the coded original data blocks together with the check data blocks. An erasure system implemented in this way encodes the k blocks D to obtain the r blocks C.
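The relationship between k, r, n and the effective storage ratio can be checked with a few lines of code. The function name `ec_parameters` is hypothetical and is used only for illustration.

```python
from fractions import Fraction


def ec_parameters(k: int, r: int):
    """Return the total block count n = k + r and the storage ratio k/n."""
    n = k + r
    return n, Fraction(k, n)


n, ratio = ec_parameters(5, 3)   # the k = 5, r = 3 example of fig. 3
print(n, ratio)
```

For the later example with k = 5 and r = 4 the same function gives n = 9 and a ratio of 5/9.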

After encoding with r check blocks, the erasure system can decode and recover from any r errors in the system. Erasure codes are a forward error correction technique from coding theory and were first applied in the communication field to deal with data loss during transmission. Because of their effectiveness in preventing data loss, erasure coding techniques were later introduced into the storage field. Erasure codes can effectively reduce storage overhead while providing the same reliability, so they are widely used in large storage systems and data centers, for example Microsoft's Azure and Facebook's F4. There are many types of erasure codes; in real storage systems, RS codes (Reed-Solomon codes) applied in a distributed environment are the more common choice. An RS code is associated with two parameters, k and r. Given two positive integers k and r, the RS code encodes k original data blocks into r additional check data blocks. When the r check data blocks are generated using a Vandermonde matrix or a Cauchy matrix, the code is called an RS erasure code based on the Vandermonde matrix or the Cauchy matrix, respectively. For an RS erasure code based on the Vandermonde matrix, where D₁ to D_k are the original data blocks and P₁ to P_r are the check data blocks, the encoding process is as follows:

；

In addition, for an RS erasure code based on the Cauchy matrix, where D₁ to D_k are the original data blocks and P₁ to P_r are the check data blocks, the encoding process is as follows:

；

In general, encoding multiplies the coding matrix by the original data D₁ to D_k to obtain the original data blocks D₁ to D_k together with the r newly added check data blocks P₁ to P_r. When any r original data blocks are erroneous or lost in transmission and need to be corrected, the inverse of the matrix formed by the rows corresponding to the remaining data blocks is multiplied by the remaining original data blocks and check data blocks to obtain the original data blocks D₁ to D_k; this is the historical decoding and recovery process. The specific formula is as follows:

；

Therefore, the core idea of erasure codes is to construct an invertible coding matrix for generating the check data blocks, so that its inverse can be calculated to recover the original data blocks. Common RS erasure codes use the Cauchy matrix or Vandermonde matrix described above, which have the advantages that the resulting matrix is guaranteed to be invertible, that any of its sub-matrices is also invertible, and that the matrix is simple to enlarge.
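The encode-by-multiplication and decode-by-inversion idea can be sketched as follows. This is an illustrative sketch under stated assumptions, not the coding matrix of the application: symbols are single bytes, arithmetic is in GF(2^8) with the polynomial 0x11D, the check rows form a Cauchy matrix built from the points x_i = k + i and y_j = j, and all function names are hypothetical.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8) (assumed field and polynomial)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_inv(a):
    """Inverse in GF(2^8): a^254, since a^255 == 1 for every non-zero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def coding_matrix(k, r):
    """Identity (k rows) stacked on a Cauchy matrix (r rows); needs k + r <= 256."""
    identity = [[int(i == j) for j in range(k)] for i in range(k)]
    # Cauchy entry 1 / (x_i + y_j); x_i and y_j never coincide, so the sum is non-zero.
    cauchy = [[gf_inv((k + i) ^ j) for j in range(k)] for i in range(r)]
    return identity + cauchy


def encode(matrix, data):
    """Multiply the coding matrix by the column of data symbols."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, d in zip(row, data):
            acc ^= gf_mul(coeff, d)
        out.append(acc)
    return out


def decode(matrix, survivors, k):
    """Recover the k data symbols from k surviving (index, value) pairs."""
    rows = [matrix[idx][:] + [val] for idx, val in survivors[:k]]
    for col in range(k):  # Gauss-Jordan elimination over GF(2^8)
        pivot = next(i for i in range(col, k) if rows[i][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, v) for v in rows[col]]
        for i in range(k):
            if i != col and rows[i][col]:
                factor = rows[i][col]
                rows[i] = [v ^ gf_mul(factor, p) for v, p in zip(rows[i], rows[col])]
    return [row[k] for row in rows]


k, r = 5, 3
M = coding_matrix(k, r)
data = [10, 20, 30, 40, 50]
coded = encode(M, data)  # 5 data symbols followed by 3 check symbols
survivors = [(i, coded[i]) for i in (1, 3, 5, 6, 7)]  # symbols 0, 2 and 4 lost
recovered = decode(M, survivors, k)
```

A Cauchy matrix is used here because every square sub-matrix of it is invertible, which is the property the paragraph above relies on; solving the linear system by elimination is equivalent to multiplying by the inverse of the surviving rows.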

Most existing erasure algorithms are RS (Reed-Solomon) algorithms, which are simple to compute and flexible to extend and are therefore widely used in industry. The RS algorithm generally uses the Vandermonde or Cauchy matrix described above.

Specifically, take as an example a preset number of original data blocks of 5, a preset number of check blocks of 4 and a preset number of stripes of 4. In this case the coding relationship and the decoding relationship are, respectively, as follows:

;

;

the coding formula is as follows:

;

Taking p₁ as an example, the detailed coding formula is:

;

where ⊕ is the XOR sign. The corresponding relationship for decoding can be obtained in the same way. All original data blocks and check data blocks are then obtained on this basis.
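As a minimal sketch of an XOR-based check block, assume that the coefficients used for p₁ are all 1, so that p₁ is simply the XOR of the five original data blocks. This is an assumption made for illustration; other rows of a coding matrix generally use non-trivial coefficients. The helper name `xor_blocks` and the sample data are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# d1..d5: sample original data blocks of three bytes each
d = [bytes([i, i + 1, i + 2]) for i in (1, 2, 3, 4, 5)]
p1 = xor_blocks(d)

# Under the same assumption, a single lost block is the XOR of p1
# with the four remaining original data blocks.
recovered_d1 = xor_blocks([p1] + d[1:])
```

Because XOR is its own inverse, the same operation serves for both encoding and single-block recovery.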

The storage erasure structure obtained with 5 preset original data blocks, 4 preset check blocks and 4 preset stripes is shown in fig. 4. Stripes 1, 2, 3 and 4 are the 4 stripes, disks 1, 2, 3, 4 and 5 are data disks, and disks 6, 7, 8 and 9 are check disks. In this storage erasure structure each hard disk is divided into four stripes; load balancing is not considered, and only the relationship between data and check blocks is considered.
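The layout of this structure can be sketched as a mapping from disk number to the blocks it holds. The function name `build_layout` is hypothetical, and the block naming (d followed by stripe number and data disk number, p followed by stripe number and check block index) is an assumption inferred from the block names used in the recovery example of the description.

```python
def build_layout(k: int, r: int, stripes: int):
    """Return {disk number: [block name for each stripe]}."""
    layout = {}
    for disk in range(1, k + r + 1):
        if disk <= k:
            # Data disk: one original data block per stripe
            names = [f"d{s}{disk}" for s in range(1, stripes + 1)]
        else:
            # Check disk: one check data block per stripe
            names = [f"p{s}{disk - k}" for s in range(1, stripes + 1)]
        layout[disk] = names
    return layout


layout = build_layout(k=5, r=4, stripes=4)
print(layout[1])   # blocks held by data disk 1
print(layout[7])   # blocks held by check disk 7
```

Under this naming, disk 1 holds d11, d21, d31 and d41, and disk 7 holds p12, p22, p32 and p42.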

Given the storage erasure structure and all the original data blocks and check data blocks described above, the stripes are grouped: stripe 1 and stripe 2 form group 1, and stripe 3 and stripe 4 form group 2, as shown in fig. 5. Then, taking the odd-numbered stripes as basic stripes and the even-numbered stripes as operation stripes, the basic stripe groups are obtained as follows:

；

Next, taking group 1 as an example, the check data blocks are updated. In group 1, the check data blocks on disk 6 do not participate in the updating step, because they use the simplest parameters of the coding matrix. The number of check data blocks on disks 7, 8 and 9 in group 1 is determined to be 6, and the number of original data blocks in stripe 1 of group 1 is determined to be 5; an equal division between the 6 check data blocks and the 5 original data blocks is then calculated with the following formula:

；

In the formula, x is the number of data blocks in each share after the equal division, k is the number of data disks and also the number of original data blocks, and r is the number of check disks and also the number of check data blocks. With k = 5 and r = 4 as in the example above, the result after rounding up is 1, and this 1 corresponds to the second numerical value.

Further, the required basic stripe in group 1 is determined; the number of groups of original data blocks corresponding to stripe 1 is 6, calculated with the following formula:

；

where r is the number of check disks and also the number of check data blocks, and y is the number of data blocks to be updated that participate in the updating step in the same stripe group.
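One reading of the two quantities that is consistent with the stated results (6 groups and a rounded-up value of 1 for k = 5, r = 4) is sketched below. The expressions are assumptions reconstructed from the surrounding prose, namely that every check disk except the first contributes one block per stripe of a two-stripe group and that the k original data blocks are shared among those blocks and rounded up; the function name `update_counts` is hypothetical.

```python
import math


def update_counts(k: int, r: int, stripes_per_group: int = 2):
    """Return (x, y) under the assumptions stated above."""
    # y: check data blocks to update, i.e. all check disks except the first,
    # counted once per stripe in the group (assumption)
    y = (r - 1) * stripes_per_group
    # x: original data blocks per share after equal division, rounded up (assumption)
    x = math.ceil(k / y)
    return x, y


x, y = update_counts(k=5, r=4)
print(x, y)
```

For k = 5 and r = 4 this gives y = 6 and x = 1, matching the numbers quoted in the example.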

One original data block in stripe 2 and one group of original data blocks in stripe 1 that are to be XORed with each other are determined and XORed to obtain one group of calculated original data blocks; in this way 6 groups of calculated original data blocks are obtained:

；

The sequence numbers of the 6 groups of original data blocks are kept unchanged. The parameters in the coding matrix that correspond to each check data block are then determined and used as a row matrix, the 6 groups of calculated original data blocks are used as column matrices, and the updated check data blocks are calculated. Taking the update of p12, p13 and p14 as an example, the first three of the 6 groups of calculated original data blocks are used to update p12, p13 and p14, as shown below:

；

；

；

The last three of the 6 groups of calculated original data blocks are then used to update p22, p23 and p24. After all encoding and updating operations are completed, decoding and recovery proceed as follows when an error occurs.

In the first case, one error occurs: disk 1 fails and needs to be recovered. The historical erasure code storage system needs to read disks 2-6, 25 data blocks in total, for recovery. The recovery method of the scheme proposed by the application is as follows: first, the 10 data blocks d12-p11 and d32-p31 are read, and d11 and d31 are recovered. At the same time, p12' and p32' are read to recover d21 and d41, so that all data of disk 1 is recovered. In this recovery operation 12 data blocks are read in total, and the recovery speed is more than double that of the historical erasure code storage system.

In the second case, two errors occur: disk 1 and disk 2 fail and need to be recovered. The historical erasure code algorithm needs to read disks 3-7, 25 data blocks in total, for recovery. The recovery method of the scheme proposed by the application is as follows: first, the 8 data blocks d13-p11 and d33-p31 are read, and then the 4 data blocks d23, p14', d43 and p34' are read, completing the recovery of d11, d12, d31 and d32. The 4 data blocks p12', p13', p32' and p33' are then read to complete the recovery. In this recovery operation 16 data blocks are read in total, and recovery is faster than with the historical erasure code algorithm.

In the third case, the decoding speed remains correspondingly higher until the number of errors reaches r, the number of check blocks; at that point all remaining data blocks must be read to complete the recovery, and the speed is the same as that of historical erasure coding. Therefore, in this erasure system the coding mode and the coding multiplexing operation are changed on the encoding side, and on the data recovery side the speed is improved compared with the historical erasure system without increasing the reading of coded data, that is, with almost no added speed loss; at the upper limit of recovery, the erasure system has the same recovery speed as the historical erasure system.
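The block counts quoted for the first two cases can be tallied as follows. The figures are taken directly from the description; the dictionary layout and the function name `read_ratio` are hypothetical, and the ratio of blocks read is used only as a rough proxy for recovery speed.

```python
# Block reads quoted in the description for the k = 5, r = 4, 4-stripe example
cases = {
    "disk 1 lost": {"historical": 25, "proposed": 10 + 2},
    "disks 1 and 2 lost": {"historical": 25, "proposed": 8 + 4 + 4},
}


def read_ratio(case: dict) -> float:
    """How many times fewer blocks the proposed scheme reads."""
    return case["historical"] / case["proposed"]


for name, case in cases.items():
    print(f"{name}: {case['proposed']} blocks read, ratio {read_ratio(case):.2f}")
```

The ratio exceeds 2 when one disk is lost and shrinks as more disks are lost, in line with the statement that the advantage disappears once the number of errors reaches r.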

Referring to fig. 6, an embodiment of the present application discloses a data encoding apparatus, including:

a matrix determining module 11, configured to determine an encoding matrix based on the preset number of original data blocks, the preset number of check blocks, and a historical erasure code algorithm;

a structure determining module 12, configured to determine a storage erasure structure based on the preset number of original data blocks, the preset number of check blocks, and the preset number of stripes; the storage erasure structure comprises data disks, check disks and stripes;

the encoding module 13 is configured to read original data, and perform encoding calculation by using the encoding matrix based on the original data to obtain an original data block and a check data block;

and the updating module 14 is configured to perform stripe grouping based on the preset number of stripes to obtain a plurality of stripe groups, and to update the check data block with the original data block within the same stripe group to obtain an updated check data block.

For more specific working processes of the modules, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.

Further, an electronic device is provided in the embodiments of the present application, and fig. 7 is a block diagram of an electronic device 20 according to an exemplary embodiment, which should not be construed as limiting the scope of the application.

Fig. 7 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present disclosure. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, an input output interface 24, a communication interface 25, and a communication bus 26. Wherein the memory 22 is used for storing a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps of the data encoding method disclosed in any of the foregoing embodiments.

In this embodiment, the power supply 23 is configured to provide a working voltage for each hardware device on the electronic device 20; the communication interface 25 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein; the input/output interface 24 is configured to obtain external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.

In addition, the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disk or the like; it may include a random access memory serving as running memory and a non-volatile memory serving as external storage. The resources stored on it include an operating system 221, a computer program 222 and the like, and the storage may be transient or permanent.

The operating system 221 is used for managing and controlling each hardware device on the electronic device 20 and the computer program 222, and may be Windows, Unix, Linux, or the like. In addition to the computer program that the electronic device 20 executes to perform the data encoding method disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs for performing other specific tasks.

In this embodiment, the input/output interface 24 may specifically include, but is not limited to, a USB interface, a hard disk reading interface, a serial interface, a voice input interface, a fingerprint input interface, and the like.

Further, the embodiment of the application also discloses a computer readable storage medium for storing a computer program; wherein the computer program realizes the data encoding method disclosed in the foregoing when executed by a processor.

For the specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.

A computer-readable storage medium as referred to herein includes a Random Access Memory (RAM), a Memory, a Read-Only Memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a magnetic or optical disk, or any other form of storage medium known in the art.

The embodiments are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the data encoding method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of an algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The above detailed description is provided for a data encoding method, apparatus, device and medium provided by the present invention, and the present application describes the principle and implementation manner of the present invention by using specific examples, and the description of the above embodiments is only used to help understanding the method and core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A method of encoding data, comprising:

determining a coding matrix based on the number of preset original data blocks, the number of preset check blocks and a historical erasure correcting code algorithm, and determining a storage erasure correcting structure based on the number of the preset original data blocks, the number of the preset check blocks and the number of preset stripes; the storage erasure structure comprises a data disk, a check disk and a stripe;

reading original data, and performing coding calculation by using the coding matrix based on the original data to obtain an original data block and a check data block corresponding to each stripe;

2. The data encoding method of claim 1, wherein the updating the parity data block with the original data block in the same stripe group to obtain an updated parity data block comprises:

3. The data encoding method of claim 2, wherein before updating the parity data block with the original data block in the base stripe and the operation stripe of the same stripe group, the method further comprises:

4. The data encoding method of claim 3, wherein the updating the parity data block with the original data block to obtain an updated parity data block comprises:

5. The data encoding method of claim 4, wherein the updating the plurality of check data blocks to be updated with the plurality of sets of the calculated original data blocks to obtain the updated check data blocks respectively comprises:

6. The data encoding method of claim 5, wherein the determining a group of the calculated original data block and one of the check data blocks to be updated, which all have a corresponding relationship, and updating the check data block to be updated by using the group of the calculated original data block to obtain an updated check data block comprises:

7. The data encoding method according to any one of claims 1 to 6, wherein the performing stripe grouping based on the preset number of stripes to obtain a plurality of stripe groups, and updating the parity data block with the original data block in the same stripe group to obtain an updated parity data block comprises:

8. A data encoding apparatus, comprising:

the matrix determining module is used for determining an encoding matrix based on the number of preset original data blocks, the number of preset check blocks and a historical erasure code algorithm;

the encoding module is used for reading original data and performing encoding calculation by using the encoding matrix based on the original data to obtain an original data block and a check data block corresponding to each stripe;

9. An electronic device comprising a processor and a memory; wherein the processor, when executing the computer program stored in the memory, implements the data encoding method of any one of claims 1 to 7.

10. A computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the data encoding method of any one of claims 1 to 7.