Background
The domestic Feiteng series of processors is based on the ARM64 architecture, is fully compatible with the ARMv8 instruction set, and implements the NEON extension instructions. These SIMD extension instructions partially compensate for the Feiteng processor's relatively low CPU frequency and can be used to accelerate memory access and computation in data-intensive applications. Common data-intensive applications include graphics computing, audio and video entertainment, data verification, and the like.
With the development of cloud computing and the arrival of the big data era, the demands placed on data storage keep rising. Existing storage devices offer far greater capacity and speed than earlier ones, but storage reliability and security are of increasing concern: once data is lost, the loss to enterprises and individuals can be catastrophic. The advent of RAID technology has solved, to some extent, the problems encountered in the field of data storage. RAID6 in particular balances storage capacity, storage speed and storage security, and is widely used across industries in the digital field.
To ensure data safety, RAID6 dedicates two disks of the disk array to parity. The two check disks use different check algorithms, and this dual check mechanism tolerates at most two damaged disks in the array: once one or two disks are damaged, the dual check can be used to recover the data. If two disks are already damaged and a further disk fails, user data is lost. The data recovery speed of RAID6 is therefore crucial; faster recovery reduces the risk of data loss and improves the user experience. At present, RAID6 data recovery on the Feiteng platform is performed with general SISD (single instruction, single data) instructions, which is time-consuming for the computation-intensive RAID6 recovery operation. Introducing the NEON extension instructions provided by the Feiteng processor to optimize RAID6 data recovery can accelerate the recovery process.
None of the existing RAID6 data recovery techniques solves the problem of slow data recovery on the Feiteng platform. The current implementation uses SISD instructions, so only one byte of data is recovered per execution; when the CPU frequency is not high, recovering a large-capacity disk array takes a long time.
One prior invention provides a method for processing RAID6 data bad blocks (application number CN 201410393820.1). It is implemented as follows: first, data stripes are read, a bad block information record table is established, and bad block transfer information is recorded; next, in the record query flow, a read or write operation first checks whether the stripe is recorded in the mapping table; then comes the record adding flow; then the method checks whether a read or write error exists and, if so, judges whether it can be repaired; finally, a periodic thread is started during idle periods to scan the disks and find errors in time. The method improves data access efficiency and reduces the probability of data loss by transferring the real data block to a check data block in the same stripe, but it adopts no effective means of accelerating recovery after RAID6 data loss.
Another prior invention provides a data recovery method for RAID6 (application number CN 201510280713.2), comprising the following steps: S1, searching all disks for the DBR of an NTFS partition; S2, finding the starting position of the MFT table in the DBR; S3, calculating the array stripe size; S4, calculating the array starting position; S5, identifying the check area and reconstructing the array; S6, recovering the lost data. That invention scans the DBR and MFT features to obtain the whole array structure and to distinguish data-area data from check-area data accurately, uses the common RAID6 array layouts to match and reorganize the data, and can handle data recovery when one or two hard disks are lost. Analysis of the array structure is fast, data reorganization is fast, and the data recovery success rate is high. However, its application scenarios are limited: it requires the disk array to support the DBR and MFT features, and it does not consider the SIMD acceleration provided by the processor during data recovery.
Disclosure of Invention
In view of the above problems, the present invention provides a RAID6 data recovery optimization method based on the Feiteng platform. By using the NEON technology supported by the Feiteng processor, it replaces the previous single-byte data recovery mode and recovers multiple bytes of check data in one CPU instruction cycle, thereby accelerating RAID6 data recovery.
In order to solve the technical problems, the invention adopts the following technical scheme: a RAID6 data recovery optimization method based on the Feiteng platform, comprising the following steps:
S1: using NEON registers to perform multi-byte access in a single operation;
S2: performing recovery factor conversion;
S3: performing the double parity check on multiple bytes in a single operation;
S4: storing the recovered data.
Further, step S1 includes the steps of:
S11: loading multi-byte data from the first corrupted data region into register variable A;
S12: copying the value of register variable A to register variable B.
Further, in step S2, the recovery factor is converted to the pseudo Galois field multiplication table, which specifically includes the following steps:
S21: calculating the upper four bits and the lower four bits of the index into the Galois field multiplication table;
S22: loading a plurality of low entries and a plurality of high entries of the pseudo Galois field multiplication table into register variable C and register variable D, respectively;
S23: performing the conversion between the pseudo Galois field multiplication table and the Galois field multiplication table.
Further, step S21 includes the steps of:
S211: shifting register variable B right by 4 bits lane by lane, and then ANDing each lane with 0x0f;
S212: ANDing each lane of register variable A with 0x0f.
Further, the number of low entries and the number of high entries of the pseudo Galois field multiplication table in step S22 are each equal to the number of NEON register lanes.
Further, step S23 includes the steps of:
S231: using the value of each lane of register variable A as a lane index into register variable C, and replacing that value with the value of the indexed lane of register variable C;
S232: using the value of each lane of register variable B as a lane index into register variable D, and replacing that value with the value of the indexed lane of register variable D.
Further, step S3 includes the steps of:
S31: XORing register variable A with register variable B lane by lane, and storing the result in register variable A;
S32: loading multiple bytes of data from the second corrupted data region into the lanes of register variable B;
S33: XORing register variable A with register variable B lane by lane, and storing the result in register variable B.
Further, step S4 includes the steps of:
S41: storing the value of register variable A into the first corrupted data region as the recovered data of that region;
S42: storing the value of register variable B into the second corrupted data region as the recovered data of that region.
Further, the number of lanes of the NEON register is 16.
Further, in step S2, the pseudo Galois field multiplication table is converted into the Galois field multiplication table. The entries of the pseudo Galois field multiplication table correspond one-to-one with the entries of the Galois field multiplication table, and the following mathematical relationship holds:
for any element gfmul[a] of the Galois field multiplication table gfmul,
gfmul[a]=vgfmulL[low4(a)]^vgfmulH[hig4(a)];
where subscript a is an integer from 0 to 255, vgfmulL[] denotes the lower 16 bytes of a pseudo Galois field multiplication table entry, vgfmulH[] denotes the upper 16 bytes of that entry, hig4(a) denotes the upper 4 bits of subscript a, and low4(a) denotes the lower 4 bits of subscript a.
The invention has the advantages and positive effects that:
(1) The acceleration algorithm is independently designed and implemented, so it is highly autonomous and controllable;
(2) The implementation is original in that it makes full use of the NEON feature of the Feiteng processor and of the mathematical relationship between the RAID6 Galois field multiplication tables to parallelize data recovery;
(3) Processing is widened from a single byte before optimization to 16 bytes after optimization, which greatly reduces the CPU instruction cycles consumed by RAID6 data recovery.
Detailed Description
The invention will be further described with reference to the drawings and the specific examples.
Fig. 2 shows a flowchart of the data recovery optimization method according to an embodiment of the present invention, and specifically shows the logic flow of the method in this embodiment. The embodiment relates to a RAID6 data recovery optimization method based on the Feiteng platform. By using the NEON technology supported by the Feiteng processor, it replaces the previous single-byte data recovery mode and recovers multiple bytes of check data in one CPU instruction cycle, thereby accelerating RAID6 data recovery. NEON registers are used to access multiple bytes in a single operation, the double parity check is applied to multiple bytes in a single operation according to the recovery factor, and the data is recovered.
Fig. 1 illustrates how the RAID6 recovery algorithm recovers bad disks on the Feiteng platform. RAID6 uses double parity to protect data: the first check uses an exclusive OR (XOR) algorithm, and the second uses Galois field multiplication. RAID6 therefore uses a specific second-order Galois field multiplication table (gfmul) whose elements are defined as 8-bit unsigned integers, so every table lookup reads a single byte. For this reason, the prior art can process only one byte of data at a time. On the Feiteng platform the RAID6 stripe defaults to 4 KB, in units of the memory page size, so recovering the data of one stripe requires repeating the operation 4096 times, which consumes a large number of CPU instruction cycles.
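The byte-at-a-time behaviour described above can be modelled with a short sketch. It is illustrative only: the function names `gf_mul` and `recover_bytewise`, the field polynomial 0x11d, the coefficient 0x1D and the test data are assumptions that do not come from the text; only the 4096-byte stripe size and the one-byte-per-lookup behaviour do.

```python
STRIPE_BYTES = 4096  # stripe size given in the text (one 4 KB memory page)


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8); the polynomial 0x11d is an assumption."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def recover_bytewise(vd, coeff):
    """Recover a stripe with one single-byte table lookup per byte."""
    gfmul_row = [gf_mul(coeff, x) for x in range(256)]  # 256 one-byte entries
    out = bytearray(len(vd))
    lookups = 0
    for i, byte in enumerate(vd):
        out[i] = gfmul_row[byte]  # exactly one byte recovered per iteration
        lookups += 1
    return bytes(out), lookups


vd = bytes(i % 256 for i in range(STRIPE_BYTES))  # made-up intermediate data
recovered, lookups = recover_bytewise(vd, 0x1D)   # 0x1D: arbitrary coefficient
print(lookups)  # 4096
```

The loop count equals the stripe size, which is the cost the 16-lane approach aims to divide by sixteen.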
All processors of the Feiteng range are 64-bit: the general registers support 64-bit operands, and NEON support widens operands to 128 bits. However, the elements of the Galois field multiplication table are single-byte integers, so access is restricted to one byte at a time. After the first check (the XOR algorithm) has been computed on 128 bits, it cannot be determined in advance which 16 elements of the Galois field multiplication table are needed, and even if those 16 elements were known, they could not be loaded in one operation unless they happened to be contiguous. RAID6, however, also provides a second-order pseudo Galois field multiplication table (vgfmul), whose entries correspond one-to-one with the entries of the Galois field multiplication table and satisfy the following mathematical relationship:
for any element gfmul[a] of gfmul,
gfmul[a]=vgfmulL[low4(a)]^vgfmulH[hig4(a)];
where subscript a is an integer from 0 to 255, vgfmulL[] denotes the lower 16 bytes of a pseudo Galois field multiplication table entry, vgfmulH[] denotes the upper 16 bytes of that entry, hig4(a) denotes the upper 4 bits of subscript a, and low4(a) denotes the lower 4 bits. This conversion from the pseudo Galois field multiplication table to the Galois field multiplication table is well suited to parallel processing with NEON, and it is the reason RAID6 data recovery can be accelerated by NEON at all: during recovery RAID6 has no way to operate on several entries of the Galois field multiplication table at once, but it can operate on several entries of the pseudo Galois field multiplication table at once and obtain the corresponding entries of the Galois field multiplication table indirectly from the result.
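The relationship can be checked numerically. The sketch below assumes that the low half of a pseudo-table entry holds the products of a coefficient with the values 0x00 to 0x0f and the high half holds its products with 0x00, 0x10, ..., 0xf0; that layout, the polynomial 0x11d and all function names are assumptions made for illustration. Under that assumption the identity follows because Galois field multiplication distributes over XOR, and a byte is the XOR of its high and low nibble parts.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8); the polynomial 0x11d is an assumption."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def build_tables(coeff):
    """Build one gfmul row and the matching 16+16 byte pseudo-table entry."""
    gfmul = [gf_mul(coeff, a) for a in range(256)]
    vgfmul_l = [gf_mul(coeff, lo) for lo in range(16)]       # low-nibble products
    vgfmul_h = [gf_mul(coeff, hi << 4) for hi in range(16)]  # high-nibble products
    return gfmul, vgfmul_l, vgfmul_h


def identity_holds(coeff):
    """Check gfmul[a] == vgfmulL[low4(a)] ^ vgfmulH[hig4(a)] for every a."""
    gfmul, vgfmul_l, vgfmul_h = build_tables(coeff)
    return all(
        gfmul[a] == vgfmul_l[a & 0x0F] ^ vgfmul_h[a >> 4] for a in range(256)
    )


print(all(identity_holds(c) for c in range(256)))  # True
```

Each pseudo-table half has exactly 16 one-byte entries, which is what lets it fit in a single 16-lane register.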
As shown in Fig. 2, the RAID6 data recovery optimization method based on the Feiteng platform relies on double parity to protect data and performs recovery using multi-byte access and multi-byte calculation in single operations.
The RAID6 data recovery optimization method based on the Feiteng platform comprises the following steps:
S1: NEON registers are used to access multiple bytes in a single operation. Specifically, NEON register variables A and B are defined and filled in turn, as follows:
S11: Multi-byte data is loaded from the first corrupted data region into register variable A. A 128-bit NEON register variable A with 16 lanes is defined, each lane being 8 bits wide, so that NEON instructions operate on it byte by byte; 16 bytes of data are then loaded from the first corrupted data region into the 16 lanes of register variable A. Here the bytes of the first corrupted data region are part of the intermediate data calculated for that region from the uncorrupted data regions and the check region;
S12: The value of register variable A is copied to register variable B. A 128-bit register variable B with 16 lanes of 8 bits each is defined, and the value of register variable A is copied into it.
S2: Recovery factor conversion is performed, that is, the conversion between the Galois field multiplication table (gfmul) and the pseudo Galois field multiplication table (vgfmul) according to the mathematical relationship between them,
gfmul[a]=vgfmulL[low4(a)]^vgfmulH[hig4(a)];
which comprises the following steps:
S21: The upper four bits and the lower four bits of the index into the Galois field multiplication table are calculated, specifically:
S211: Register variable B is shifted right by 4 bits lane by lane, and each lane is then ANDed with 0x0f, so that each lane finally holds the upper four bits of the Galois field multiplication table index;
S212: Each lane of register variable A is ANDed with 0x0f, so that each lane finally holds the lower four bits of the Galois field multiplication table index;
S22: Two 128-bit register variables C and D with 16 lanes each are defined, and a plurality of low entries and a plurality of high entries of the pseudo Galois field multiplication table are loaded into C and D respectively. The numbers of low entries and high entries equal the number of NEON lanes, that is, 16 low entries and 16 high entries of the pseudo Galois field multiplication table are loaded.
S23: The conversion between the pseudo Galois field multiplication table and the Galois field multiplication table is performed, which comprises the following steps:
S231: The value of each lane of register variable A is used as a lane index into register variable C and is replaced by the value of the indexed lane of register variable C;
S232: The value of each lane of register variable B is used as a lane index into register variable D and is replaced by the value of the indexed lane of register variable D.
S3: The double parity check is performed on multiple bytes in a single operation, which comprises the following steps:
S31: Register variable A is XORed with register variable B lane by lane, and the result is stored in register variable A;
S32: Multiple bytes of data are loaded from the second corrupted data region into the lanes of register variable B, which has 16 lanes;
S33: Register variable A is XORed with register variable B lane by lane, and the result is stored in register variable B.
S4: The recovered data is stored, which comprises the following steps:
S41: The value of register variable A is stored into the first corrupted data region as the recovered data of that region;
S42: The value of register variable B is stored into the second corrupted data region as the recovered data of that region.
In this RAID6 data recovery optimization method based on the Feiteng platform, the pseudo Galois field multiplication table serves as the recovery factor, so 16 bytes of data can be processed at once through the widened NEON registers, saving many instruction cycles.
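The recovery factor conversion of step S2 can be modelled lane by lane. The sketch below is a scalar Python model of the register operations, not NEON code: registers are lists of 16 byte values, and the helper names (`lane_shr4`, `lane_and`, `lane_tbl`, `convert_recovery_factor`), the coefficient 0x53, the polynomial 0x11d and the sample block are all hypothetical. On AArch64 the per-lane lookup would presumably map to a byte table-lookup instruction, although the text does not name one.

```python
LANES = 16  # lanes in one 128-bit register, 8 bits each


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8); the polynomial 0x11d is an assumption."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def lane_shr4(reg):
    """Shift every lane right by 4 bits."""
    return [x >> 4 for x in reg]


def lane_and(reg, mask):
    """AND every lane with the same byte mask."""
    return [x & mask for x in reg]


def lane_tbl(table, idx):
    """Per-lane lookup: lane i of the result is table[idx[i]]."""
    return [table[i] for i in idx]


def convert_recovery_factor(block, vgfmul_l, vgfmul_h):
    a = list(block)                        # S11: load 16 bytes into A
    b = list(a)                            # S12: copy A to B
    b = lane_and(lane_shr4(b), 0x0F)       # S211: upper four bits of each index
    a = lane_and(a, 0x0F)                  # S212: lower four bits of each index
    c, d = list(vgfmul_l), list(vgfmul_h)  # S22: low and high entries into C, D
    a = lane_tbl(c, a)                     # S231: A indexes C
    b = lane_tbl(d, b)                     # S232: B indexes D
    return a, b


coeff = 0x53  # arbitrary coefficient selecting one pseudo-table entry
vgfmul_l = [gf_mul(coeff, lo) for lo in range(LANES)]
vgfmul_h = [gf_mul(coeff, hi << 4) for hi in range(LANES)]
block = bytes([0x00, 0x01, 0x0F, 0x10, 0x5A, 0xA5, 0xF0, 0xFF,
               0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0x80])
a, b = convert_recovery_factor(block, vgfmul_l, vgfmul_h)
combined = [x ^ y for x, y in zip(a, b)]  # S31: XOR of A and B
print(combined == [gf_mul(coeff, x) for x in block])  # True
```

After the XOR of step S31, every lane holds the same value a single-byte lookup in the Galois field multiplication table would have produced, but all 16 lanes are obtained together.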
The following explanation takes the case in which the two bad disks lie in the data area and in the first parity area, respectively. For other bad-disk combinations, for example two bad disks both in the data area, the recovery principle is the same, and the calculation likewise uses the double parity of RAID6.
Data is stored on a RAID6 array in stripes, so the position of each bad disk within the stripe must be confirmed before the data is recovered. Assume the current stripe has two bad disks: the first lies in the data area and is denoted D, and the second lies in the first parity area and is denoted P. For convenience of description, the second parity area is denoted Q. Before recovering D and P, RAID6 uses the undamaged data areas and the Q parity area to calculate, according to the parity algorithm, intermediate data for D and for P, denoted VD and VP respectively.
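The text does not spell out how VD and VP are computed. The sketch below follows the widely used RAID6 construction (P is the XOR of the data blocks, Q is a Galois field weighted sum with generator 2 and polynomial 0x11d) and should be read as an assumption about the underlying parity algorithm rather than as the method of this embodiment. The four-disk layout, the failed index `z` and all function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8); the polynomial 0x11d is an assumption."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(base, exp):
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def gf_inv(a):
    """Inverse of a nonzero byte: a^254, because a^255 == 1 in GF(2^8)."""
    return gf_pow(a, 254)


def syndromes(data_disks):
    """Return (P, Q) for equally sized data blocks; generator 2 is assumed."""
    n = len(data_disks[0])
    p, q = bytearray(n), bytearray(n)
    for i, disk in enumerate(data_disks):
        weight = gf_pow(2, i)
        for k, byte in enumerate(disk):
            p[k] ^= byte
            q[k] ^= gf_mul(weight, byte)
    return bytes(p), bytes(q)


# Made-up stripe: four data blocks of 16 bytes each.
data = [bytes((7 * d + 3 * k + 1) % 256 for k in range(16)) for d in range(4)]
p, q = syndromes(data)

z = 2  # index of the failed data disk D; P is lost as well, Q survives
survivors = [blk if i != z else bytes(16) for i, blk in enumerate(data)]
vp, q_partial = syndromes(survivors)              # VP: parity of the survivors
vd = bytes(x ^ y for x, y in zip(q, q_partial))   # VD: Q XOR partial Q

coeff = gf_inv(gf_pow(2, z))                       # coefficient applied to VD
rec_d = bytes(gf_mul(coeff, x) for x in vd)        # recovered D
rec_p = bytes(x ^ y for x, y in zip(vp, rec_d))    # recovered P
print(rec_d == data[z], rec_p == p)  # True True
```

Under these assumptions the only Galois field multiplication left in the recovery is the multiplication of VD by one fixed coefficient, which is the part that the pseudo-table lookup widens to 16 bytes at a time.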
The RAID6 data recovery optimization method based on the Feiteng platform comprises the following steps:
Step S1: A 128-bit NEON register variable A with 16 lanes is defined, each lane being 8 bits wide, so that NEON instructions operate on it byte by byte; 16 bytes of data are then loaded from VD into the 16 lanes of register variable A.
Step S2: A 128-bit register variable B with 16 lanes of 8 bits each is defined, and the value of register variable A is copied to register variable B.
Step S3: Register variable B is shifted right by 4 bits lane by lane, and each lane is then ANDed with 0x0f, so that each lane finally holds the upper four bits of the Galois field multiplication table index.
Step S4: Each lane of register variable A is ANDed with 0x0f, so that each lane finally holds the lower four bits of the Galois field multiplication table index.
Step S5: Two 128-bit register variables C and D with 16 lanes each are defined, and the lower 16 entries and the upper 16 entries of the pseudo Galois field multiplication table are loaded into register variables C and D, respectively.
Step S6: The value of each lane of register variable A is used as a lane index into register variable C and is then replaced by the value of the indexed lane of register variable C.
Step S7: The value of each lane of register variable B is used as a lane index into register variable D and is then replaced by the value of the indexed lane of register variable D.
Step S8: Register variable A is XORed with register variable B lane by lane, and the result is stored in register variable A.
Step S9: 16 bytes of data are loaded from P into the 16 lanes of register variable B.
Step S10: Register variable A is XORed with register variable B lane by lane, and the result is stored in register variable B.
Step S11: The value of register variable A is stored into D as the recovered data of the data area.
Step S12: The value of register variable B is stored into P as the recovered data of the first parity area.
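The twelve steps can be traced over a whole stripe with a scalar model. In the sketch below, lists of 16 bytes stand in for the NEON registers, and the function names, the coefficient 0x8E, the polynomial 0x11d and the input data are assumptions chosen for illustration; the buffer read in Step S9 is taken to hold the intermediate data VP. The model checks the result against a byte-by-byte computation and counts loop iterations.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8); the polynomial 0x11d is an assumption."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def recover_block(vd16, vp16, vgfmul_l, vgfmul_h):
    """Apply Steps S1 to S12 to one 16-byte block; lists model registers."""
    a = list(vd16)                         # Step S1: load 16 bytes of VD into A
    b = list(a)                            # Step S2: copy A to B
    b = [(x >> 4) & 0x0F for x in b]       # Step S3: upper four bits per lane
    a = [x & 0x0F for x in a]              # Step S4: lower four bits per lane
    c, d = list(vgfmul_l), list(vgfmul_h)  # Step S5: low/high entries into C, D
    a = [c[i] for i in a]                  # Step S6: A indexes C
    b = [d[i] for i in b]                  # Step S7: B indexes D
    a = [x ^ y for x, y in zip(a, b)]      # Step S8: A = A xor B
    b = list(vp16)                         # Step S9: load 16 bytes (VP) into B
    b = [x ^ y for x, y in zip(a, b)]      # Step S10: B = A xor B
    return bytes(a), bytes(b)              # Steps S11, S12: values to store


def recover_stripe(vd, vp, coeff):
    vgfmul_l = [gf_mul(coeff, lo) for lo in range(16)]
    vgfmul_h = [gf_mul(coeff, hi << 4) for hi in range(16)]
    d_out, p_out = bytearray(), bytearray()
    iterations = 0
    for off in range(0, len(vd), 16):
        a, b = recover_block(vd[off:off + 16], vp[off:off + 16],
                             vgfmul_l, vgfmul_h)
        d_out += a
        p_out += b
        iterations += 1
    return bytes(d_out), bytes(p_out), iterations


vd = bytes((37 * i + 5) % 256 for i in range(4096))  # made-up VD for one stripe
vp = bytes((11 * i) % 256 for i in range(4096))      # made-up VP for one stripe
d_rec, p_rec, iterations = recover_stripe(vd, vp, 0x8E)
print(iterations)  # 256
```

For a 4 KB stripe the loop runs 256 times instead of 4096, and each iteration yields 16 bytes of D and 16 bytes of P.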
The invention has the following advantages and positive effects: the acceleration algorithm is independently designed and implemented, so it is highly autonomous and controllable; the implementation is original in that it makes full use of the NEON feature of the Feiteng processor and of the mathematical relationship between the RAID6 Galois field multiplication tables to parallelize data recovery; and processing is widened from a single byte before optimization to 16 bytes after optimization, which greatly reduces the CPU instruction cycles consumed by RAID6 data recovery.
The foregoing describes one embodiment of the present invention in detail, but the description is only a preferred embodiment of the present invention and should not be construed as limiting the scope of the invention. All equivalent changes and modifications within the scope of the present invention are intended to be covered by the present invention.