CN117093397A

CN117093397A - Data output fault-tolerant device and fault-tolerant method in high-performance computing environment

Info

Publication number: CN117093397A
Application number: CN202311066956.7A
Authority: CN
Inventors: 卫薇; 龙玉江; 李洵; 甘润东; 王策; 钟掖; 龙娜; 陈卿; 袁捷; 卢仁猛
Original assignee: Guizhou Power Grid Co Ltd
Current assignee: Guizhou Power Grid Co Ltd
Priority date: 2023-08-23
Filing date: 2023-08-23
Publication date: 2023-11-21

Abstract

The invention discloses a data output fault-tolerant device and fault-tolerant method in a high-performance computing environment, and relates to the technical field of data output fault tolerance. The device comprises a storage monitoring module, a splitting module, an output module, a coding module and a checking and repairing module, wherein the splitting module, the output module and the coding module are in communication connection with the storage monitoring module. The splitting module divides the data written by the client into the storage monitoring module into data blocks; the coding module applies cross-block coding to the data blocks through a coding algorithm to generate coding blocks and form coding groups; the checking and repairing module checks each coding group and repairs lost coding blocks by a tree repair method. Because the content data blocks are kept in multiple copies, the probability of losing a content data block during output is reduced; since the content data blocks form the main part of the data, this strengthens the protection of the content data, improves the safety of data output and reduces the cost of data fault tolerance.

Description

Data output fault-tolerant device and fault-tolerant method in high-performance computing environment

Technical Field

The invention belongs to the technical field of data output fault tolerance, relates to a data output fault tolerance device in a high-performance computing environment, and further relates to a fault tolerance method of the data output fault tolerance device in the high-performance computing environment.

Background

As the computing scale grows, the volume of data output in parallel by a high-performance computer keeps increasing, as does the number of nodes, network devices and storage devices participating in the data output process; the number of local component faults occurring per unit time increases accordingly. Data fault tolerance refers to techniques that allow data stored on a computer to be retained and reused, and the loss of an organization's information to be reduced, under both local and overall faults. Modern society depends heavily on information technology and stores a large volume of information assets, in the form of data, on computers and networks, so technical research and management must be strengthened to improve the fault tolerance of network-stored data.

At present, output node failures have become routine. Multi-copy technology is a mature fault-tolerance technique used by many systems: a data object is copied several times to produce identical copies, which are then scattered across several nodes, so that when some copies cannot be output because of node failure, the other copies can be used directly. However, this greatly reduces the utilization of storage space. More and more storage systems therefore use erasure coding for fault tolerance. Compared with multi-copy technology, erasure coding achieves the same or even higher fault tolerance with much lower storage overhead, but its higher network and disk resource overhead becomes the performance bottleneck of the whole system, leading to a high data coding cost and low data repair efficiency. Moreover, in most cases only a small part of the output data is damaged, yet this prevents the whole of the data from being displayed on the client, and during repair the output data is output and sent again in its entirety, which occupies additional output resources.

The existing data output fault-tolerant technology in the high-performance computing environment has the problems of high data coding cost and low data restoration efficiency, and meanwhile, the occupied system resources are high during data restoration, so that the system load is large.

Disclosure of Invention

The invention aims to solve the technical problems that: the data output fault-tolerant device and the fault-tolerant method in the high-performance computing environment are provided, and the problems that the data coding cost is high, the data repairing efficiency is low, and the system load is high due to the high system resources occupied during data repairing in the existing data output fault-tolerant technology in the high-performance computing environment are solved.

The technical scheme adopted by the invention is as follows: a data output fault-tolerant device in a high-performance computing environment comprises a storage monitoring module, a splitting module, an output module, a coding module and a checking and repairing module, wherein the splitting module, the output module, the coding module and the checking and repairing module are in communication connection with the storage monitoring module;

Splitting module: used to divide the data written by the client into the storage monitoring module into data blocks;

Coding module: used to apply cross-block coding to the data blocks through a coding algorithm to generate coding blocks and form coding groups;

Output module: used to output the coding group stored on a node to the client;

Checking and repairing module: used to check the coding group and repair lost coding blocks by a tree repair method.

Further, the storage monitoring module is used to store the data written by the client, monitor the output request frequency, and count in real time how often each piece of data is requested. It judges from this request frequency whether the data is hot-spot data, stores hot-spot data as copies, and generates a number of temporary hot-spot copies corresponding to the heat of the hot-spot data. Non-hot-spot data is stored redundantly, and its temporary hot-spot copies are deleted.
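The hot-spot decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `HotspotMonitor`, the threshold, the rule that maps heat to a copy count, and the cap on copies are all assumptions, since the text does not specify them.

```python
from collections import Counter


class HotspotMonitor:
    """Counts output requests per object and sizes temporary hot-spot copies.

    hot_threshold, requests_per_copy and max_copies are illustrative
    assumptions; the source text does not give concrete values or rules.
    """

    def __init__(self, hot_threshold=100, requests_per_copy=100, max_copies=4):
        self.hot_threshold = hot_threshold
        self.requests_per_copy = requests_per_copy
        self.max_copies = max_copies
        self.requests = Counter()

    def record_request(self, key):
        """Register one output request for the object named `key`."""
        self.requests[key] += 1

    def is_hot(self, key):
        """An object is hot once its request count reaches the threshold."""
        return self.requests[key] >= self.hot_threshold

    def temp_copy_count(self, key):
        """Number of temporary copies: grows with heat, zero for cold data."""
        if not self.is_hot(key):
            return 0  # non-hot data keeps no temporary hot-spot copies
        return min(self.max_copies, self.requests[key] // self.requests_per_copy)

    def end_window(self):
        """Start a new counting window so that formerly hot data can cool down."""
        self.requests.clear()


monitor = HotspotMonitor(hot_threshold=3, requests_per_copy=3, max_copies=4)
for _ in range(7):
    monitor.record_request("fileA")
monitor.record_request("fileB")
print(monitor.temp_copy_count("fileA"), monitor.temp_copy_count("fileB"))
```

Clearing the counters at the end of each window is one simple way to let the copy count fall back to zero, at which point the temporary copies would be deleted.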

Further, the splitting module splits the data using the data format as the division strategy, forming three types of data blocks: file header data blocks, content data blocks and file number data blocks. All data blocks other than the content data blocks, which contain the text data, are regarded as redundant data blocks, and the number of redundant data blocks is far greater than the number of content data blocks. The temporary hot-spot copies of hot-spot data are split in the same way, and the content data blocks in the temporary hot-spot copies are stored as temporary content copy data blocks.
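A format-driven split of this kind might look like the sketch below. The byte layout (header, then content, then a trailing file number field), the `Block` type and the `split_file` signature are hypothetical; the text only states that the data format determines the three block types.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str       # "header", "content" or "file_number"
    payload: bytes

    @property
    def redundant(self):
        # Every block other than a content block is treated as redundant.
        return self.kind != "content"


def split_file(data, header_len, number_len, block_size):
    """Split `data` under an assumed layout: header | content | file number."""
    if header_len + number_len > len(data):
        raise ValueError("layout does not fit the data")
    content_end = len(data) - number_len

    def chunks(kind, start, stop):
        # Cut the byte range [start, stop) into blocks of at most block_size.
        return [Block(kind, data[i:min(i + block_size, stop)])
                for i in range(start, stop, block_size)]

    return (chunks("header", 0, header_len)
            + chunks("content", header_len, content_end)
            + chunks("file_number", content_end, len(data)))


original = b"HDR1" + b"payload-bytes" + b"0042"
blocks = split_file(original, header_len=4, number_len=4, block_size=4)
print([(b.kind, b.payload) for b in blocks])
```

A temporary hot-spot copy would be passed through the same function, keeping only its content blocks as temporary content copy data blocks.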

Further, the cross-block coding adopted in the coding module divides the redundant data blocks, content data blocks and temporary content copy data blocks produced by splitting into a number of subgroups, the number of subgroups not exceeding the number of content data blocks, and each subgroup generates a check block related only to the data blocks within that subgroup. In the cross-block coding every subgroup contains the same number of data blocks, and the subgroups share some of their data blocks; the shared data blocks are selected from all the content data blocks and temporary content copy data blocks in the coding group. Each subgroup consists of several redundant data blocks, at least one content data block and one temporary content copy data block. The number of shared content data blocks in each subgroup is determined by the bandwidth load capacity of the output node and the new node: the higher that capacity, the more shared content data blocks each subgroup contains, without exceeding the bandwidth load capacity of the nodes.

Further, in the checking and repairing module, the output data is checked by means of the check blocks; lost content copy data blocks are excluded, and the module checks whether any content data block is lost. The check blocks are those coding blocks generated by the coding module other than the blocks obtained by splitting the data. One coding group of a (k, m, k') code comprises k data blocks (the redundant data blocks, content data blocks and temporary content copy data blocks, collectively referred to as data blocks) and m check blocks, where k' is the number of remaining data blocks.

A method of fault tolerance of data output in a high performance computing environment, the method comprising the steps of:

S1, a client requests to write file data to the data server, and the file data is stored in the storage monitoring module; meanwhile, whether the file data is hot-spot data is judged from how often it has been accessed over a period of time, and hot-spot data is backed up to generate a corresponding number of temporary hot-spot copies;

S2, the client requests the data server to output file data, and the data server returns a list of data block locations, which is used to look up the storage nodes of the file to be output and retrieve the file data;

S3, the splitting module splits the data and the temporary hot-spot copies using the data format as the division strategy, finally obtaining redundant data blocks, content data blocks and temporary content copy data blocks;

S4, the coding module divides the data blocks produced by the splitting module into a number of overlapping subgroups, each subgroup generates a check block related only to the data blocks within that subgroup, and all subgroups together form a complete coding group;

S5, the output module outputs the coding group to the client, the checking and repairing module checks the coding group, and lost data is repaired by a tree repair method.

Further, in step S4 the coding module encodes the data blocks by a centralized conversion method, which comprises the following specific steps:

S4.1, the coding module downloads all the data blocks of each subgroup from the nodes on which they are stored;

S4.2, the check block is computed from the data blocks of the subgroup;

S4.3, the check block is sent to the node that stores the corresponding subgroup.
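Steps S4.1 to S4.3 can be sketched with an in-memory stand-in for the cluster. The `nodes` dictionary, the block identifiers and the use of plain XOR parity are assumptions made for illustration; the text only says that the check block is computed from the subgroup's data blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks (a stand-in for the real coding step)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_subgroup(nodes, subgroup, parity_node, parity_id):
    """Centralized encoding of one subgroup.

    nodes:    dict node_name -> dict block_id -> bytes (hypothetical cluster model)
    subgroup: list of (node_name, block_id) pairs naming the subgroup's data blocks
    """
    # S4.1: download every data block of the subgroup from the node storing it.
    downloaded = [nodes[node][block_id] for node, block_id in subgroup]
    # S4.2: compute the check block from the downloaded data blocks.
    check_block = xor_blocks(downloaded)
    # S4.3: send the check block to the node assigned to this subgroup.
    nodes[parity_node][parity_id] = check_block
    return check_block


nodes = {"n1": {"d1": b"\x01\x02"}, "n2": {"d2": b"\x10\x20"}, "n3": {}}
encode_subgroup(nodes, [("n1", "d1"), ("n2", "d2")], "n3", "p1")
print(nodes["n3"]["p1"])
```

The cost of this centralized scheme is visible in step S4.1: every data block of the subgroup crosses the network to a single encoder before any check block is produced.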

Further, a coding group of a (k, m, k') code in the cross coding is constructed by the following specific steps:

a. the k data blocks are divided into y mutually overlapping subgroups, each subgroup containing n data blocks;

b. the check block of each subgroup is generated by operating on the n data blocks within the subgroup (the coding module generates the check block by a matrix operation).
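Steps a and b can be illustrated as follows. This is a sketch under stated assumptions: each subgroup receives its own slice of the redundant blocks plus all content blocks and their temporary copies as the shared part, and the check block is a weighted sum over GF(2^8) with coefficients 1, 2, 3, and so on. The function names and the coefficient choice are not from the source. Weighted coefficients are used instead of plain XOR because an identical block and its copy would cancel each other under XOR.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reduction polynomial x^8+x^4+x^3+x+1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_subgroups(redundant, content, copies, y):
    """Step a: form y subgroups of equal size that overlap in content blocks and copies."""
    if not 1 <= y <= len(content):
        raise ValueError("subgroup count must be between 1 and the number of content blocks")
    if len(redundant) % y != 0:
        raise ValueError("redundant blocks must divide evenly among the subgroups")
    per_group = len(redundant) // y
    shared = content + copies  # blocks that every subgroup has in common
    return [redundant[g * per_group:(g + 1) * per_group] + shared for g in range(y)]


def check_block(subgroup):
    """Step b: check block as a GF(2^8) weighted sum of the subgroup's blocks."""
    out = bytearray(len(subgroup[0]))
    for coef, block in enumerate(subgroup, start=1):  # assumed coefficients 1, 2, 3, ...
        for i, byte in enumerate(block):
            out[i] ^= gf_mul(coef, byte)
    return bytes(out)


redundant = [b"\x01", b"\x02", b"\x04", b"\x08"]
content = [b"\x10", b"\x20"]
copies = list(content)  # temporary content copy data blocks
groups = build_subgroups(redundant, content, copies, y=2)
checks = [check_block(g) for g in groups]
print(len(groups), [len(g) for g in groups])
```

Here n is 6 for both subgroups: two redundant blocks of their own plus the four shared blocks.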

Further, the tree repair method adopted in step S5 comprises the following specific steps:

S5.1, the checking and repairing module constructs a spanning tree, called the repair tree, that is rooted at the replacement node and covers all provider nodes;

S5.2, each leaf node of the repair tree multiplies its coding block by the corresponding repair coefficient and sends the resulting intermediate coding block to its parent node;

S5.3, each internal node of the repair tree receives the intermediate coding blocks from all of its child nodes, combines them with the coding block it stores itself, and sends the combined intermediate coding block to its own parent node;

S5.4, the root node combines the intermediate coding blocks received from all of its child nodes to obtain the lost coding block;

wherein the lost coding block $C_n$ is repaired by computing a linear combination of the remaining coding blocks $C_1, C_2, C_3, \ldots, C_{n-1}$.

...

The combination relation is: $C_n = \gamma_1 C_1 + \gamma_2 C_2 + \gamma_3 C_3 + \ldots + \gamma_{n-1} C_{n-1}$, where $\gamma_1, \gamma_2, \gamma_3, \ldots, \gamma_{n-1}$ are the repair coefficients.
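The aggregation along the repair tree can be sketched as below. The arithmetic is carried out in GF(2^8), where addition is XOR; the tree shape, node names, coefficient values and the `repair` function are illustrative assumptions. The point of the sketch is that each node forwards a single intermediate block, yet the root still ends up with the full linear combination.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reduction polynomial x^8+x^4+x^3+x+1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def scale(block, coef):
    """Multiply every byte of a coding block by a repair coefficient."""
    return bytes(gf_mul(coef, byte) for byte in block)


def combine(blocks):
    """Add blocks in GF(2^8), i.e. XOR them bytewise."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def repair(node, children, stored, coefs):
    """Return the intermediate coding block that `node` passes up the repair tree.

    children: dict node -> list of child nodes (the repair tree)
    stored:   dict provider node -> its coding block C_i
    coefs:    dict provider node -> repair coefficient gamma_i
    The root (replacement node) stores no block, so it only combines.
    """
    parts = [repair(child, children, stored, coefs) for child in children.get(node, [])]
    if node in stored:
        parts.append(scale(stored[node], coefs[node]))  # gamma_i * C_i
    return combine(parts)


stored = {"a": b"\x01\x02", "b": b"\x03\x04", "c": b"\x05\x06"}
coefs = {"a": 1, "b": 2, "c": 3}
children = {"new": ["a"], "a": ["b", "c"]}  # "new" is the replacement node
lost_block = repair("new", children, stored, coefs)
flat = combine([scale(stored[n], coefs[n]) for n in stored])  # direct, non-tree computation
print(lost_block == flat)
```

In this layout the replacement node receives one block from node "a" instead of three blocks from three providers, which is where the bandwidth saving at the root comes from.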

Compared with the prior art, the invention has the following effects:

1) The invention applies fault tolerance to the output data through a new mechanism that combines the copy-based fault-tolerance method with the erasure-coding fault-tolerance method, thereby overcoming the respective shortcomings of each. The data to be output is monitored, data with a high request frequency is judged to be hot-spot data, and the hot-spot data is backed up, which reduces the risk of losing data that is output frequently. The splitting module divides the data into three types according to the data format; the file header data blocks and file number data blocks serve as redundant data blocks alongside the content data blocks. Redundant data blocks are not lost under normal conditions, while the content data blocks, which form the main body of the data, are additionally protected by backup. The redundant data blocks, content data blocks and content copy data blocks together serve as the data blocks to be encoded by the coding module. The coding module encodes them by cross-block coding, so that when some data blocks are lost, repair requires only the data blocks within the affected overlapping subgroup rather than a large number of data blocks across the whole group, which reduces the repair cost. Because the content data blocks and content copy data blocks are the blocks shared across subgroups, the risk of losing a content data block is considerably reduced;

2) In the invention, the output data is checked by the checking and repairing module. During checking, lost content copy data blocks are disregarded and only the content data blocks and redundant data blocks are checked; any data block found to be lost is repaired immediately, which reduces the data repair workload. Because the content data blocks are kept in multiple copies, the probability of losing a content data block during output is reduced; since the content data blocks form the main part of the data, this strengthens the protection of the content data, improves the safety of data output and reduces the cost of data fault tolerance.

Drawings

FIG. 1 is a system block diagram of a data output fault tolerance device in a high performance computing environment;

FIG. 2 is a flow chart of a method of fault tolerance of data output in a high performance computing environment.

Detailed Description

The invention will be further described with reference to specific examples.

Example 1: referring to FIG. 1, the invention provides a data output fault-tolerant device in a high-performance computing environment, comprising a storage monitoring module, a splitting module, an output module, a coding module and a checking and repairing module, wherein the splitting module, the output module, the coding module and the checking and repairing module are in communication connection with the storage monitoring module;

The storage monitoring module is used to store the data written by the client, monitor the output request frequency, and count in real time how often each piece of data is requested. It judges from this request frequency whether the data is hot-spot data, stores hot-spot data as copies, and generates a number of temporary hot-spot copies corresponding to the heat of the hot-spot data. Non-hot-spot data is stored redundantly, and its temporary hot-spot copies are deleted.

The splitting module splits the data using the data format as the division strategy, forming three types of data blocks: file header data blocks, content data blocks and file number data blocks. All data blocks other than the content data blocks, which contain the text data, are regarded as redundant data blocks, and the number of redundant data blocks is far greater than the number of content data blocks. The temporary hot-spot copies of hot-spot data are split in the same way, and the content data blocks in the temporary hot-spot copies are stored as temporary content copy data blocks.

The cross-block coding adopted in the coding module divides the redundant data blocks, content data blocks and temporary content copy data blocks produced by splitting into a number of subgroups, the number of subgroups not exceeding the number of content data blocks, and each subgroup generates a check block related only to the data blocks within that subgroup. Every subgroup contains the same number of data blocks, and the subgroups share some of their data blocks; the shared data blocks are selected from all the content data blocks and temporary content copy data blocks in the coding group. Each subgroup consists of several redundant data blocks, at least one content data block and one temporary content copy data block, and the number of shared content data blocks in each subgroup is determined by the bandwidth load capacity of the output node and the new node.

In the checking and repairing module, the output data is checked by means of the check blocks. The check blocks are those coding blocks generated by the coding module other than the blocks obtained by splitting the data. One coding group of a (k, m, k') code comprises k data blocks (the redundant data blocks, content data blocks and temporary content copy data blocks, collectively referred to as data blocks) and m check blocks, where k' is the number of remaining data blocks.

Example 2: referring to fig. 2, a fault tolerance method for data output in a high performance computing environment includes the steps of:

the coding module adopts a centralized conversion method to code the data block, and comprises the following specific steps:

S4.2, the check block is computed from the data blocks of the subgroup;

S4.3, the check block is sent to the node that stores the corresponding subgroup;

S5, the output module outputs the coding group to the client, the checking and repairing module checks the coding group, and lost data is repaired by a tree repair method;

wherein a coding group of a (k, m, k') code in the cross coding is constructed by the following specific steps:

b. the check block of each subgroup is generated by operating on the n data blocks within the subgroup;

the tree repair method comprises the following specific steps:

...

In the present invention, for a (k, m, k') erasure code, the k data blocks form a vector $D = (D_1, D_2, \ldots, D_k)^T$, from which k+m coding blocks are generated. The generation of coding block $C_i$ can be expressed as the product of its coding coefficient vector $\alpha_i = (\alpha_{i,j}) = (\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,k})$ and the vector $D$, with $1 \le i \le k+m$ and $1 \le j \le k$, as shown in formula (1):

$$C_i = \alpha_i D = \sum_{j=1}^{k} \alpha_{i,j} D_j \qquad (1)$$

wherein $\alpha_{i,j}$ is an element of the finite field $F_q$ of size $2^q$, and all operations are performed in $F_q$. The matrix $G = (\alpha_1, \alpha_2, \ldots, \alpha_{k+m})^T$ is the coding matrix (also called the generator matrix). For a (k, m, k') erasure code, the coding matrix guarantees that any k' × k submatrix has a left inverse. The upper part of the coding matrix is generally a k × k identity matrix and the lower part an m × k matrix, so that the k+m generated coding blocks contain the original k data blocks, which makes the data convenient to read. Such an erasure code is called a systematic code, and the erasure codes used in storage systems are essentially all systematic codes.

A linear erasure code is decoded by matrix operations. Without loss of generality, assume that the coding blocks $C_1, C_2, \ldots, C_{k'}$ are used to decode the original k data blocks. Let $G'$ denote $(\alpha_1, \alpha_2, \ldots, \alpha_{k'})^T$ and let $C'$ denote $(C_1, C_2, \ldots, C_{k'})^T$; then $G' \times D = C'$ follows from formula (1). Let $G'^{-1}_L$ be the left inverse of $G'$; left-multiplying both sides of $G' \times D = C'$ by $G'^{-1}_L$ yields the original k data blocks:

$$D = G'^{-1}_L \times C'$$
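Encoding with a systematic generator matrix and decoding from surviving blocks can be sketched as follows. The sketch makes several assumptions that are not in the source: the field is GF(2^8), the example matrix has k = 2 and m = 2, and decoding uses exactly k surviving rows so that the left inverse reduces to an ordinary inverse, computed here by Gauss-Jordan elimination.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reduction polynomial x^8+x^4+x^3+x+1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse of a nonzero byte, computed as a^254."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def encode(G, data):
    """Formula (1): C_i = sum_j alpha_{i,j} * D_j, applied bytewise."""
    size = len(data[0])
    coded = []
    for row in G:
        block = bytearray(size)
        for coef, d in zip(row, data):
            for t in range(size):
                block[t] ^= gf_mul(coef, d[t])
        coded.append(bytes(block))
    return coded


def decode(G_sub, coded_sub):
    """Solve G' x D = C' for D, with G' square and invertible (an assumption)."""
    k = len(G_sub)
    A = [list(row) for row in G_sub]
    B = [bytearray(c) for c in coded_sub]
    for col in range(k):
        pivot = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        B[col], B[pivot] = B[pivot], B[col]
        inv = gf_inv(A[col][col])
        A[col] = [gf_mul(inv, x) for x in A[col]]
        B[col] = bytearray(gf_mul(inv, x) for x in B[col])
        for r in range(k):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x ^ gf_mul(f, y) for x, y in zip(A[r], A[col])]
                B[r] = bytearray(x ^ gf_mul(f, y) for x, y in zip(B[r], B[col]))
    return [bytes(b) for b in B]


G = [[1, 0], [0, 1], [1, 1], [1, 2]]  # identity on top (systematic), 2 check rows below
data = [b"hi", b"ok"]
coded = encode(G, data)
recovered = decode([G[2], G[3]], [coded[2], coded[3]])  # both data blocks lost
print(recovered == data)
```

Because the top of `G` is the identity, `coded[0]` and `coded[1]` are the data blocks themselves, which is the read convenience of a systematic code noted above.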

In the invention, the reliability of the cloud storage site is $R_{storage}$, the reliability of a storage node is $R_{chunk}$, and the data redundancy factor is p. The reliability model of a cloud storage site consisting of a large number of storage nodes is

$$R_{storage} = 1 - (1 - R_{chunk})^{p} \qquad (2)$$
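As a worked instance of formula (2), take $R_{chunk} = 0.9$, the per-node value used later in this description. The values of p other than 2 are chosen here purely for illustration.

```latex
\begin{align*}
R_{storage} &= 1 - (1 - R_{chunk})^{p} \\
p = 1:\quad R_{storage} &= 1 - 0.1 = 0.9 \\
p = 2:\quad R_{storage} &= 1 - 0.1^{2} = 0.99 \\
p = 3:\quad R_{storage} &= 1 - 0.1^{3} = 0.999
\end{align*}
```

Each additional copy removes one order of magnitude from the failure probability, at the cost of one more full copy of the data.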

The reliability of a cloud storage site adopting a fault-tolerance technique based on the RS encoding and decoding algorithm is $R_{RS}$, the reliability of a storage node is $R_{chunk}$, the number of file blocks is k, the number of file coded data blocks is n, and the redundancy multiple is s (s = n/k). The reliability model of the cloud storage site adopting the fault-tolerance technique based on the RS codec algorithm is formula (3):

The reliability of the cloud storage site combining the two fault-tolerance mechanisms is $R_{cStor}$, and the reliability model of $R_{cStor}$ is obtained by substituting formula (2) into formula (3):

Likewise, take the reliability of a single storage node $R_{chunk}$ = 0.9, the number of block copies p = 2 and the number of file blocks k = 1 to 20, with the file coded data blocks encoded at coding redundancy s = 1 to 4. With the number of stored block copies fixed at 2, the relationship between the reliability of the cloud storage site combining the two fault-tolerance mechanisms and the coding redundancy and number of file blocks can be observed. When the coding redundancy equals 1 (i.e., no coding), $R_{cStor}$ decreases as the number of file blocks increases; when the coding redundancy is greater than or equal to 2, $R_{cStor}$ increases with the number of file blocks, and once the number of file blocks exceeds 5, $R_{cStor}$ is greater than 0.999. In practical applications, whenever the coding redundancy is greater than 1, $R_{cStor}$ increases with the number of file blocks, and this effect becomes more pronounced as the coding redundancy increases. Therefore, weighing the reliability of the cloud storage site against the utilization of storage space resources, the number of stored block copies is set to 2, the coding redundancy to 1.25 and the number of file blocks to 8, i.e., 10 file coded data blocks, with which the reliability of the cloud storage site can reach 0.9999.
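The figures quoted above can be explored numerically with the sketch below. Formula (2) is taken from the text, but the coded model is an assumption: it uses the common binomial form in which a file survives when at least k of its n coded blocks survive and block failures are independent. The function names are hypothetical, and the patent's own formula (3) may differ.

```python
from math import comb


def replica_reliability(r_chunk, p):
    """Formula (2): at least one of the p copies of a block survives."""
    return 1 - (1 - r_chunk) ** p


def coded_reliability(r_block, k, n):
    """Assumed model: at least k of the n coded blocks survive, failures independent."""
    return sum(comb(n, i) * r_block ** i * (1 - r_block) ** (n - i)
               for i in range(k, n + 1))


def combined_reliability(r_chunk, p, k, n):
    """Use formula (2) as the per-block reliability inside the coded model."""
    return coded_reliability(replica_reliability(r_chunk, p), k, n)


# Parameters chosen in the text: R_chunk = 0.9, p = 2, k = 8, n = 10 (s = 1.25).
r = combined_reliability(0.9, p=2, k=8, n=10)
print(f"{r:.6f}")

# With no coding (n = k) the reliability falls as the number of file blocks grows.
uncoded = [combined_reliability(0.9, 2, k, k) for k in (1, 5, 10, 20)]
print([f"{u:.4f}" for u in uncoded])
```

Under these assumptions the chosen parameters give a value that rounds to 0.9999, in line with the figure stated above, and the uncoded case shows the decreasing trend described for coding redundancy 1.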

The foregoing is merely a specific embodiment of the present invention, and the scope of protection of the invention is not limited thereto. Any variation or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention, which shall therefore be defined by the appended claims.

Claims

1. A data output fault tolerance apparatus in a high performance computing environment, characterized by: the system comprises a storage monitoring module, a splitting module, an output module, a coding module and a checking and repairing module, wherein the splitting module, the output module and the coding module are in communication connection with the storage monitoring module;

2. The fault-tolerant device for data output in a high-performance computing environment according to claim 1, wherein the storage monitoring module is configured to store data written in by a client, monitor a request output frequency, count a requested frequency of the data in real time, determine whether the data is hot spot data according to the requested frequency, store copies of the hot spot data, and generate a corresponding number of temporary hot spot copies according to a heat degree of the hot spot data; and redundant storage is carried out on the non-hot spot data, and temporary hot spot copies are deleted.

3. The data output fault-tolerant device in a high-performance computing environment according to claim 2, wherein the splitting module splits the data using the data format as the division strategy, forming three types of data blocks, namely file header data blocks, content data blocks and file number data blocks; all data blocks other than the content data blocks, which contain the text data, are regarded as redundant data blocks, the number of redundant data blocks being far greater than the number of content data blocks; and the temporary hot-spot copies of hot-spot data are split in the same way, the content data blocks in the temporary hot-spot copies being stored as temporary content copy data blocks.

4. The data output fault-tolerant device in a high-performance computing environment according to claim 3, wherein the cross-block coding adopted in the coding module divides the redundant data blocks, content data blocks and temporary content copy data blocks produced by splitting into a number of subgroups, the number of subgroups not exceeding the number of content data blocks, each subgroup generating a check block related only to the data blocks within that subgroup; wherein every subgroup contains the same number of data blocks and the subgroups share some of their data blocks, the shared data blocks being selected from all the content data blocks and temporary content copy data blocks in the coding group; each subgroup consists of several redundant data blocks, at least one content data block and one temporary content copy data block; and the number of shared content data blocks in each subgroup is determined by the bandwidth load capacity of the output node and the new node.

5. The data output fault-tolerant device in a high-performance computing environment according to claim 4, wherein the checking and repairing module checks the output data by means of the check blocks, excludes lost content copy data blocks, and checks whether any content data block is lost; the check blocks are those coding blocks generated by the coding module other than the blocks obtained by splitting the data; and one coding group of a (k, m, k') code comprises k data blocks (the redundant data blocks, content data blocks and temporary content copy data blocks, collectively referred to as data blocks) and m check blocks, where k' is the number of remaining data blocks.

6. A data output fault tolerance method in a high performance computing environment, applied to a data output fault tolerance device in a high performance computing environment according to any one of claims 1 to 5, comprising the steps of:

S5, the output module outputs the coding group to the client; the checking and repairing module checks the coding group, excludes lost content copy data blocks, checks whether any content data block is lost, and repairs lost data by a tree repair method.

7. The fault-tolerant method for data output in a high-performance computing environment according to claim 6, wherein the encoding module encodes the data blocks by a centralized conversion method in step S4, the encoding method comprising the following specific steps:

S4.2, the check block is computed from the data blocks of the subgroup;

8. The fault-tolerant method for data output in a high-performance computing environment according to claim 7, wherein a coding group of a (k, m, k') code in the cross coding is constructed by the following specific steps:

b. the check blocks of each subgroup are generated by n data block operations within the subgroup.

9. The fault-tolerant method for data output in a high-performance computing environment according to claim 8, wherein the tree repair method adopted in step S5 comprises the following specific steps:

wherein the lost coding block $C_n$ is repaired by computing a linear combination of the remaining coding blocks $C_1, C_2, C_3, \ldots, C_{n-1}$.