CN116610645B

CN116610645B - Data distributed storage method and system based on heterogeneous regenerated code conversion

Info

Publication number: CN116610645B
Application number: CN202310869214.1A
Authority: CN
Inventors: 张帆
Original assignee: Shandong Management University
Current assignee: Shandong Management University
Priority date: 2023-07-17
Filing date: 2023-07-17
Publication date: 2023-12-05
Anticipated expiration: 2043-07-17
Also published as: CN116610645A

Abstract

The present disclosure provides a data distributed storage method and system based on heterogeneous regenerating code transformation, relating to the technical field of distributed data storage. The method includes: obtaining a data file of a set size; encoding the data file with a regenerating code obtained by transforming PM-RC, and storing it across nodes in a heterogeneous DSS comprising a plurality of storage nodes. In the heterogeneous DSS, the PM-based regenerating code structure is transformed into a group of new regenerating code structures, HCT-RC, by a heterogeneous coding transformation strategy, and the codeword matrix of HCT-RC is an irregular matrix. The heterogeneous coding transformation strategy rearranges the data symbols of the regular codeword matrix of the PM-based regenerating code into an irregular matrix in a set order, thereby converting codewords of the PM-based regenerating code into codewords of HCT-RC. The present disclosure reduces the additional workload that repairing a failed node imposes on the system.

Description

Data distributed storage method and system based on heterogeneous regenerated code conversion

Technical Field

The disclosure relates to the technical field of data distributed storage, in particular to a data distributed storage method and system based on heterogeneous regenerative code conversion.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

A DSS (distributed storage system) is one of the mainstream solutions for mass data storage, for example GFS (Google File System). In a DSS, data files are distributed over a plurality of storage nodes, which are connected by a network and provide storage services as a whole. As a redundancy technique for ensuring the availability and reliability of a DSS, regenerating codes are efficient in both node storage and repair bandwidth, and their study and application in DSS have attracted a great deal of research attention. As two special classes of explicit regenerating code constructions, PM-MBR (PM minimum bandwidth regenerating code) and PM-MSR (PM minimum storage regenerating code), both built on the PM (product matrix) framework, provide the minimum repair bandwidth and the minimum node storage, respectively, for a homogeneous DSS. A homogeneous (isomorphic) DSS is one in which all nodes have the same characteristics, e.g. the same node storage and repair bandwidth. In contrast, a heterogeneous DSS contains storage nodes with different characteristics, i.e., nodes with different storage capacities and repair bandwidths. This flexibility gives heterogeneous systems a wider range of applications: P2P (peer-to-peer) cloud storage and Internet caching systems for video on demand both adopt heterogeneous DSSs.

In recent years, the design of regenerating codes suitable for heterogeneous DSS has become a research hotspot and has produced a series of results. For example, for heterogeneous DSS with different node storage capacities, some scholars provide an explicit regenerating code construction using combinatorial designs, and some researchers design a flexible regenerating code structure with adjustable node storage capacity based on mathematical tools such as Zigzag codes and permutation matrices. However, existing regenerating code schemes built on the PM framework (such as PM-MBR and PM-MSR) are applicable only to homogeneous DSS and cannot be applied to the more flexible heterogeneous scenarios.

Disclosure of Invention

The present disclosure provides a data distributed storage method and system based on heterogeneous regenerating code transformation. It proposes a simple and effective coding transformation principle based on an irregular matrix: by transforming the PM-RC codes (PM-based regenerating code structures, including PM-MBR and PM-MSR) of a homogeneous DSS, a new regenerating code structure applicable to heterogeneous DSS is obtained, named HCT-RC (regenerating code structure based on heterogeneous coding transformation). HCT-RC has the data reconstruction and data repair properties, thereby ensuring the availability and reliability of the system.

According to some embodiments, the present disclosure employs the following technical solutions:

a data distributed storage method based on heterogeneous regenerative code transformation comprises the following steps:

acquiring a data file with a set size;

encoding the data file with a regeneration code modified based on PM-RC transformation and storing across nodes in a heterogeneous DSS comprising a plurality of storage nodes; the data files stored across the nodes can be subjected to data reconstruction and data repair;

in the heterogeneous DSS, the PM-based regenerating code structure PM-RC is transformed into a group of new regenerating code structures HCT-RC by a heterogeneous coding transformation (HCT) strategy, and the codeword matrix of the regenerating code structure HCT-RC is an irregular matrix;

the heterogeneous transcoding strategy is: the plurality of data symbols of the regular codeword matrix of the PM-RC are rearranged into the irregular matrix in a set order, thereby converting the codeword of the PM-RC into a codeword of the HCT-RC.

a heterogeneous regenerative code conversion based data distributed storage system comprising:

the data acquisition module is used for acquiring the data file with the set size;

a data storage module for encoding the data file with a regeneration code modified based on PM-RC conversion and storing the encoded data file across nodes in a heterogeneous DSS comprising a plurality of storage nodes; the data files stored across the nodes can be subjected to data reconstruction and data repair;

in the heterogeneous DSS, a heterogeneous coding transformation (HCT) strategy is used to transform the PM-based regenerating code structure PM-RC into a group of new regenerating code structures HCT-RC, and the codeword matrix of the regenerating code structure HCT-RC is an irregular matrix.

Compared with the prior art, the beneficial effects of the present disclosure are:

The data distributed storage method based on heterogeneous regenerating code transformation proposes a simple and effective coding transformation principle based on an irregular matrix. By transforming the PM-RC codes of a homogeneous DSS, a new regenerating code structure suitable for heterogeneous DSS is obtained, named HCT-RC. The regenerating code structure provided by the disclosure has the data reconstruction and data repair properties, so that the availability and reliability of the system are ensured. In addition, for the super node scenario in a heterogeneous DSS, compared with the existing PM-RC scheme suitable for a homogeneous DSS, the HCT-RC suitable for the super node scenario has a smaller repair locality (i.e., the number of helper nodes required to repair one failed node) at the same total storage consumption and repair bandwidth consumption, so that the extra workload that repairing a failed node imposes on the system is reduced.

The technical solution provided by the disclosure offers a new idea and a new approach for the design of regenerating codes in heterogeneous DSS, and helps advance the application of regenerating code technology in DSS.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the exemplary embodiments of the disclosure and together with the description serve to explain the disclosure, and do not constitute an undue limitation on the disclosure.

Fig. 1 is a schematic diagram of a transformation process of the regenerated code structure transformation strategy of the present disclosure.

Detailed Description

The disclosure is further described below with reference to the drawings and examples.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

Example 1

In one embodiment of the present disclosure, a data distributed storage method based on heterogeneous regenerative code conversion is provided, including:

step one: acquiring a data file with a set size;

step two: encoding the data file with a regeneration code modified based on PM-RC transformation and storing across nodes in a heterogeneous DSS comprising a plurality of storage nodes; the data files stored across the nodes can be subjected to data reconstruction and data repair;

Furthermore, the HCT strategy is applied to the super node scene (a special heterogeneous DSS), so that the HCT-RC applicable to the super node scene can be obtained, and the performance of the HCT-RC is superior to that of the existing PM-RC.

As an embodiment, the implementation process of the data distributed storage method based on heterogeneous regenerative code transformation is as follows:

Step 1: establish the distributed storage system model. An existing DSS encodes a file of a set size with a regenerating code and stores it across nodes in a DSS containing a plurality of storage nodes, generally as follows:

Specifically, a data file of size $F$ symbols is obtained, and the regenerating code applied in the DSS encodes the file and stores it across the nodes of the DSS network. A feasible $(n, k, d, \alpha, \beta)$ regenerating code can store a data file of size $F$ in a DSS containing $n$ storage nodes, each node storing $\alpha$ symbols from a finite field $\mathbb{F}_q$ of size $q$. The DSS requires that a legitimate user, the data collector (DC), be able to connect to $k$ of the $n$ nodes and download data to recover the original file, a process called data reconstruction.

Meanwhile, when one node fails, a new replacement node is allowed to connect to $d$ nodes (called helper nodes) among the remaining $(n-1)$ surviving nodes and to download $\beta$ data symbols from each helper node to repair the failed node; this process is called data repair. Data reconstruction and data repair are two important properties that a feasible regenerating code structure must possess. Here $\alpha$ and $\beta$ are the node storage and the repair bandwidth, respectively. Each node in a homogeneous DSS has the same node storage and repair bandwidth. In contrast, a general heterogeneous DSS contains $h$ nodes, denoted $V_1, \dots, V_h$, whose storage capacities and repair bandwidths are unequal and are denoted $\alpha_1, \dots, \alpha_h$ (assuming, without loss of generality, $\alpha_{\max} = \alpha_1 \ge \alpha_2 \ge \dots \ge \alpha_{h-1} \ge \alpha_h$) and $\beta_1, \dots, \beta_h$, respectively.

To ensure data reconstruction and data repair, an $(h, k, d, \alpha_i, \beta_i)$ regenerating code ($1 \le i \le h$, $k > 1$, $d < h$) suitable for a heterogeneous DSS must satisfy the following conditions: (1) a legitimate user (DC) can connect to $k$ of the $h$ nodes and download data to recover the original file; (2) when node $V_i$ fails, a new replacement node can complete data repair by downloading $\beta_j$ ($\le \alpha_j$) data symbols from each helper node $V_j$ ($1 \le j \ne i \le h$). For the failed node $V_i$, the repair locality is $d$, meaning that $d$ helper nodes are required to repair the data of $V_i$.
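The parameter constraints stated above can be collected into a small check. The following is a minimal sketch, not part of the disclosure: the class name `HeterogeneousDSS` and the method `problems` are hypothetical, and only the constraints named in the text ($k > 1$, $d < h$, $\beta_j \le \alpha_j$, non-increasing $\alpha_i$) are tested.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HeterogeneousDSS:
    """Hypothetical container for the (h, k, d, alpha_i, beta_i) parameters."""
    k: int             # nodes a data collector connects to for reconstruction
    d: int             # helper nodes contacted to repair one failed node
    alphas: List[int]  # alphas[i-1] is the storage capacity of node V_i
    betas: List[int]   # betas[i-1] is the repair bandwidth of node V_i

    @property
    def h(self) -> int:
        return len(self.alphas)

    def problems(self) -> List[str]:
        """Return the list of violated constraints (empty if none)."""
        found = []
        if len(self.betas) != self.h:
            found.append("alphas and betas must have the same length")
        if not self.k > 1:
            found.append("k must be greater than 1")
        if not self.d < self.h:
            found.append("d must be less than h")
        if any(b > a for a, b in zip(self.alphas, self.betas)):
            found.append("each beta_i must not exceed alpha_i")
        if any(x < y for x, y in zip(self.alphas, self.alphas[1:])):
            found.append("alphas must be sorted in non-increasing order")
        return found


# Toy configuration: two large nodes followed by three small ones.
dss = HeterogeneousDSS(k=3, d=3, alphas=[4, 4, 2, 2, 2], betas=[2, 2, 1, 1, 1])
print(dss.problems())
```

The check only covers parameter ranges; whether a code with these parameters actually exists is a separate question that the transformation conditions later in the description address.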

As a particular heterogeneous DSS, the present disclosure further considers a super node model comprising $h_s$ ($1 \le h_s < d$) super nodes, each with a larger storage capacity $\alpha_s$ and a higher repair bandwidth $\beta_s$. The remaining $(h - h_s)$ common nodes each have the same storage capacity and repair bandwidth, denoted $\alpha_0$ and $\beta_0$ respectively, which satisfy the following relationship:

$$\alpha_s = 2\alpha_0, \quad \beta_s = 2\beta_0. \tag{1}$$

Obviously, the super node scenario is heterogeneous. The simplest case is $h_s = 1$, i.e. there is only one super node. One practical application of this case is a P2P backup system, where the super node may be a server with higher service capability than the other peer nodes.

Step 2: the present disclosure designs a Heterogeneous Code Transformation (HCT) strategy capable of transforming PM-RC in an isomorphic DSS into a new regenerated code structure HCT-RC, making it applicable to heterogeneous DSSs;

Specifically, given a PM-RC, its codeword can be represented by a matrix $C$ of size $n \times \alpha$, where $n$ and $\alpha$ represent the number of nodes and the node storage capacity, respectively, in the original homogeneous scenario. The $i$-th row $c_i$ of $C$ represents the $\alpha$ data symbols (i.e., coded symbols) stored at node $U_i$ ($1 \le i \le n$), and the element $c_{ij}$ represents the $j$-th coded symbol of node $U_i$. The present disclosure proposes a novel heterogeneous coding transformation (HCT) strategy that transforms PM-RC into a group of new regenerating code structures, named HCT-RC, whose codeword matrix $\tilde{C}$ is an irregular matrix.

The heterogeneous transcoding (HCT) strategy specifically includes:

S1: for the general heterogeneous DSS case

Consider first a general heterogeneous DSS comprising $h$ nodes $V_i$ ($1 \le i \le h$). For positive integers $m_i$ satisfying $\sum_{i=1}^{h} m_i = n$ and $m_h = 1$, if the system parameters satisfy

$$\alpha_i = m_i \alpha, \quad 1 \le i \le h, \tag{2}$$

then the $n\alpha$ data symbols in the codeword matrix $C$ are rearranged, in left-to-right, top-to-bottom order, into the irregular matrix $\tilde{C}$, thereby converting the codewords of the PM-RC into codewords of the HCT-RC, as shown in Fig. 1.
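The rearrangement can be sketched in a few lines. This is an illustrative sketch rather than the disclosure's implementation: the function name `hct_transform` is hypothetical, the symbols are placeholder integers instead of finite-field elements, and the check that the multipliers $m_i$ sum to $n$ is an assumption needed so that every one of the $n\alpha$ symbols lands in exactly one row.

```python
from typing import List, Sequence


def hct_transform(C: Sequence[Sequence[int]], m: Sequence[int]) -> List[List[int]]:
    """Rearrange a regular n x alpha codeword matrix into h irregular rows.

    Row i of the result holds m[i] * alpha symbols, read from C
    left to right, top to bottom.
    """
    n = len(C)
    if any(mi < 1 for mi in m):
        raise ValueError("every multiplier m_i must be a positive integer")
    if sum(m) != n:
        raise ValueError("the multipliers m_i must sum to n")
    alpha = len(C[0])
    # Flatten C in row-major order (left to right, top to bottom).
    flat = [symbol for row in C for symbol in row]
    rows, start = [], 0
    for mi in m:
        end = start + mi * alpha
        rows.append(flat[start:end])
        start = end
    return rows


# Toy codeword: n = 5 nodes, alpha = 2 symbols per node (placeholder values).
C = [[10 * i + j for j in range(2)] for i in range(1, 6)]
C_tilde = hct_transform(C, m=[2, 2, 1])
for row in C_tilde:
    print(row)
```

With `m = [2, 2, 1]`, the first two output rows each hold the symbols of two original nodes and the last row holds those of one, so no symbol is recomputed: the transformation only regroups what PM-RC already produced.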

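In this way, the irregular matrix $\tilde{C}$ comprises $h$ rows containing $\alpha_1, \dots, \alpha_h$ symbols, respectively. The $i$-th row $\tilde{c}_i$ of $\tilde{C}$ represents the $\alpha_i$ data symbols stored at node $V_i$ and, with $c_l$ denoting the $l$-th row of $C$, is given by:

$$\tilde{c}_i = \left[c_{s_i+1},\; c_{s_i+2},\; \dots,\; c_{s_i+m_i}\right], \quad s_i = \sum_{j=1}^{i-1} m_j, \quad 1 \le i \le h. \tag{3}$$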

To guarantee the data repair property, the repair bandwidth $\beta_i$ of node $V_i$ in HCT-RC must satisfy:

$$\beta_i = m_i \beta, \quad 1 \le i \le h, \tag{4}$$

where $\beta$ represents the repair bandwidth of each node in the PM-RC. Thus, when conditions (2) and (4) are satisfied, the PM-RC of a homogeneous DSS can be converted into an HCT-RC code suitable for general heterogeneous scenarios. The data reconstruction and data repair properties of HCT-RC are given by the following two theorems.
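Conditions (2) and (4) tie each node's capacity and repair bandwidth to the same integer multiplier. As a rough sketch under that reading, the helper below (the name `multipliers` is hypothetical) derives $m_i$ from per-node $\alpha_i$, $\beta_i$ and the PM-RC base values $\alpha$, $\beta$, and rejects configurations that cannot be obtained by the transformation.

```python
from typing import List, Sequence


def multipliers(alphas: Sequence[int], betas: Sequence[int],
                alpha: int, beta: int) -> List[int]:
    """Return m_i with alpha_i = m_i * alpha and beta_i = m_i * beta for all i."""
    if len(alphas) != len(betas):
        raise ValueError("alphas and betas must have the same length")
    m = []
    for a_i, b_i in zip(alphas, betas):
        # Condition (2): alpha_i must be a positive integer multiple of alpha.
        if a_i <= 0 or a_i % alpha != 0:
            raise ValueError(f"alpha_i={a_i} is not a positive multiple of {alpha}")
        m_i = a_i // alpha
        # Condition (4): beta_i must use the same multiplier m_i.
        if b_i != m_i * beta:
            raise ValueError(f"beta_i={b_i} does not equal {m_i} * {beta}")
        m.append(m_i)
    return m


print(multipliers([4, 4, 2, 2, 2], [2, 2, 1, 1, 1], alpha=2, beta=1))
```

For the toy values shown, the two large nodes get multiplier 2 and the three small nodes get multiplier 1, which matches the super node relationship in equation (1).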

Theorem 1 (data reconstruction property): let $\Omega_k$ denote any subset of the set $\{1, 2, \dots, h\}$ with cardinality $|\Omega_k| = k$. Using HCT-RC coding, as long as

$$\sum_{i \in \Omega_k} m_i \ge k_p, \tag{5}$$

a legitimate user (DC) can reconstruct the original data file by connecting to the $k$ nodes in the set $\{V_i, i \in \Omega_k\}$, where $k_p$ represents the number of nodes that must be connected to reconstruct the original file in PM-RC coding.

Proof: from equation (3), the codeword vector $\tilde{c}_i$, which represents the data stored at node $V_i$, consists of $m_i$ ($1 \le i \le h$) subvectors, and each subvector $c_{s_i+l}$ ($1 \le l \le m_i$, with $s_i = \sum_{j=1}^{i-1} m_j$) corresponds to the data stored at the original node $U_{s_i+l}$ in the PM-RC. This means that the data stored at node $V_i$ is composed of the data stored at the $m_i$ original nodes $U_{s_i+1}, \dots, U_{s_i+m_i}$. Thus, connecting to the $k$ nodes of the set $\{V_i, i \in \Omega_k\}$ in HCT-RC and downloading their stored data is equivalent to connecting to $\sum_{i \in \Omega_k} m_i$ original nodes in the PM-RC and downloading their stored data. By the data reconstruction property of PM-RC, the original file of size $F$ can be reconstructed by connecting to $k_p$ nodes. Hence, whenever the condition given by equation (5) holds in HCT-RC, the original data file of size $F$ can be reconstructed by connecting to the $k$ nodes in $\{V_i, i \in \Omega_k\}$.
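The counting argument in the proof can be checked mechanically. The sketch below reads the condition of Theorem 1 as "the multipliers of the contacted nodes sum to at least $k_p$", which is how the proof uses it; the function names are hypothetical and node indices are 1-based as in the text.

```python
from typing import Sequence


def can_reconstruct(m: Sequence[int], omega_k: Sequence[int], k_p: int) -> bool:
    """True if the nodes indexed by omega_k cover at least k_p original nodes."""
    return sum(m[i - 1] for i in omega_k) >= k_p


def min_k_any_subset(m: Sequence[int], k_p: int) -> int:
    """Smallest k such that every k-subset of nodes satisfies the condition.

    The worst case is the k nodes with the smallest multipliers, so it is
    enough to accumulate the multipliers in ascending order.
    """
    total = 0
    for k, m_i in enumerate(sorted(m), start=1):
        total += m_i
        if total >= k_p:
            return k
    raise ValueError("k_p exceeds the total number of original nodes")


m = [2, 2, 1, 1, 1]   # two double-capacity nodes, three single-capacity nodes
print(can_reconstruct(m, [1, 2], k_p=3))   # two large nodes cover 4 original nodes
print(can_reconstruct(m, [3, 4], k_p=3))   # two small nodes cover only 2
print(min_k_any_subset(m, k_p=3))
```

The example illustrates why $k$ can be smaller than $k_p$ when large nodes are contacted, while a guarantee over arbitrary subsets is governed by the smallest nodes.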

Theorem 2 (data repair property): let $\Omega_d$ denote any subset of the set $\{1, 2, \dots, h\}$ with cardinality $|\Omega_d| = d$. Using HCT-RC coding, when node $V_f$ ($1 \le f \le h$) fails, a replacement node can repair the lost data by performing the following operation $m_f$ times: connect to the $d$ helper nodes in the set $\omega_d = \{V_i, i \in \Omega_d, i \ne f\}$ and download $\beta_i$ repair data symbols from each helper node $V_i$. The precondition for completing data repair is:

$$\sum_{i \in \Omega_d} m_i \ge d_p, \tag{6}$$

where $d_p$ represents the repair locality of PM-RC coding (i.e., the number of helper nodes needed to repair a failed node).

Proof: from equation (3), the codeword vector $\tilde{c}_f$, which represents the data stored at the failed node $V_f$, consists of $m_f$ subvectors, and each subvector $c_{s_f+l}$ ($1 \le l \le m_f$, with $s_f = \sum_{j=1}^{f-1} m_j$) corresponds to the data stored at the original node $U_{s_f+l}$ in the PM-RC. Thus, repairing the failed node $V_f$ in HCT-RC is equivalent to repairing the $m_f$ original nodes $U_{s_f+1}, \dots, U_{s_f+m_f}$ in the PM-RC. Consider first the repair of the first subvector $c_{s_f+1}$ of $\tilde{c}_f$. According to equation (4), each helper node $V_i$ in the set $\omega_d = \{V_i, i \in \Omega_d, i \ne f\}$ can provide $\beta_i = m_i \beta$ repair data symbols, and the $\sum_{i \in \Omega_d} m_i \beta$ repair data symbols are linearly independent. Thus, if condition (6) is satisfied, the $d$ helper nodes in $\omega_d$ provide at least $d_p \beta$ repair data symbols. On the other hand, the data repair property of PM-RC coding indicates that $d_p \beta$ repair data symbols can repair any failed node in the PM-RC. Thus, the subvector $c_{s_f+1}$ can be repaired. Similarly, the same procedure can be performed independently to repair the remaining $(m_f - 1)$ subvectors of the failed node $V_f$.

Notably, the parameters of PM-RC (including $k_p$ and $d_p$) are known, and since $m_i \ge 1$ for any $i \in \{1, 2, \dots, h\}$, there must exist positive integers $k$ ($\le k_p$) and $d$ ($\le d_p$) in HCT-RC coding such that equations (5) and (6) hold.
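To see how a small $d$ can arise, one can pick helpers greedily by multiplier. The following sketch assumes the repair condition is "the helpers' multipliers sum to at least $d_p$", as used in the proof of Theorem 2; `fewest_helpers` is a hypothetical name, and choosing the largest multipliers first is an illustrative selection rule, not something the disclosure prescribes.

```python
from typing import List, Sequence


def fewest_helpers(m: Sequence[int], f: int, d_p: int) -> List[int]:
    """Return a smallest set of helper indices (1-based, excluding f)
    whose multipliers sum to at least d_p."""
    # Surviving nodes, largest multiplier first (ties keep index order).
    candidates = sorted((i for i in range(1, len(m) + 1) if i != f),
                        key=lambda i: -m[i - 1])
    chosen, total = [], 0
    for i in candidates:
        if total >= d_p:
            break
        chosen.append(i)
        total += m[i - 1]
    if total < d_p:
        raise ValueError("surviving nodes cannot supply d_p original nodes")
    return sorted(chosen)


m = [2, 2, 1, 1, 1]
print(fewest_helpers(m, f=3, d_p=4))   # a small node failed
print(fewest_helpers(m, f=1, d_p=4))   # a large node failed
```

When a small node fails, the two large nodes alone satisfy the condition, so $d = 2$; when a large node fails, one large and two small helpers are needed, so $d = 3$. Both are below $d_p = 4$.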

S2: for super node scene situations

The present disclosure applies the HCT coding transformation strategy to the heterogeneous super node scenario and obtains a special HCT-RC coding structure suitable for the super node model. Let the number of super nodes be $h_s$, and let the storage capacity and repair bandwidth of a common node be $\alpha_0 = \alpha$ and $\beta_0 = \beta$, respectively. According to equation (1), the parameters of a super node $V_i$ ($1 \le i \le h_s$) and a common node $V_j$ ($h_s + 1 \le j \le h$) are given by:

$$\alpha_i = 2\alpha, \quad \alpha_j = \alpha; \qquad \beta_i = 2\beta, \quad \beta_j = \beta. \tag{7}$$

The super nodes and common nodes must satisfy the conversion conditions of equations (2) and (4), so that:

$$m_i = 2, \quad m_j = 1; \qquad h = n - h_s. \tag{8}$$

Based on the HCT strategy, the codeword matrix $\tilde{C}$ of the HCT-RC code under the super node model is given row by row as:

$$\tilde{c}_i = \left[c_{2i-1},\; c_{2i}\right], \quad 1 \le i \le h_s; \qquad \tilde{c}_j = c_{h_s+j}, \quad h_s + 1 \le j \le h, \tag{9}$$

where $\tilde{c}_i$ denotes the $i$-th row of $\tilde{C}$ and $c_l$ the $l$-th row of $C$. From Fig. 1 and equation (9), it can be seen that the $i$-th ($1 \le i \le h_s$) row of $\tilde{C}$ is composed of the $(2i-1)$-th and $(2i)$-th rows of the matrix $C$, and the $j$-th ($h_s + 1 \le j \le h$) row of $\tilde{C}$ corresponds to the $(h_s + j)$-th row of $C$. This means that, in HCT-RC coding, a super node $V_i$ corresponds to nodes $U_{2i-1}$ and $U_{2i}$ in PM-RC coding, whereas a common node $V_j$ in HCT-RC coding corresponds to node $U_{h_s+j}$ in PM-RC coding. Thus, the HCT-RC code proposed herein can be applied to a scenario with $h_s$ super nodes.
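The node correspondence described above is a simple index map. The sketch below (the function name `original_nodes` is hypothetical) returns, for an HCT-RC node $V_i$, the 1-based indices of the PM-RC nodes whose data it stores, and the usage lines check that every PM-RC node is covered exactly once when $n = h + h_s$.

```python
from typing import List


def original_nodes(i: int, h_s: int, h: int) -> List[int]:
    """PM-RC node indices stored on HCT-RC node V_i (all indices 1-based)."""
    if not 1 <= i <= h:
        raise ValueError("node index out of range")
    if i <= h_s:
        # Super node: stores the data of U_{2i-1} and U_{2i}.
        return [2 * i - 1, 2 * i]
    # Common node: stores the data of U_{h_s + i}.
    return [h_s + i]


# Toy setting: n = 7 original nodes, h_s = 2 super nodes, h = n - h_s = 5.
n, h_s = 7, 2
h = n - h_s
mapping = {i: original_nodes(i, h_s, h) for i in range(1, h + 1)}
print(mapping)
covered = sorted(u for nodes in mapping.values() for u in nodes)
print(covered == list(range(1, n + 1)))
```

Because the map is a partition of $\{1, \dots, n\}$, any reconstruction or repair argument for PM-RC can be replayed on HCT-RC by translating node indices through it.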

Performance analysis

The performance of the HCT-RC code is analyzed below, and HCT-RC is compared with PM-RC in terms of repair locality, storage consumption and repair bandwidth consumption for a DSS in the super node scenario.

Storage consumption

Obviously, the total storage consumption of PM-RC is $n\alpha$: a data file of size $F$ is encoded and stored on $n$ nodes, and each node stores $\alpha$ coded symbols. Similarly, under HCT-RC coding in the super node scenario, a data file of size $F$ is stored on $h$ nodes, and each node $V_i$ stores $\alpha_i$ ($1 \le i \le h$) coded symbols. Thus, based on equations (7) and (8), the total storage consumption required by HCT-RC coding is $\sum_{i=1}^{h} \alpha_i = 2\alpha h_s + \alpha (h - h_s) = (h + h_s)\alpha = n\alpha$.
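A quick numeric check of the storage comparison, as a sketch with hypothetical function names and toy values; it relies on $\alpha_i = 2\alpha$ for super nodes, $\alpha_j = \alpha$ for common nodes and $h = n - h_s$ from equations (7) and (8).

```python
def storage_pm(n: int, alpha: int) -> int:
    """Total symbols stored by PM-RC: n nodes with alpha symbols each."""
    return n * alpha


def storage_hct_supernode(h: int, h_s: int, alpha: int) -> int:
    """Total symbols stored by HCT-RC with h_s super nodes among h nodes."""
    if not 0 <= h_s <= h:
        raise ValueError("h_s must lie between 0 and h")
    return h_s * 2 * alpha + (h - h_s) * alpha


n, h_s, alpha = 7, 2, 3
h = n - h_s
print(storage_pm(n, alpha), storage_hct_supernode(h, h_s, alpha))
```

Both totals agree for any choice of $n$, $h_s$ and $\alpha$ with $h = n - h_s$, since the transformation only regroups the $n\alpha$ coded symbols.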

Repair locality and repair bandwidth consumption

In PM-RC coding, the repair bandwidth incurred by repairing any failed node is $\gamma_p = d_p \beta$. According to Theorem 2, as long as the condition $\sum_{i \in \Omega_d} m_i \ge d_p$ is satisfied, the repair bandwidth for repairing the failed node $V_f$ ($1 \le f \le h$) in HCT-RC is:

$$\gamma_f = m_f \sum_{i \in \Omega_d} \beta_i = m_f \beta \sum_{i \in \Omega_d} m_i. \tag{10}$$

Using HCT-RC coding in the super node scenario, according to equation (8), one can set

$$d = d_p - d_s \tag{11}$$

to achieve $\sum_{i \in \Omega_d} m_i = d_p$, where $d_s = |\{1, 2, \dots, h_s\} \cap \Omega_d|$ represents the number of super nodes participating in the repair of the failed node $V_f$ ($1 \le f \le h$).

In particular, to repair a failed super node, the new replacement node connects to the remaining $d_s = h_s - 1$ super nodes, downloading $2\beta$ data symbols from each, and to $d_p - 2d_s$ common nodes, downloading $\beta$ data symbols from each. On the other hand, if a common node fails, the replacement node connects to $d_s = h_s$ super nodes and $d_p - 2d_s$ common nodes to complete data repair. In this way, equation (11) can be satisfied in many cases. Thus, the repair locality of HCT-RC coding (i.e., $d$) is smaller than that of PM-RC coding (i.e., $d_p$). This performance advantage benefits the DSS, because the data repair process places a burden on the helper nodes.
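The helper counts for the two failure cases can be tabulated with a short sketch. The function name `helper_plan` is hypothetical, and the number of common helpers is taken as $d - d_s = d_p - 2d_s$, which is what equation (11) implies; the sketch does not check that enough common nodes actually survive in the system.

```python
from typing import Tuple


def helper_plan(h_s: int, d_p: int, failed_is_super: bool) -> Tuple[int, int, int]:
    """Return (super helpers d_s, common helpers, repair locality d)."""
    # A failed super node cannot help itself, so one fewer super helper.
    d_s = h_s - 1 if failed_is_super else h_s
    common = d_p - 2 * d_s
    if d_s < 0 or common < 0:
        raise ValueError("d_p is too small for this number of super nodes")
    return d_s, common, d_s + common


h_s, d_p = 2, 6
for failed_is_super in (True, False):
    d_s, common, d = helper_plan(h_s, d_p, failed_is_super)
    # Each super helper counts for two original nodes, each common one for one.
    print(failed_is_super, d_s, common, d, 2 * d_s + common == d_p)
```

In both cases the helpers together stand in for exactly $d_p$ original nodes, while the number of distinct helpers contacted is $d_p - d_s$, which is where the reduction in repair locality comes from.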

By equations (10) and (11), $\gamma_f = m_f d_p \beta = m_f \gamma_p$. Based on equation (8), the repair bandwidth required to repair one super node is $2\gamma_p$, and that required to repair a common node is $\gamma_p$. Assume that PM-RC and HCT-RC codes have the same average node failure rate over a period of time $\lambda$, denoted $F_\lambda$. Then the total repair bandwidth consumed using PM-RC coding within $\lambda$ is $B_{PM} = n F_\lambda \gamma_p$; that is, a total of $n F_\lambda \gamma_p$ repair data symbols is required over the period $\lambda$ to maintain the reliability of the DSS. Similarly, according to equations (8), (10) and (11), the total repair bandwidth over the period $\lambda$ using HCT-RC coding in the super node DSS is:

$$B_{HCT} = h_s F_\lambda \cdot 2\gamma_p + (h - h_s) F_\lambda \gamma_p = (h + h_s) F_\lambda \gamma_p = n F_\lambda \gamma_p. \tag{12}$$

Thus, $B_{PM} = B_{HCT}$. Note that, in practical systems, the failure rate of a super node is lower than that of a common node. Thus, in a practical system the total repair bandwidth of HCT-RC is lower than the result of the above analysis.
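The bandwidth comparison can be reproduced numerically. This sketch uses hypothetical function names and toy integer values; it assumes a super node repair costs $2\gamma_p$ and a common node repair costs $\gamma_p$, as stated above, and lets the two node types have different failure counts so the remark about lower super node failure rates can be tried out.

```python
def total_bandwidth_pm(n: int, failures_per_node: int, gamma_p: int) -> int:
    """Repair traffic for PM-RC over the period: n * F_lambda * gamma_p."""
    return n * failures_per_node * gamma_p


def total_bandwidth_hct(h: int, h_s: int, gamma_p: int,
                        super_failures: int, common_failures: int) -> int:
    """Repair traffic for HCT-RC with per-type failure counts per node."""
    super_part = h_s * super_failures * 2 * gamma_p
    common_part = (h - h_s) * common_failures * gamma_p
    return super_part + common_part


n, h_s, gamma_p, failures = 7, 2, 6, 3
h = n - h_s
print(total_bandwidth_pm(n, failures, gamma_p))
print(total_bandwidth_hct(h, h_s, gamma_p, failures, failures))  # equal rates
print(total_bandwidth_hct(h, h_s, gamma_p, 1, failures))         # sturdier super nodes
```

With equal failure counts the two totals coincide; lowering only the super node failure count reduces the HCT-RC total, in line with the observation about practical systems.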

The performance comparison of PM-RC and HCT-RC is summarized in Table 1. From the above analysis, the HCT-RC code proposed by the present disclosure has a smaller repair locality in the heterogeneous super node scenario than the existing PM-RC code. Meanwhile, the heterogeneous coding transformation strategy provided by the disclosure offers a new method and a new idea for the design of regenerating codes based on the PM framework.

TABLE 1 comparison of PM-RC versus HCT-RC Performance

Example 2

In one embodiment of the present disclosure, there is provided a data distributed storage system based on heterogeneous regenerative transcoding, including:

The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the specific embodiments of the present disclosure have been described above with reference to the drawings, it should be understood that the present disclosure is not limited to the embodiments, and that various modifications and changes can be made by one skilled in the art without inventive effort on the basis of the technical solutions of the present disclosure while remaining within the scope of the present disclosure.

Claims

1. The data distributed storage method based on heterogeneous regenerated code transformation is characterized by comprising the following steps of:

acquiring a data file with a set size;

in the heterogeneous DSS, the PM-based regenerating code structure PM-RC is transformed into a group of new regenerating code structures HCT-RC by a heterogeneous coding transformation strategy, and the codeword matrix of the regenerating code structure HCT-RC is an irregular matrix;

the heterogeneous coding transformation strategy is: rearranging the plurality of data symbols of the regular codeword matrix of PM-RC into an irregular matrix in a set order, thereby converting the codeword of PM-RC into a codeword of HCT-RC; the heterogeneous coding transformation strategy specifically includes: considering a heterogeneous DSS comprising $h$ nodes $V_i$ ($1 \le i \le h$), if the system parameters meet the conditions, rearranging the $n\alpha$ data symbols in the regular codeword matrix $C$, in left-to-right, top-to-bottom order, into an irregular matrix $\tilde{C}$, thereby converting PM-RC codewords into HCT-RC codewords; and applying the heterogeneous coding transformation strategy to a super node heterogeneous DSS to obtain an HCT-RC coding structure suitable for the super node heterogeneous DSS;

s1: for the general heterogeneous DSS case

considering first a general heterogeneous DSS comprising $h$ nodes $V_i$ ($1 \le i \le h$), for positive integers $m_i$ satisfying $\sum_{i=1}^{h} m_i = n$ and $m_h = 1$, if the system parameters meet the following condition:

$$\alpha_i = m_i \alpha, \quad 1 \le i \le h,$$

the $n\alpha$ data symbols in the codeword matrix $C$ are rearranged, in left-to-right, top-to-bottom order, into an irregular matrix $\tilde{C}$, thereby converting the codeword of the PM-RC into a codeword of the HCT-RC;

in this way, $\tilde{C}$ comprises $h$ rows containing $\alpha_1, \dots, \alpha_h$ symbols, respectively; the $i$-th row $\tilde{c}_i$ of the matrix $\tilde{C}$ represents the $\alpha_i$ data symbols stored at node $V_i$ and, with $c_l$ denoting the $l$-th row of $C$, is given by:

$$\tilde{c}_i = \left[c_{s_i+1},\; c_{s_i+2},\; \dots,\; c_{s_i+m_i}\right], \quad s_i = \sum_{j=1}^{i-1} m_j;$$

to guarantee the data repair property, the repair bandwidth $\beta_i$ of node $V_i$ in HCT-RC must satisfy:

$$\beta_i = m_i \beta, \quad 1 \le i \le h,$$

wherein $\beta$ represents the repair bandwidth of each node in the PM-RC;

when the conditions $\alpha_i = m_i \alpha$ and $\beta_i = m_i \beta$ are satisfied, the PM-RC of the homogeneous DSS is converted into an HCT-RC code suitable for general heterogeneous scenarios;

s2: for super node scene situations

applying the coding transformation strategy HCT to the heterogeneous super node scenario to obtain a special HCT-RC coding structure suitable for the super node model; in HCT-RC coding, a super node $V_i$ corresponds to nodes $U_{2i-1}$ and $U_{2i}$ in PM-RC coding, whereas a common node $V_j$ in HCT-RC coding corresponds to node $U_{h_s+j}$ in PM-RC coding; the proposed HCT-RC coding is applied to a scenario with $h_s$ super nodes.

2. The heterogeneous regenerative code conversion based data distributed storage method of claim 1, wherein acquiring the data file of the set size comprises: obtaining a data file of size $F$ symbols, encoding it, and storing it across nodes in a DSS comprising a plurality of storage nodes, wherein a PM-based regenerating code stores the data file of size $F$ in a DSS comprising $n$ storage nodes, each node storing $\alpha$ symbols.

3. The heterogeneous regenerative code transform based data distributed storage method of claim 1, wherein the process of data reconstruction comprises: in the DSS, a legitimate user connects to $k$ of the $n$ nodes and then downloads data to recover the original file; this process is data reconstruction.

4. The heterogeneous regenerative transcoding based data distribution storage method of claim 1, wherein the process of data repair comprises: when one node fails, allowing a new replacement node to connect to $d$ of the remaining $n-1$ surviving nodes, the $d$ nodes being called helper nodes, and to download $\beta$ data symbols from each helper node to repair the failed node; this process is data repair.

5. The method for distributed storage of data based on heterogeneous regenerative code conversion as claimed in claim 1, wherein data reconstruction and data repair are two properties of a feasible regenerating code structure, $\alpha$ and $\beta$ are the node storage and the repair bandwidth, respectively, and each node in a homogeneous DSS has the same node storage and repair bandwidth.

6. The heterogeneous regenerative code transform based data distribution storage method as claimed in claim 1, wherein the heterogeneous DSS comprises $h_s$ super nodes, each super node having a larger storage capacity and a higher repair bandwidth, and the remaining $(h - h_s)$ common nodes each having the same storage capacity and repair bandwidth.

7. The heterogeneous regenerative code transform based data distributed storage method of claim 1, wherein a regenerating code suitable for the heterogeneous DSS is designed to realize data reconstruction and data repair, and the heterogeneous coding transformation strategy is then used to transform the PM-based regenerating code structure into a group of new regenerating code structures HCT-RC, comprising: given a PM-based regenerating code PM-RC, a codeword of PM-RC is represented by a regular matrix $C$ of size $n \times \alpha$, where $n$ and $\alpha$ represent the number of nodes and the node storage capacity in the original homogeneous scenario, respectively; the $i$-th row $c_i$ of the regular matrix $C$ represents the $\alpha$ data symbols stored at node $U_i$.

8. A data distributed storage system based on heterogeneous regenerative code conversion, comprising:

the heterogeneous coding transformation strategy is: rearranging the plurality of data symbols of the regular codeword matrix of PM-RC into an irregular matrix in a set order, thereby converting the codeword of PM-RC into a codeword of HCT-RC; the heterogeneous coding transformation strategy specifically includes: considering a heterogeneous DSS comprising $h$ nodes $V_i$ ($1 \le i \le h$), if the system parameters meet the conditions, rearranging the $n\alpha$ data symbols in the regular codeword matrix $C$, in left-to-right, top-to-bottom order, into an irregular matrix $\tilde{C}$, thereby converting the codeword of the PM-RC into the codeword of the HCT-RC; and applying the heterogeneous coding transformation strategy to a super node heterogeneous DSS to obtain an HCT-RC coding structure suitable for the super node heterogeneous DSS;

s1: for the general heterogeneous DSS case

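to guarantee the data repair property, the repair bandwidth $\beta_i$ of node $V_i$ in HCT-RC must satisfy $\beta_i = m_i \beta$ ($1 \le i \le h$), wherein $\beta$ represents the repair bandwidth of each node in the PM-RC;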

s2: for super node scene situations

applying the coding transformation strategy HCT to the heterogeneous super node scenario to obtain a special HCT-RC coding structure suitable for the super node model; in HCT-RC coding, a super node $V_i$ corresponds to nodes $U_{2i-1}$ and $U_{2i}$ in PM-RC coding, whereas a common node $V_j$ in HCT-RC coding corresponds to node $U_{h_s+j}$ in PM-RC coding; the proposed HCT-RC coding is applied to a scenario with $h_s$ super nodes.