CN116610645B - Data distributed storage method and system based on heterogeneous regenerated code conversion - Google Patents

Info

Publication number
CN116610645B
Authority
CN
China
Prior art keywords
heterogeneous
data
node
hct
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310869214.1A
Other languages
Chinese (zh)
Other versions
CN116610645A (en
Inventor
张帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Management University
Original Assignee
Shandong Management University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Management University filed Critical Shandong Management University
Priority to CN202310869214.1A priority Critical patent/CN116610645B/en
Publication of CN116610645A publication Critical patent/CN116610645A/en
Application granted granted Critical
Publication of CN116610645B publication Critical patent/CN116610645B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/174Redundancy elimination performed by the file system
    • G06F16/1744Redundancy elimination performed by the file system using compression, e.g. sparse files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a data distributed storage method and system based on heterogeneous regenerated code transformation, relating to the technical field of distributed data storage. The method includes: obtaining a data file of a set size; and encoding the data file with a regenerated code obtained by transforming a PM-RC (PM-based regenerated code), and storing it across nodes in a heterogeneous DSS comprising a plurality of storage nodes. In the heterogeneous DSS, a heterogeneous coding transformation strategy transforms the PM-based regenerated code structure into a group of new regenerated code structures, HCT-RC, whose codeword matrix is an irregular matrix. The heterogeneous coding transformation strategy rearranges the data symbols of the regular codeword matrix of the PM-based regenerated code into an irregular matrix in a set order, thereby converting codewords of the PM-based regenerated code into codewords of HCT-RC. The present disclosure reduces the additional workload that repairing a failed node imposes on the system.

Description

Data distributed storage method and system based on heterogeneous regenerated code conversion
Technical Field
The disclosure relates to the technical field of data distributed storage, in particular to a data distributed storage method and system based on heterogeneous regenerative code conversion.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
A DSS (distributed storage system) is one of the mainstream schemes for mass data storage at home and abroad; an example is GFS (Google File System). In a DSS, data files are stored in a distributed manner across a plurality of storage nodes, which are connected by a network into a whole that provides storage services externally. As a redundancy technique for ensuring the availability and reliability of a DSS, regenerated codes are efficient in both node storage and repair bandwidth, and their study and application in DSSs have attracted considerable research attention. As two special classes of explicit regenerated code structures, PM-MBR (PM minimum bandwidth regenerated code) and PM-MSR (PM minimum storage regenerated code), both constructed with the PM (product matrix) framework, provide the minimum repair bandwidth and the minimum node storage, respectively, for a homogeneous DSS. A homogeneous DSS is one in which all nodes have the same characteristics, for example the same node storage and repair bandwidth. In contrast, a heterogeneous DSS contains storage nodes with different characteristics, i.e., nodes with different storage capacities and repair bandwidths. This flexibility gives heterogeneous systems a wider range of applications; for example, P2P (peer-to-peer) cloud storage and Internet caching systems for video on demand both adopt heterogeneous DSSs.
In recent years, the design of regenerated codes suitable for heterogeneous DSSs has become a research hotspot and has produced a series of results. For example, for heterogeneous DSSs with different node storage capacities, some scholars have provided an explicit regenerated code construction using combinatorial designs, and some researchers have designed flexible regenerated code structures with adjustable node storage capacity based on mathematical tools such as Zigzag codes and permutation matrices. However, existing regenerated code schemes constructed with the PM framework (such as PM-MBR and PM-MSR) are only applicable to homogeneous DSSs and cannot be applied to the more flexible heterogeneous scenarios.
Disclosure of Invention
The present disclosure provides a data distributed storage method and system based on heterogeneous regenerated code transformation. It proposes a simple and effective coding transformation principle based on an irregular matrix: by applying a coding transformation to the PM-RC (PM-based regenerated code structures, including PM-MBR and PM-MSR) codes of a homogeneous DSS, a new regenerated code structure applicable to heterogeneous DSSs is obtained, named HCT-RC (regenerated code structure based on heterogeneous coding transformation). HCT-RC has the data reconstruction and data repair properties, thereby ensuring the availability and reliability of the system.
According to some embodiments, the present disclosure employs the following technical solutions:
a data distributed storage method based on heterogeneous regenerative code transformation comprises the following steps:
acquiring a data file with a set size;
encoding the data file with a regeneration code modified based on PM-RC transformation and storing across nodes in a heterogeneous DSS comprising a plurality of storage nodes; the data files stored across the nodes can be subjected to data reconstruction and data repair;
in the heterogeneous DSS, a heterogeneous coding transformation (HCT) strategy is used to transform the PM-based regenerated code structure PM-RC into a group of new regenerated code structures HCT-RC, and the codeword matrix of the regenerated code structure HCT-RC is an irregular matrix;
the heterogeneous transcoding strategy is: the plurality of data symbols of the regular codeword matrix of the PM-RC are rearranged into the irregular matrix in a set order, thereby converting the codeword of the PM-RC into a codeword of the HCT-RC.
According to some embodiments, the present disclosure employs the following technical solutions:
a heterogeneous regenerative code conversion based data distributed storage system comprising:
the data acquisition module is used for acquiring the data file with the set size;
a data storage module for encoding the data file with a regeneration code modified based on PM-RC conversion and storing the encoded data file across nodes in a heterogeneous DSS comprising a plurality of storage nodes; the data files stored across the nodes can be subjected to data reconstruction and data repair;
in the heterogeneous DSS, a heterogeneous code transformation strategy (HCT) is utilized to transform PM-RC of a regeneration code structure based on PM into a group of new regeneration code structure HCT-RC, and a codeword matrix of the regeneration code structure HCT-RC is an irregular matrix;
the heterogeneous transcoding strategy is: the plurality of data symbols of the regular codeword matrix of the PM-RC are rearranged into the irregular matrix in a set order, thereby converting the codeword of the PM-RC into a codeword of the HCT-RC.
Compared with the prior art, the beneficial effects of the present disclosure are:
The data distributed storage method based on heterogeneous regenerated code transformation provides a simple and effective coding transformation principle based on an irregular matrix, by which the PM-RC codes of a homogeneous DSS can be transformed into a new regenerated code structure suitable for heterogeneous DSSs, named HCT-RC. The regenerated code structure provided by the disclosure has the data reconstruction and data repair properties, thereby ensuring the availability and reliability of the system. In addition, for the super node scenario in a heterogeneous DSS, compared with the existing PM-RC scheme suited to homogeneous DSSs, the HCT-RC for the super node scenario has a smaller repair locality (the number of helper nodes required to repair one failed node) at the same total storage consumption and repair bandwidth consumption, thereby reducing the extra workload that repairing a failed node imposes on the system.
The technical solution provided by the disclosure offers a new idea and a new solution for the design of regenerated codes in heterogeneous DSSs, and helps advance the application of regenerated code technology in DSSs.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the exemplary embodiments of the disclosure and together with the description serve to explain the disclosure, and do not constitute an undue limitation on the disclosure.
Fig. 1 is a schematic diagram of a transformation process of the regenerated code structure transformation strategy of the present disclosure.
Detailed Description
The disclosure is further described below with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
In one embodiment of the present disclosure, a data distributed storage method based on heterogeneous regenerative code conversion is provided, including:
step one: acquiring a data file with a set size;
step two: encoding the data file with a regeneration code modified based on PM-RC transformation and storing across nodes in a heterogeneous DSS comprising a plurality of storage nodes; the data files stored across the nodes can be subjected to data reconstruction and data repair;
in the heterogeneous DSS, a heterogeneous coding transformation (HCT) strategy is used to transform the PM-based regenerated code structure PM-RC into a group of new regenerated code structures HCT-RC, and the codeword matrix of the regenerated code structure HCT-RC is an irregular matrix;
the heterogeneous transcoding strategy is: the plurality of data symbols of the regular codeword matrix of the PM-RC are rearranged into the irregular matrix in a set order, thereby converting the codeword of the PM-RC into a codeword of the HCT-RC.
Furthermore, the HCT strategy is applied to the super node scene (a special heterogeneous DSS), so that the HCT-RC applicable to the super node scene can be obtained, and the performance of the HCT-RC is superior to that of the existing PM-RC.
As an embodiment, the implementation process of the data distributed storage method based on heterogeneous regenerative code transformation is as follows:
Step 1: the existing DSS model, in which a regenerated code encodes a file of a set size and stores it across nodes in a DSS containing a plurality of storage nodes, is generally as follows:
specifically, a data file of size $F$ symbols is obtained, and the regenerated code technique applied in the DSS encodes the file and stores it across the nodes of the DSS network. A feasible $(n, k, d, \alpha, \beta)$ regenerated code can store a data file of size $F$ in a DSS containing $n$ storage nodes, each node storing $\alpha$ symbols from a finite field $\mathbb{F}_q$ of size $q$. The DSS requires that a legitimate user (data collector, DC) be able to connect to $k$ of the $n$ nodes and download data to recover the original file; this process is called data reconstruction.
Meanwhile, when one node fails, a new replacement node is allowed to connect to $d$ nodes (called helper nodes) among the remaining $(n-1)$ surviving nodes and to download $\beta$ data symbols from each helper node to repair the failed node; this process is called data repair. Data reconstruction and data repair are the two properties that a feasible regenerated code structure must possess. Here $\alpha$ and $\beta$ are the node storage and the repair bandwidth, respectively. Every node in a homogeneous DSS has the same node storage and repair bandwidth. In contrast, a general heterogeneous DSS contains $h$ nodes, denoted $V_1, \ldots, V_h$, whose storage capacities and repair bandwidths are unequal and are denoted $\alpha_1, \ldots, \alpha_h$ (without loss of generality, $\alpha_{\max} = \alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_{h-1} \ge \alpha_h$) and $\beta_1, \ldots, \beta_h$, respectively.
To ensure data reconstruction and data repair, an $(h, k, d, \alpha_i, \beta_i)$ regenerated code suitable for a heterogeneous DSS ($1 \le i \le h$, $k > 1$, $d < h$) must satisfy the following conditions: (1) a legitimate user (DC) can connect to $k$ of the $h$ nodes and download data to recover the original file; (2) when node $V_i$ fails, a new replacement node can complete data repair by downloading $\beta_j$ ($\le \alpha_j$) data symbols from each helper node $V_j$ ($1 \le j \ne i \le h$). For the failed node $V_i$, the repair locality is $d$, meaning that $d$ helper nodes are required to repair the data of $V_i$.
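The constraints on the parameter set $(h, k, d, \alpha_i, \beta_i)$ stated above can be checked mechanically. The following is a minimal Python sketch; the class name `HeterogeneousDSS` and its `validate` method are hypothetical and only encode the inequalities given in the text ($k > 1$, $d < h$, $\beta_j \le \alpha_j$, and the non-increasing ordering of the $\alpha_i$), not the coding scheme itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HeterogeneousDSS:
    """Hypothetical container for the parameters (h, k, d, alpha_i, beta_i)."""
    k: int            # number of nodes a data collector connects to
    d: int            # number of helper nodes used in a repair (repair locality)
    alpha: List[int]  # alpha[i-1]: storage capacity of node V_i, in symbols
    beta: List[int]   # beta[i-1]: symbols node V_i provides as a helper

    @property
    def h(self) -> int:
        """Number of nodes in the heterogeneous DSS."""
        return len(self.alpha)

    def validate(self) -> None:
        """Raise ValueError if a constraint stated in the text is violated."""
        if len(self.beta) != self.h:
            raise ValueError("alpha and beta need one entry per node")
        if self.k <= 1 or self.d >= self.h:
            raise ValueError("require k > 1 and d < h")
        if any(b > a for a, b in zip(self.alpha, self.beta)):
            raise ValueError("require beta_j <= alpha_j for every node")
        if any(x < y for x, y in zip(self.alpha, self.alpha[1:])):
            raise ValueError("require alpha_1 >= alpha_2 >= ... >= alpha_h")


# One node with alpha = 4, beta = 2 and three nodes with alpha = 2, beta = 1.
dss = HeterogeneousDSS(k=2, d=3, alpha=[4, 2, 2, 2], beta=[2, 1, 1, 1])
dss.validate()  # passes silently for this parameter set
```

The check covers only the parameter ranges; whether a code with these parameters actually supports reconstruction and repair depends on the construction described in the following steps.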
As a particular heterogeneous DSS, the present disclosure further considers a super node model comprising $h_s$ ($1 \le h_s < d$) super nodes, each having a larger storage capacity $\alpha_s$ and a higher repair bandwidth $\beta_s$. The remaining $(h - h_s)$ common nodes each have the same storage capacity and repair bandwidth, denoted $\alpha_0$ and $\beta_0$, respectively, and the following relationship holds:
$$\alpha_s = 2\alpha_0, \quad \beta_s = 2\beta_0. \tag{1}$$
Clearly, the super node scenario is heterogeneous. Its simplest case is $h_s = 1$, i.e., there is only one super node. A practical application of this case is a P2P backup system, in which the super node may be a server with higher service capability than the other peer nodes.
Step 2: the present disclosure designs a heterogeneous coding transformation (HCT) strategy that transforms the PM-RC of a homogeneous DSS into a new regenerated code structure, HCT-RC, making it applicable to heterogeneous DSSs;
specifically, given a PM-RC, its codeword can be represented by a matrix $C$ of size $(n \times \alpha)$, where $n$ and $\alpha$ are the number of nodes and the node storage capacity, respectively, in the original homogeneous scenario. The $i$-th row $c_i$ of $C$ represents the $\alpha$ data symbols (i.e., encoded symbols) stored in node $U_i$ ($1 \le i \le n$), and the element $c_{ij}$ is the $j$-th encoded symbol of node $U_i$. The present disclosure proposes a novel heterogeneous coding transformation (HCT) strategy that transforms the PM-RC into a group of new regenerated code structures, named HCT-RC, whose codeword matrix $\tilde{C}$ is an irregular matrix.
The heterogeneous transcoding (HCT) strategy specifically includes:
s1: for the general heterogeneous DSS case
Consider first a general heterogeneous DSS comprising $h$ nodes $V_i$ ($1 \le i \le h$). For positive integers $m_i$ satisfying $\sum_{i=1}^{h} m_i = n$ and $m_h = 1$, if the system parameters satisfy:
$$\alpha_i = m_i \alpha, \quad 1 \le i \le h, \tag{2}$$
then the $n\alpha$ data symbols in the codeword matrix $C$ are rearranged in left-to-right, top-to-bottom order into the irregular matrix $\tilde{C}$, thereby converting the codewords of the PM-RC into codewords of the HCT-RC, as shown in Fig. 1.
In this way, $\tilde{C}$ comprises $h$ rows containing $\alpha_1, \ldots, \alpha_h$ symbols, respectively. The $i$-th row $\tilde{c}_i$ of $\tilde{C}$ represents the $\alpha_i$ data symbols stored at node $V_i$ and is given by:
$$\tilde{c}_i = \left(c_{M_{i-1}+1},\, c_{M_{i-1}+2},\, \ldots,\, c_{M_{i-1}+m_i}\right), \quad M_{i-1} = \sum_{j=1}^{i-1} m_j, \quad 1 \le i \le h. \tag{3}$$
To guarantee the data repair property, the repair bandwidth $\beta_i$ of node $V_i$ in the HCT-RC must satisfy:
$$\beta_i = m_i \beta, \quad 1 \le i \le h, \tag{4}$$
where $\beta$ is the repair bandwidth of each node in the PM-RC. Thus, when conditions (2) and (4) are satisfied, the PM-RC of a homogeneous DSS can be converted into HCT-RC encoding suitable for general heterogeneous scenarios. The data reconstruction and data repair properties of HCT-RC are given by the following two theorems.
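The rearrangement described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `hct_transform` is hypothetical, symbols are modelled as plain integers rather than finite-field elements, and the multiplicities $m_i$ are assumed to sum to $n$ so that all $n\alpha$ symbols are placed.

```python
from typing import List, Sequence


def hct_transform(C: Sequence[Sequence[int]], m: Sequence[int]) -> List[List[int]]:
    """Rearrange the regular n x alpha matrix C into an irregular matrix.

    Row i of the result concatenates m[i] consecutive rows of C, read
    left to right and top to bottom, so it holds m[i] * alpha symbols.
    """
    n = len(C)
    if any(mi < 1 for mi in m) or sum(m) != n:
        raise ValueError("m_i must be positive integers summing to n")
    irregular: List[List[int]] = []
    start = 0
    for mi in m:
        # Merge the rows of the m_i original nodes into one row of node V_i.
        merged = [symbol for row in C[start:start + mi] for symbol in row]
        irregular.append(merged)
        start += mi
    return irregular


# Regular codeword matrix with n = 5 nodes and alpha = 2 symbols per node.
C = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
C_tilde = hct_transform(C, m=[2, 2, 1])
print(C_tilde)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With `m = [2, 2, 1]` the first two nodes store $2\alpha = 4$ symbols each and the last node stores $\alpha = 2$, which matches condition (2).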
Theorem 1 (data reconstruction property): let $\Omega_k$ denote any subset of the set $\{1, 2, \ldots, h\}$ with cardinality $|\Omega_k| = k$. With HCT-RC encoding, as long as:
$$\sum_{i \in \Omega_k} m_i \ge k_p, \tag{5}$$
a legitimate user (DC) can reconstruct the original data file by connecting to the $k$ nodes in the set $\{V_i, i \in \Omega_k\}$, where $k_p$ is the number of nodes that must be connected to reconstruct the original file in PM-RC encoding.
Proof: from equation (3), the codeword vector $\tilde{c}_i$, which represents the data stored at node $V_i$, consists of $m_i$ ($1 \le i \le h$) subvectors, and each subvector $c_{M_{i-1}+l}$ ($1 \le l \le m_i$, with $M_{i-1} = \sum_{j=1}^{i-1} m_j$) corresponds to the data stored in the original node $U_{M_{i-1}+l}$ of the PM-RC. That is, the data stored at node $V_i$ is composed of the data stored in the $m_i$ original nodes $U_{M_{i-1}+1}, \ldots, U_{M_{i-1}+m_i}$. Therefore, connecting to the $k$ nodes of the set $\{V_i, i \in \Omega_k\}$ in the HCT-RC and downloading their stored data is equivalent to connecting to $\sum_{i \in \Omega_k} m_i$ original nodes in the PM-RC and downloading their stored data. By the data reconstruction property of the PM-RC, connecting to $k_p$ nodes suffices to reconstruct the original file of size $F$. Hence, whenever the condition given by (5) is satisfied in the HCT-RC, the original data file of size $F$ can be reconstructed by connecting to the $k$ nodes in $\{V_i, i \in \Omega_k\}$.
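The counting argument in the proof can be illustrated with a short Python sketch. The function names are hypothetical; node indices are 1-based as in the text, and the mapping assumes that node $V_i$ holds $m_i$ consecutive rows of $C$ in top-to-bottom order, as described by the HCT strategy.

```python
from typing import Sequence, Set


def covered_original_nodes(m: Sequence[int], omega_k: Sequence[int]) -> Set[int]:
    """Indices j of the PM-RC nodes U_j whose data is held by the chosen V_i."""
    offsets = [0]
    for mi in m:
        offsets.append(offsets[-1] + mi)  # offsets[i] = m_1 + ... + m_i
    covered: Set[int] = set()
    for i in omega_k:
        # V_i stores the rows of U_{offsets[i-1]+1}, ..., U_{offsets[i]}.
        covered.update(range(offsets[i - 1] + 1, offsets[i] + 1))
    return covered


def satisfies_reconstruction_condition(m: Sequence[int],
                                       omega_k: Sequence[int],
                                       k_p: int) -> bool:
    """Check the condition of Theorem 1: the chosen m_i sum to at least k_p."""
    return sum(m[i - 1] for i in omega_k) >= k_p


m = [2, 2, 1, 1]  # h = 4 nodes built from n = 6 original nodes
print(covered_original_nodes(m, [1, 3]))                  # {1, 2, 5}
print(satisfies_reconstruction_condition(m, [1, 3], 3))   # True
print(satisfies_reconstruction_condition(m, [3, 4], 3))   # False
```

In the example, connecting to $V_1$ and $V_3$ yields the data of three original nodes, so a PM-RC with $k_p = 3$ can be decoded from $k = 2$ connections, whereas $V_3$ and $V_4$ together cover only two original nodes.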
Theorem 2 (data repair property): let $\Omega_d$ denote any subset of the set $\{1, 2, \ldots, h\}$ with cardinality $|\Omega_d| = d$. With HCT-RC encoding, when node $V_f$ ($1 \le f \le h$) fails, a replacement node can repair the lost data by performing the following operation $m_f$ times: connect to the $d$ helper nodes in the set $\omega_d = \{V_i, i \in \Omega_d, i \ne f\}$ and download $\beta_i$ repair data symbols from each helper node $V_i$. The precondition for completing data repair is:
$$\sum_{i \in \Omega_d,\, i \ne f} m_i \ge d_p, \tag{6}$$
where $d_p$ is the repair locality of PM-RC encoding (i.e., the number of helper nodes needed to repair a failed node).
Proof: from equation (3), the codeword vector $\tilde{c}_f$, which represents the data stored at the failed node $V_f$, consists of $m_f$ subvectors, and each subvector $c_{M_{f-1}+l}$ ($1 \le l \le m_f$, with $M_{f-1} = \sum_{j=1}^{f-1} m_j$) corresponds to the data stored in the original node $U_{M_{f-1}+l}$ of the PM-RC. Repairing the failed node $V_f$ in the HCT-RC is therefore equivalent to repairing the $m_f$ original nodes $U_{M_{f-1}+1}, \ldots, U_{M_{f-1}+m_f}$ in the PM-RC. Consider first the repair of the first subvector $c_{M_{f-1}+1}$ of $\tilde{c}_f$. According to equation (4), each helper node $V_i$ in the set $\omega_d$ can provide $\beta_i = m_i \beta$ repair data symbols, and these $\sum_{i \in \Omega_d,\, i \ne f} \beta_i$ repair data symbols are linearly independent. Thus, if the condition of (6) is satisfied, the total amount of repair data provided by the $d$ helper nodes in $\omega_d$ is $\sum_{i \in \Omega_d,\, i \ne f} \beta_i = \beta \sum_{i \in \Omega_d,\, i \ne f} m_i \ge d_p \beta$. On the other hand, the data repair property of PM-RC encoding states that $d_p \beta$ repair data symbols can repair any failed node in the PM-RC. Thus, the subvector $c_{M_{f-1}+1}$ can be repaired. Similarly, the same procedure can be performed independently to repair the remaining $(m_f - 1)$ subvectors of the failed node $V_f$.
Notably, the parameters of the PM-RC (including $k_p$ and $d_p$) are known, and since $m_i \ge 1$ for every $i \in \{1, 2, \ldots, h\}$, there must exist positive integers $k$ ($\le k_p$) and $d$ ($\le d_p$) in HCT-RC encoding such that (5) and (6) hold.
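As a hedged illustration of this existence claim, the sketch below computes the smallest $k$ and $d$ for which the conditions of Theorems 1 and 2 hold for every possible subset, i.e., in the worst case. The function names are hypothetical, and requiring the conditions for all subsets is an assumption made for this example; the text only states the conditions per subset. The additional requirement $d < h$ from the system model is not checked here.

```python
from typing import Sequence


def smallest_k(m: Sequence[int], k_p: int) -> int:
    """Smallest k such that every k-subset of nodes has multiplicities summing to >= k_p."""
    total = 0
    # The worst case is the k nodes with the smallest multiplicities.
    for k, mi in enumerate(sorted(m), start=1):
        total += mi
        if total >= k_p:
            return k
    raise ValueError("k_p exceeds the number of original nodes")


def smallest_d(m: Sequence[int], d_p: int) -> int:
    """Smallest d such that, for every failed node, every d-subset of the
    surviving nodes has multiplicities summing to >= d_p."""
    m = list(m)
    worst = 0
    for f in range(len(m)):
        survivors = sorted(m[:f] + m[f + 1:])
        total = 0
        for d, mi in enumerate(survivors, start=1):
            total += mi
            if total >= d_p:
                worst = max(worst, d)
                break
        else:
            raise ValueError("surviving nodes cannot supply d_p original helpers")
    return worst


m = [2, 2, 2, 1]  # h = 4 nodes built from n = 7 original nodes
print(smallest_k(m, k_p=3))  # 2
print(smallest_d(m, d_p=4))  # 3
```

These are worst-case values over all subsets; a particular helper set that contains nodes with larger $m_i$ can satisfy the repair condition with fewer helpers.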
S2: for super node scene situations
The present disclosure applies the HCT coding transformation strategy to the heterogeneous super node scenario and obtains a special HCT-RC coding structure suitable for the super node model. Let nodes $V_1, \ldots, V_{h_s}$ be the $h_s$ super nodes, and let the storage capacity and repair bandwidth of the common nodes be $\alpha_0 = \alpha$ and $\beta_0 = \beta$, respectively. According to equation (1), the parameters of a super node $V_i$ ($1 \le i \le h_s$) and a common node $V_j$ ($h_s + 1 \le j \le h$) are given by:
$$\alpha_i = 2\alpha, \quad \alpha_j = \alpha; \qquad \beta_i = 2\beta, \quad \beta_j = \beta. \tag{7}$$
The super nodes and the common nodes must satisfy the transformation conditions of equations (2) and (4), so that:
$$m_i = 2, \quad m_j = 1; \qquad h = n - h_s. \tag{8}$$
Based on the HCT strategy, the codeword matrix $\tilde{C}$ of HCT-RC encoding under the super node model is given by:
$$\tilde{C} = \begin{bmatrix} \tilde{c}_1 \\ \vdots \\ \tilde{c}_{h_s} \\ \tilde{c}_{h_s+1} \\ \vdots \\ \tilde{c}_h \end{bmatrix} = \begin{bmatrix} c_1 & c_2 \\ \vdots & \vdots \\ c_{2h_s-1} & c_{2h_s} \\ c_{2h_s+1} & \\ \vdots & \\ c_n & \end{bmatrix}. \tag{9}$$
From Fig. 1 and equation (9), the $i$-th ($1 \le i \le h_s$) row of $\tilde{C}$ consists of the $(2i-1)$-th and $(2i)$-th rows of the matrix $C$, and the $j$-th ($h_s + 1 \le j \le h$) row of $\tilde{C}$ corresponds to the $(h_s + j)$-th row of $C$. This means that in HCT-RC encoding a super node $V_i$ corresponds to the nodes $U_{2i-1}$ and $U_{2i}$ of PM-RC encoding, whereas a common node $V_j$ in HCT-RC encoding corresponds to the node $U_{h_s+j}$ of PM-RC encoding. Thus, the HCT-RC code proposed herein can be applied to a scenario with $h_s$ super nodes.
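The correspondence between the nodes of the super node model and the original PM-RC nodes can be written down directly. The sketch below is illustrative; the function names are hypothetical, indices are 1-based as in the text, and it assumes $n > 2h_s$ so that at least one common node exists.

```python
from typing import List, Tuple


def supernode_multiplicities(n: int, h_s: int) -> List[int]:
    """Multiplicities m_i for h_s super nodes followed by n - 2*h_s common nodes."""
    if h_s < 1 or n <= 2 * h_s:
        raise ValueError("need h_s >= 1 and n > 2 * h_s")
    return [2] * h_s + [1] * (n - 2 * h_s)


def original_nodes_of(i: int, h_s: int) -> Tuple[int, ...]:
    """Indices of the PM-RC nodes U_j whose rows are stored at HCT-RC node V_i."""
    if i <= h_s:
        return (2 * i - 1, 2 * i)  # super node: two consecutive rows of C
    return (h_s + i,)              # common node: a single row of C


n, h_s = 6, 2
m = supernode_multiplicities(n, h_s)
h = len(m)
print(m, h)  # [2, 2, 1, 1] 4
print([original_nodes_of(i, h_s) for i in range(1, h + 1)])
# [(1, 2), (3, 4), (5,), (6,)]
```

The number of rows is $h = n - h_s$, and every original node $U_1, \ldots, U_n$ is assigned to exactly one node of the super node model.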
Performance analysis
The performance of HCT-RC encoding is analyzed below; for a DSS in the super node scenario, HCT-RC encoding and PM-RC encoding are compared in three respects: repair locality, storage consumption, and repair bandwidth consumption.
Storage consumption
The total storage consumption of the PM-RC is $n\alpha$: a data file of size $F$ is encoded and stored on $n$ nodes, and each node stores $\alpha$ encoded symbols. Similarly, under HCT-RC encoding in the super node scenario, a data file of size $F$ is stored on $h$ nodes, and each node $V_i$ stores $\alpha_i$ ($1 \le i \le h$) encoded symbols. Based on equations (7) and (8), the storage consumption required by HCT-RC encoding is therefore $\sum_{i=1}^{h} \alpha_i = 2h_s\alpha + (h - h_s)\alpha = (h + h_s)\alpha = n\alpha$.
Repair locality and repair bandwidth consumption
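In PM-RC encoding, the repair bandwidth for repairing any failed node is $\gamma_p = d_p \beta$. According to Theorem 2, as long as the condition $\sum_{i \in \Omega_d,\, i \ne f} m_i \ge d_p$ is satisfied, the repair bandwidth for repairing the failed node $V_f$ ($1 \le f \le h$) in the HCT-RC is:
$$\gamma_f = m_f \sum_{i \in \Omega_d,\, i \ne f} \beta_i = m_f \beta \sum_{i \in \Omega_d,\, i \ne f} m_i. \tag{10}$$
When HCT-RC encoding is used in the super node scenario, according to equation (8), one can set
$$d = d_p - d_s \tag{11}$$
to achieve $\sum_{i \in \Omega_d,\, i \ne f} m_i = d_p$, where $d_s = |\{1, 2, \ldots, h_s\} \cap \Omega_d|$ is the number of super nodes participating in the repair of the failed node $V_f$ ($1 \le f \le h$).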
Specifically, to repair a failed super node, a new replacement node connects to the remaining $d_s = h_s - 1$ super nodes, downloading $2\beta$ data symbols from each, and to $d_p - 2d_s$ common nodes, downloading $\beta$ data symbols from each. On the other hand, if a common node fails, the replacement node connects to $d_s = h_s$ super nodes and $d_p - 2d_s$ common nodes to complete data repair. In this way, equation (11) can be satisfied in many cases. Thus, the repair locality of HCT-RC encoding (i.e., $d$) is smaller than that of PM-RC encoding (i.e., $d_p$). This performance advantage benefits the DSS, because data repair sessions place a burden on the helper nodes.
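The helper selection described above can be tabulated with a small Python sketch. The function name `supernode_repair_plan` and the returned dictionary keys are hypothetical; the sketch assumes the selection rule from the text (all other super nodes plus $d_p - 2d_s$ common nodes) and that the download is repeated $m_f$ times, as in Theorem 2.

```python
from typing import Dict


def supernode_repair_plan(n: int, h_s: int, d_p: int, beta: int,
                          failed_is_super: bool) -> Dict[str, int]:
    """Helper counts and download volume for one repair in the super node model."""
    common_total = n - 2 * h_s                 # h - h_s common nodes, with h = n - h_s
    d_s = h_s - 1 if failed_is_super else h_s  # super nodes acting as helpers
    common_helpers = d_p - 2 * d_s
    common_available = common_total - (0 if failed_is_super else 1)
    if common_helpers < 0 or common_helpers > common_available:
        raise ValueError("this helper selection is not possible for these parameters")
    rounds = 2 if failed_is_super else 1       # m_f of the failed node
    per_round = d_s * 2 * beta + common_helpers * beta
    return {
        "d": d_s + common_helpers,             # repair locality, equal to d_p - d_s
        "super_helpers": d_s,
        "common_helpers": common_helpers,
        "symbols_per_round": per_round,        # equal to d_p * beta
        "total_symbols": rounds * per_round,   # equal to m_f * d_p * beta
    }


print(supernode_repair_plan(n=8, h_s=2, d_p=5, beta=1, failed_is_super=True))
print(supernode_repair_plan(n=8, h_s=2, d_p=5, beta=1, failed_is_super=False))
```

For these example parameters the repair locality is 4 when a super node fails and 3 when a common node fails, compared with $d_p = 5$ for the PM-RC. With $h_s = 1$ and a failed super node, $d_s = 0$ and the locality equals $d_p$, so no reduction is obtained in that case.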
By equations (10) and (11), $\gamma_f = m_f d_p \beta = m_f \gamma_p$. Based on equation (8), the repair bandwidth required to repair one super node is $2\gamma_p$, and that required to repair one common node is $\gamma_p$. Assume that PM-RC and HCT-RC encoding have the same average node failure rate over a period of time $\lambda$, denoted $F_\lambda$. Then the total repair bandwidth consumption of PM-RC encoding within $\lambda$ is $B_{PM} = nF_\lambda\gamma_p$; that is, a total of $nF_\lambda\gamma_p$ repair data symbols are required over the period $\lambda$ to maintain the reliability of the DSS. Similarly, according to equations (8), (10), and (11), the total repair bandwidth of the super node DSS over the period $\lambda$ under HCT-RC encoding is:
$$B_{HCT} = h_s F_\lambda \cdot 2\gamma_p + (h - h_s) F_\lambda \gamma_p = (h + h_s) F_\lambda \gamma_p = n F_\lambda \gamma_p. \tag{12}$$
Thus, $B_{PM} = B_{HCT}$. Note that in practical systems the failure rate of super nodes is lower than that of common nodes, so the total repair bandwidth of the HCT-RC in a practical system is lower than the result of the above analysis.
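The storage and repair bandwidth comparison can be reproduced numerically. The following Python sketch is illustrative only: the function name `compare_supernode` and its parameters are hypothetical, and the optional `f_super` argument (a separate failure rate for super nodes) is an assumption added to illustrate the remark about practical systems.

```python
from typing import Dict, Optional, Tuple


def compare_supernode(n: int, alpha: int, beta: int, d_p: int, h_s: int,
                      f_common: float,
                      f_super: Optional[float] = None
                      ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Total storage and total repair bandwidth of PM-RC versus HCT-RC."""
    if f_super is None:
        f_super = f_common                # same failure rate for all nodes
    h = n - h_s                           # number of nodes in the super node DSS
    gamma_p = d_p * beta                  # repair bandwidth of one PM-RC node
    pm = {
        "storage": n * alpha,
        "repair_bandwidth": n * f_common * gamma_p,
    }
    hct = {
        "storage": h_s * 2 * alpha + (h - h_s) * alpha,
        # Super nodes cost 2 * gamma_p per repair, common nodes gamma_p.
        "repair_bandwidth": h_s * f_super * 2 * gamma_p
                            + (h - h_s) * f_common * gamma_p,
    }
    return pm, hct


pm, hct = compare_supernode(n=8, alpha=4, beta=1, d_p=5, h_s=2, f_common=3)
print(pm, hct)  # identical storage (32) and repair bandwidth (120)

# A lower failure rate for super nodes reduces the HCT-RC total.
_, hct_practical = compare_supernode(n=8, alpha=4, beta=1, d_p=5, h_s=2,
                                     f_common=3, f_super=1)
print(hct_practical["repair_bandwidth"])  # 80
```

With equal failure rates the two schemes consume the same total storage and total repair bandwidth, so the remaining difference between them is the repair locality.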
The performance comparison of PM-RC and HCT-RC is summarized in Table 1. The above analysis shows that, in heterogeneous super node scenarios, the HCT-RC codes proposed by the present disclosure have a smaller repair locality than existing PM-RC codes. Meanwhile, the heterogeneous coding transformation strategy provided by the disclosure offers a new method and a new idea for the design of regenerated codes based on PM theory.
TABLE 1 comparison of PM-RC versus HCT-RC Performance
Example 2
In one embodiment of the present disclosure, there is provided a data distributed storage system based on heterogeneous regenerative transcoding, including:
the data acquisition module is used for acquiring the data file with the set size;
a data storage module for encoding the data file with a regeneration code modified based on PM-RC conversion and storing the encoded data file across nodes in a heterogeneous DSS comprising a plurality of storage nodes; the data files stored across the nodes can be subjected to data reconstruction and data repair;
in the heterogeneous DSS, a heterogeneous coding transformation (HCT) strategy is used to transform the PM-based regenerated code structure PM-RC into a group of new regenerated code structures HCT-RC, and the codeword matrix of the regenerated code structure HCT-RC is an irregular matrix;
the heterogeneous transcoding strategy is: the plurality of data symbols of the regular codeword matrix of the PM-RC are rearranged into the irregular matrix in a set order, thereby converting the codeword of the PM-RC into a codeword of the HCT-RC.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the specific embodiments of the present disclosure have been described above with reference to the drawings, it should be understood that the present disclosure is not limited to the embodiments, and that various modifications and changes can be made by one skilled in the art without inventive effort on the basis of the technical solutions of the present disclosure while remaining within the scope of the present disclosure.

Claims (8)

1. The data distributed storage method based on heterogeneous regenerated code transformation is characterized by comprising the following steps of:
acquiring a data file with a set size;
encoding the data file with a regeneration code modified based on PM-RC transformation and storing across nodes in a heterogeneous DSS comprising a plurality of storage nodes; the data files stored across the nodes can be subjected to data reconstruction and data repair;
in the heterogeneous DSS, a heterogeneous coding transformation strategy is used to transform the PM-based regenerated code structure PM-RC into a group of new regenerated code structures HCT-RC, and the codeword matrix of the regenerated code structure HCT-RC is an irregular matrix;
the heterogeneous transcoding strategy is: rearranging a plurality of data symbols of a regular codeword matrix of PM-RC into an irregular matrix according to a set order, thereby converting the codeword of PM-RC into a codeword of HCT-RC; the heterogeneous transcoding strategy specifically includes: considering a heterogeneous DSS comprising $h$ nodes $V_i$ ($1 \le i \le h$), if the system parameters satisfy the conditions, rearranging the $n\alpha$ data symbols of the regular codeword matrix $C$ in left-to-right, top-to-bottom order into an irregular matrix $\tilde{C}$, thereby converting PM-RC codewords into HCT-RC codewords; and applying the heterogeneous coding transformation strategy to a super node heterogeneous DSS to obtain an HCT-RC coding structure suitable for the super node heterogeneous DSS;
S1: the general heterogeneous DSS case
First consider a general heterogeneous DSS containing h nodes […], where the positive integers […] satisfy […] and […]; if the system parameters satisfy the following condition: […]
then the […] data symbols in the codeword matrix C are rearranged, left to right and top to bottom, into an irregular matrix […], thereby converting the PM-RC codeword into an HCT-RC codeword;
in this way, […] contains h rows, each row containing […] symbols; the i-th row […] of the matrix […] represents the […] data symbols stored at node […] and is given by: […]
to guarantee the data repair property, the repair bandwidth […] of node […] in HCT-RC must satisfy: […]
where […] denotes the repair bandwidth of each node in PM-RC;
when the conditions […] and […] are satisfied, the PM-RC of the homogeneous DSS is transformed into an HCT-RC code suited to the general heterogeneous scenario;
S2: the super-node scenario
The coding transformation strategy HCT is applied to the heterogeneous super-node scenario to obtain a special HCT-RC coding structure suited to the super-node model; in HCT-RC coding, a super node […] corresponds to nodes […] and […], whereas an ordinary node […] in HCT-RC coding corresponds to node […]; the proposed HCT-RC coding is applied to a scenario with […] super nodes.
2. The data distributed storage method based on heterogeneous regenerated code conversion of claim 1, wherein acquiring the data file of the set size comprises: acquiring a data file of F symbols, encoding it, and storing it across nodes in a DSS comprising a plurality of storage nodes, wherein, under a PM-based regeneration code, a DSS of n nodes stores the data file of size F and each node stores […] symbols.
3. The data distributed storage method based on heterogeneous regenerated code conversion of claim 1, wherein the process of data reconstruction comprises: in the DSS, a legitimate user connects to k of the n nodes and downloads their data to recover the original file; this process is data reconstruction.
4. The data distributed storage method based on heterogeneous regenerated code conversion of claim 1, wherein the process of data repair comprises: when a node fails, a new replacement node is allowed to connect to d of the remaining n-1 surviving nodes, said d nodes being called helper nodes, and to download […] data from each helper node to repair the failed node; this process is data repair.
5. The data distributed storage method based on heterogeneous regenerated code conversion of claim 1, wherein data reconstruction and data repair are the two properties of a viable regeneration code structure, […] and […] are the node storage and the repair bandwidth, respectively, and every node in a homogeneous DSS has the same node storage and the same repair bandwidth.
6. The data distributed storage method based on heterogeneous regenerated code conversion of claim 1, wherein the heterogeneous DSS comprises […] super nodes, each with larger storage capacity and higher repair bandwidth, and the remaining […] ordinary nodes all have the same storage capacity and repair bandwidth.
7. The data distributed storage method based on heterogeneous regenerated code conversion of claim 1, wherein designing a regeneration code suited to the heterogeneous DSS that achieves data reconstruction and data repair, and then using the new heterogeneous coding transformation strategy to transform the PM-based regeneration code structure into a set of new regeneration code structures HCT-RC, comprises: given a PM-based regeneration code PM-RC, a codeword of PM-RC is represented by a regular matrix C of size […], where […] and […] denote the number of nodes and the node storage capacity in the original homogeneous scenario, respectively; the […]-th row […] of the regular matrix C represents the […] data symbols stored at node […].
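The repair procedure of claim 4 (a replacement node contacts d of the n-1 surviving nodes and downloads the same amount from each helper) can be sketched as follows. This is only an illustrative sketch: the function names `choose_helpers` and `repair_download`, the integer node identifiers, and the random choice of helpers are assumptions, and the `per_helper` parameter stands in for the per-helper download amount whose symbol is not reproduced in the text above.

```python
import random
from typing import List, Sequence


def choose_helpers(nodes: Sequence[int], failed: int, d: int,
                   rng: random.Random) -> List[int]:
    """Pick d helper nodes among the survivors (every node except the failed one).

    Random selection is an assumption made for illustration; the claims only
    require that d of the n-1 surviving nodes are contacted.
    """
    if failed not in nodes:
        raise ValueError("failed node is not part of the system")
    survivors = [v for v in nodes if v != failed]
    if not 0 < d <= len(survivors):
        raise ValueError("d must be between 1 and the number of survivors")
    return sorted(rng.sample(survivors, d))


def repair_download(d: int, per_helper: int) -> int:
    """Total number of symbols the replacement node downloads during repair."""
    return d * per_helper


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    helpers = choose_helpers(range(6), failed=2, d=4, rng=rng)
    print("helpers:", helpers)
    print("symbols downloaded:", repair_download(d=4, per_helper=3))
```

The total repair traffic is simply the number of helpers times the per-helper amount, which is the quantity the repair-bandwidth conditions of claim 1 constrain for each node.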
8. A data distributed storage system based on heterogeneous regenerated code conversion, comprising:
a data acquisition module for acquiring a data file of a set size;
a data storage module for encoding the data file with an improved regeneration code obtained by transforming PM-RC, and storing the encoded data file across nodes in a heterogeneous distributed storage system (DSS) comprising a plurality of storage nodes, wherein the data file stored across the nodes supports data reconstruction and data repair;
in the heterogeneous DSS, the product-matrix-based (PM) regeneration code structure PM-RC is transformed, using a heterogeneous coding transformation (HCT) strategy, into a set of new regeneration code structures HCT-RC, whose codeword matrix is an irregular matrix;
the heterogeneous coding transformation strategy is: rearranging the data symbols of the regular codeword matrix of PM-RC into an irregular matrix in a set order, thereby converting a PM-RC codeword into an HCT-RC codeword; specifically, for a heterogeneous DSS containing h nodes […], if the system parameters satisfy the stated conditions, the […] data symbols of the regular codeword matrix C are rearranged, left to right and top to bottom, into an irregular matrix […], thereby converting the PM-RC codeword into an HCT-RC codeword; the heterogeneous coding transformation strategy is further applied to a super-node heterogeneous DSS to obtain an HCT-RC coding structure suited to the super-node heterogeneous DSS;
S1: the general heterogeneous DSS case
First consider a general heterogeneous DSS containing h nodes […], where the positive integers […] satisfy […] and […]; if the system parameters satisfy the following condition: […]
then the […] data symbols in the codeword matrix C are rearranged, left to right and top to bottom, into an irregular matrix […], thereby converting the PM-RC codeword into an HCT-RC codeword;
in this way, […] contains h rows, each row containing […] symbols; the i-th row […] of the matrix […] represents the […] data symbols stored at node […] and is given by: […]
to guarantee the data repair property, the repair bandwidth […] of node […] in HCT-RC must satisfy: […]
where […] denotes the repair bandwidth of each node in PM-RC;
when the conditions […] and […] are satisfied, the PM-RC of the homogeneous DSS is transformed into an HCT-RC code suited to the general heterogeneous scenario;
S2: the super-node scenario
The coding transformation strategy HCT is applied to the heterogeneous super-node scenario to obtain a special HCT-RC coding structure suited to the super-node model; in HCT-RC coding, a super node […] corresponds to nodes […] and […], whereas an ordinary node […] in HCT-RC coding corresponds to node […]; the proposed HCT-RC coding is applied to a scenario with […] super nodes.
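The rearrangement recited in the claims (read the symbols of the regular matrix C left to right, top to bottom, and fill the rows of an irregular matrix in the same order) can be sketched as below. This is a minimal illustration rather than the claimed construction: the names `hct_rearrange` and `hct_restore`, the integers standing in for code symbols, and the requirement that the row sizes sum to the number of symbols in C are all assumptions, because the formulas for the row sizes and the system conditions are not reproduced in the text above.

```python
from typing import List, Sequence


def hct_rearrange(C: Sequence[Sequence[int]],
                  row_sizes: Sequence[int]) -> List[List[int]]:
    """Rearrange a regular matrix C into an irregular matrix.

    Symbols are read from C left to right, top to bottom, and written in the
    same order into rows whose lengths are given by row_sizes (hypothetical
    parameter: one entry per node of the heterogeneous system).
    """
    if len({len(row) for row in C}) > 1:
        raise ValueError("C must be regular: all rows need the same length")
    flat = [sym for row in C for sym in row]  # row-major reading order
    if any(size <= 0 for size in row_sizes):
        raise ValueError("every row size must be a positive integer")
    if sum(row_sizes) != len(flat):
        # Assumed consistency condition: no symbol is dropped or duplicated.
        raise ValueError("row sizes must sum to the number of symbols in C")
    rows, pos = [], 0
    for size in row_sizes:
        rows.append(flat[pos:pos + size])
        pos += size
    return rows


def hct_restore(rows: Sequence[Sequence[int]], width: int) -> List[List[int]]:
    """Undo hct_rearrange: rebuild the regular matrix with the given row width."""
    flat = [sym for row in rows for sym in row]
    if width <= 0 or len(flat) % width != 0:
        raise ValueError("symbol count must be a positive multiple of width")
    return [flat[i:i + width] for i in range(0, len(flat), width)]


if __name__ == "__main__":
    C = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]  # 4 nodes, 3 symbols each
    print(hct_rearrange(C, [6, 3, 3]))  # first node holds two original rows
    print(hct_rearrange(C, [5, 4, 3]))  # general unequal row sizes
```

With row sizes `[6, 3, 3]` the first row collects the symbols of two original rows, which is one way to read the statement that a super node corresponds to two nodes of the original code while an ordinary node corresponds to one. Because the mapping only reorders symbols, `hct_restore` recovers the regular matrix exactly.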
CN202310869214.1A 2023-07-17 2023-07-17 Data distributed storage method and system based on heterogeneous regenerated code conversion Active CN116610645B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310869214.1A CN116610645B (en) 2023-07-17 2023-07-17 Data distributed storage method and system based on heterogeneous regenerated code conversion

Publications (2)

Publication Number Publication Date
CN116610645A CN116610645A (en) 2023-08-18
CN116610645B true CN116610645B (en) 2023-12-05

Family

ID=87683871

Country Status (1)

Country Link
CN (1) CN116610645B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant