CN116578439A - Repair tree construction method and data repair method based on simulated annealing algorithm - Google Patents


Publication number
CN116578439A
Authority
CN
China
Prior art keywords
sequence
prufer
repair
simulated annealing
new solution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310191885.7A
Other languages
Chinese (zh)
Inventor
杜献智
邓志航
王伟平
王建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN202310191885.7A
Publication of CN116578439A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Error Detection And Correction (AREA)

Abstract

The application discloses a repair tree construction method based on a simulated annealing algorithm, which comprises: obtaining the number of nodes and the corresponding adjacency matrix; setting initial parameters and control parameters; randomly generating an initial Prufer sequence, executing the simulated annealing algorithm, and recording the Prufer sequence corresponding to the maximum bottleneck bandwidth as the current solution; perturbing the current solution to obtain a legal new solution; calculating the bottleneck bandwidths of the new solution and the current solution and determining whether to accept the new solution; repeating the last two steps until the set condition is met, so as to obtain the Prufer sequence corresponding to the maximum bottleneck bandwidth; and decoding that Prufer sequence to obtain an unrooted tree, in which the auxiliary node is taken as the root node to obtain the final repair tree. The method offers fast encoding and decoding, a flexible algorithm, good search performance, and high reliability.

Description

Repair tree construction method and data repair method based on simulated annealing algorithm
Technical Field
The application belongs to the technical field of computers, and particularly relates to a repair tree construction method and a data repair method based on a simulated annealing algorithm.
Background
With the rapid development of computer technology and the spread of internet applications, the volume of network information has grown explosively, and large-scale distributed data systems now commonly hold data at the PB scale. For cost reasons, distributed storage systems are usually built from commodity components, which makes node failures quite common. For example, in a Facebook cluster with hundreds of PB of storage capacity, more than 50 machine-unavailability events lasting longer than 15 minutes occur per day. Ensuring the reliability and availability of the distributed storage system is therefore important. To improve reliability, the system adds redundant information so that corrupted data can be repaired.
The traditional approach is mirroring: the original data is copied several times, and when data fails, the data blocks are pulled directly from a backup copy for restoration. This scheme is simple and efficient, but it stores several times the original file size, which is a significant storage overhead. As data volumes keep growing, the cost of mirroring is becoming increasingly unacceptable.
For this reason, researchers have proposed encoding data with erasure codes and recovering it according to certain rules when data is corrupted. A conventional erasure code divides the original data into k data blocks and multiplies them by an n×k generator matrix to obtain n coded blocks, which are stored on different hosts. Compared with the original file, an erasure code produces n-k redundant blocks, and a classical erasure code lets the system tolerate the loss of any n-k blocks. When a block becomes unavailable, data is downloaded from any k surviving nodes and the repair is completed by a decoding operation. Erasure codes with this property are called maximum distance separable (MDS) codes.
Erasure codes greatly reduce the storage cost of data, but their encoding and decoding operations are computationally expensive, and most erasure codes must transfer several times the amount of the damaged data to complete a repair. This greatly increases network traffic and easily causes congestion, so a repair takes longer than with mirroring. In other words, erasure codes sacrifice some repair efficiency to gain storage efficiency, and speeding up erasure-code repair has become a research hotspot.
To reduce the traffic transmitted during repair, researchers have proposed regenerating codes. Regenerating codes trade off node storage capacity against repair bandwidth, and the two ends of this trade-off correspond to two families of optimal regenerating codes: minimum bandwidth regenerating (MBR) codes and minimum storage regenerating (MSR) codes. An MBR code repairs a failed node by downloading, from the remaining n-1 nodes, packets whose total size equals the amount of lost data, which minimizes the amount of data transmitted; however, the scheme still has considerable computation and transmission overhead.
In a complex network environment, transmitting every node's data directly to the auxiliary node for repair is often a poor choice. On the one hand, the auxiliary node must handle download and computation tasks at the same time, so congestion can arise if too much data is sent to one node. On the other hand, the direct bandwidth from some nodes to the auxiliary node may be very low, and a higher transmission rate can be obtained by relaying through other nodes. In general, the transmission paths need to be planned so as to construct a repair tree with a high transmission rate from each data node to the auxiliary node. However, by Cayley's theorem, the complete graph on n vertices has n^(n-2) spanning trees, so traversing all spanning trees to find the optimal one is clearly impractical.
Therefore, researchers have proposed a parallel repair algorithm (PPR) based on divide-and-conquer. In each round of this scheme, the nodes that need to transmit data are paired into groups of two and data is transferred within each group, until all data has reached the same new node. Such a transmission can be represented by a binary repair tree. Although the algorithm relieves congestion to some extent, it does not make full use of each node's bandwidth and does not consider the heterogeneity of transmission bandwidths between nodes in a real network environment.
In addition, researchers have proposed a pipelining acceleration algorithm, which divides the data packet to be transmitted into many slices and transmits them in a pipelined manner; when the number of slices is large enough, the transmission speed of a link approaches the lowest bandwidth on that link. A repair tree can be regarded as several links operating in parallel, so the repair speed of a repair tree is determined by the minimum bandwidth over all of its edges. The optimal repair tree is therefore the maximum bottleneck tree, and the problem can be solved with the methods used for minimum spanning trees, applied so as to favor high-bandwidth edges. Under ideal conditions without any constraints, a polynomial-time algorithm such as Prim's algorithm or Kruskal's algorithm can be used to find the maximum bottleneck tree. In practical scenarios, however, the number of tasks a server can process simultaneously is limited, which means that the number of other nodes that can connect to one node during repair is limited. The number of other nodes connected to a node is called its degree, so a real network environment imposes a degree limit on the nodes. With this limit added, finding the maximum bottleneck tree becomes an NP-hard problem for which no polynomial-time algorithm is known; traversing the entire search space has exponential complexity, and such time overhead is unacceptable in practice.
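For the unconstrained case described above, a maximum spanning tree is also a maximum bottleneck tree, so a Kruskal-style pass over the edges in descending bandwidth order is enough. The following is a minimal sketch under that assumption; the function name `max_bottleneck_tree` and the 4-node bandwidth matrix are hypothetical and are not taken from the application.

```python
def max_bottleneck_tree(bw):
    """Kruskal on descending weights: returns (tree_edges, bottleneck).

    bw is a symmetric matrix; bw[u][v] is the bandwidth between nodes u and v
    (0-based indices). Assumes at least two nodes and a complete graph.
    """
    n = len(bw)
    parent = list(range(n))  # union-find forest

    def find(x):
        # Find the set representative, halving paths along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        ((bw[u][v], u, v) for u in range(n) for v in range(u + 1, n)),
        reverse=True,
    )
    tree = []
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so keep it
            parent[ru] = rv
            tree.append((u, v, w))
            if len(tree) == n - 1:
                break
    return tree, min(w for _, _, w in tree)


# Hypothetical bandwidth matrix, for illustration only.
example_bw = [
    [0, 10, 3, 4],
    [10, 0, 8, 2],
    [3, 8, 0, 6],
    [4, 2, 6, 0],
]
tree, bottleneck = max_bottleneck_tree(example_bw)
```

This shortcut no longer applies once a per-node degree limit is imposed, which is the case the rest of the application addresses.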
In summary, existing methods for constructing a repair tree, and the data recovery methods that use the constructed repair tree, suffer from long encoding and decoding times, complex algorithms, and poor reliability.
Disclosure of Invention
The first object of the application is to provide a repair tree construction method based on a simulated annealing algorithm that offers fast encoding and decoding, a flexible algorithm, good search performance, and high reliability.
The second object of the application is to provide a data recovery method that includes the repair tree construction method based on the simulated annealing algorithm.
The repair tree construction method based on the simulated annealing algorithm provided by the application comprises the following steps:
S1. acquiring the number of nodes and an adjacency matrix representing the transmission bandwidth between all nodes;
S2. setting the initial parameters and control parameters of the simulated annealing algorithm;
S3. randomly generating a Prufer sequence that meets the degree limit as the initial sequence;
S4. executing the simulated annealing algorithm on the initial sequence generated in step S3, and recording the Prufer sequence corresponding to the maximum bottleneck bandwidth obtained during the algorithm as the current solution;
S5. perturbing the current solution to obtain a legal new solution;
S6. calculating the bottleneck bandwidths of the new solution obtained in step S5 and of the corresponding current solution, and determining whether to accept the new solution according to the Monte Carlo criterion;
S7. repeating steps S5 to S6 until the set condition is met, so as to obtain the Prufer sequence corresponding to the maximum bottleneck bandwidth;
S8. decoding the Prufer sequence obtained in step S7 to obtain an unrooted tree, and taking the auxiliary node in the unrooted tree as the root node to obtain the final repair tree.
Step S2 specifically comprises the following:
the set parameters include the maximum degree d, the initial temperature T_0, the temperature decay coefficient α, the correction coefficient k, the termination temperature ET, and the Markov chain length L; the current iteration number is set to 0, and the current temperature T is set to the initial temperature T_0.
Step S3 specifically comprises the following:
randomly generating a Prufer sequence as the initial sequence, wherein the length of the generated Prufer sequence is n-2, n is the total number of nodes involved in the repair, and the repair tree corresponding to the generated Prufer sequence satisfies the set degree limit.
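Step S3 can be sketched with rejection sampling. A node that appears c times in a Prufer sequence has degree c + 1 in the decoded tree, so limiting every degree to d means limiting every occurrence count to d - 1. The function name `random_prufer`, the `rng` parameter, and the guard on n and d are assumptions made for this illustration.

```python
import random


def random_prufer(n, d, rng=random):
    """Draw a random Prufer sequence over labels 1..n with every degree <= d."""
    if n < 3 or d < 2:
        raise ValueError("this sketch assumes n >= 3 and d >= 2")
    cnt = [0] * (n + 1)  # cnt[x]: occurrences of node x so far (index 0 unused)
    seq = []
    for _ in range(n - 2):
        x = rng.randint(1, n)
        # Node x may appear at most d - 1 times, since degree = occurrences + 1.
        while cnt[x] >= d - 1:
            x = rng.randint(1, n)
        cnt[x] += 1
        seq.append(x)
    return seq


# With n = 5 and d = 2 every label can appear at most once.
initial = random_prufer(5, 2)
```

The retry loop always terminates for d >= 2, because at most n - 2 of the n labels can already be in use.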
Step S4 specifically comprises the following:
when calculating the bottleneck bandwidth of a Prufer sequence, the Prufer sequence is decoded to obtain the spanning tree it represents; the decoding operation includes:
A. denoting the Prufer sequence as a = [a_1, a_2, ..., a_(n-2)] and creating a new set G = {1, 2, ..., n};
B. taking from the set G the smallest number that does not appear in the sequence a, connecting an edge between the node represented by that number and the node represented by the first item of the Prufer sequence, deleting that number from the set G, and deleting the first item from the Prufer sequence;
C. repeating step B n-2 times, and finally connecting an edge between the nodes represented by the two numbers remaining in the set G, which completes the decoding operation.
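The decoding steps A to C can be sketched as follows. A min-heap holds the members of G that do not occur in the remaining sequence, so "the smallest number not appearing in the sequence" is a single heap pop. The names `decode_prufer`, `bottleneck` and `BW`, and the bandwidth values themselves, are hypothetical; the application's own bandwidth matrix is not reproduced in this text.

```python
import heapq


def decode_prufer(seq):
    """Decode a Prufer sequence over labels 1..n into a list of n-1 edges."""
    n = len(seq) + 2
    remaining = [0] * (n + 1)  # occurrences still ahead in the sequence
    for x in seq:
        remaining[x] += 1
    # Members of G that do not occur in the (remaining) sequence.
    free = [v for v in range(1, n + 1) if remaining[v] == 0]
    heapq.heapify(free)
    edges = []
    for x in seq:
        smallest = heapq.heappop(free)   # step B: smallest number not in the sequence
        edges.append((smallest, x))      # connect it to the first remaining item
        remaining[x] -= 1
        if remaining[x] == 0:            # x no longer occurs later, so it becomes eligible
            heapq.heappush(free, x)
    u = heapq.heappop(free)              # step C: join the last two numbers
    v = heapq.heappop(free)
    edges.append((u, v))
    return edges


def bottleneck(seq, bw):
    """Minimum bandwidth over the edges of the decoded tree (bw is 0-based)."""
    return min(bw[u - 1][v - 1] for u, v in decode_prufer(seq))


# Hypothetical symmetric bandwidth matrix for 5 nodes, for illustration only.
BW = [
    [0, 10, 15, 20, 22],
    [10, 0, 30, 35, 25],
    [15, 30, 0, 40, 18],
    [20, 35, 40, 0, 12],
    [22, 25, 18, 12, 0],
]
edges = decode_prufer([4, 3, 1])
```

With this matrix, the sequence [4, 3, 1] decodes to the path 2-4-3-1-5, whose weakest edge is the one between nodes 3 and 1.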
Step S5 specifically comprises the following:
recording the number of occurrences of each node;
randomly changing the number at one position of the current solution, and judging:
if the new solution does not satisfy the set degree constraint, the change is cancelled and the number at one position of the current solution is randomly changed again, until the new solution satisfies the set degree constraint.
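A minimal sketch of this perturbation step is given below. It redraws the position and the value until the change is legal, and returns a new list rather than modifying the current solution. The function name `perturb`, the `rng` parameter, and the choice to reject a draw that leaves the sequence unchanged are assumptions of this sketch.

```python
import random
from collections import Counter


def perturb(seq, n, d, rng=random):
    """Return a copy of seq with one position changed, keeping every degree <= d."""
    cnt = Counter(seq)  # occurrences of each node in the current solution
    while True:
        i = rng.randrange(len(seq))  # position to change
        x = rng.randint(1, n)        # candidate replacement value
        # After the change x would occur cnt[x] + 1 times, giving degree cnt[x] + 2.
        if x != seq[i] and cnt[x] + 1 < d:
            new = list(seq)
            new[i] = x
            return new
        # Otherwise the change is cancelled and a fresh one is drawn.


current = [4, 3, 1]
candidate = perturb(current, n=5, d=2)
```

For d >= 2 a legal move always exists, because at least two of the n labels are absent from a sequence of length n - 2.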
Step S6 specifically comprises the following:
calculating the bottleneck bandwidth of the new solution and the bottleneck bandwidth of the old solution, and determining whether to accept the new solution according to the Monte Carlo criterion, so that the search can escape local optima and move toward the global optimum:
if the bottleneck bandwidth of the new solution is better, the new solution is accepted directly;
if the bottleneck bandwidth of the new solution is worse, the new solution is accepted with a set probability.
Accepting the new solution with the set probability specifically comprises the following steps:
the following judgment is made:
if [inequality missing from the source text] holds, the new solution is accepted; otherwise, the new solution is not accepted; where random(0, 1) is a randomly generated decimal between 0 and 1; f_new is the bottleneck bandwidth of the new solution; f_old is the bottleneck bandwidth of the old solution; T is the current temperature; and k is the correction coefficient.
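The acceptance inequality itself does not survive in this text, so the following sketch assumes one common form for a maximization problem: a worse solution is accepted when random(0, 1) < exp(k · (f_new − f_old) / T). Where the correction coefficient k enters the exponent is an assumption of this sketch, not something the text confirms; the function name `accept` is likewise hypothetical.

```python
import math
import random


def accept(f_new, f_old, T, k, rng=random):
    """Decide whether to accept a new solution (larger bottleneck is better).

    ASSUMED acceptance probability for a worse solution:
        exp(k * (f_new - f_old) / T)
    The original formula is missing from the text, so this form is illustrative.
    """
    if f_new > f_old:
        return True  # a better solution is always accepted
    # The exponent is <= 0 here, so the probability lies in (0, 1].
    return rng.random() < math.exp(k * (f_new - f_old) / T)


# Under the assumed form, a drop from 20 to 15 at T = 1000 with k = 10
# is accepted with probability exp(-0.05).
p_hot = math.exp(10 * (15 - 20) / 1000)
```

Whatever the exact formula, the qualitative behaviour described in the text is the same: at high temperature most worse solutions are accepted, and as T falls the probability shrinks toward zero.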
Step S7 specifically comprises the following steps:
a. the current iteration number is increased by 1, and the following judgment is made:
if the set Markov chain length has been reached, the current iteration number is reset to 0 and the subsequent step is carried out;
if the set Markov chain length has not been reached, the method returns to step S5 for the next iteration;
b. the current temperature is multiplied by the temperature decay coefficient to give the updated current temperature, and the following judgment is made:
if the current temperature is less than the termination temperature, the iteration ends;
if the current temperature is not less than the termination temperature, the method returns to step S5 for the next iteration.
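Putting steps S3 to S7 together, the loop structure (L perturbations per temperature, then geometric cooling until T drops below ET) can be sketched as below. This is an illustration under stated assumptions, not the applicant's implementation: the acceptance probability exp(k · (f_new − f_cur) / T) is assumed because the original formula is missing, and the names `anneal`, `decode`, `bottleneck`, `BW` and `seed`, as well as the bandwidth values, are hypothetical.

```python
import heapq
import math
import random


def decode(seq):
    """Decode a Prufer sequence over labels 1..n into a list of tree edges."""
    n = len(seq) + 2
    left = [0] * (n + 1)
    for x in seq:
        left[x] += 1
    free = [v for v in range(1, n + 1) if left[v] == 0]
    heapq.heapify(free)
    edges = []
    for x in seq:
        edges.append((heapq.heappop(free), x))
        left[x] -= 1
        if left[x] == 0:
            heapq.heappush(free, x)
    edges.append((heapq.heappop(free), heapq.heappop(free)))
    return edges


def bottleneck(seq, bw):
    return min(bw[u - 1][v - 1] for u, v in decode(seq))


def anneal(bw, d, T0, alpha, k, ET, L, seed=None):
    """Search for a degree-constrained tree with a large bottleneck bandwidth."""
    rng = random.Random(seed)
    n = len(bw)
    cnt = [0] * (n + 1)
    cur = []
    for _ in range(n - 2):                      # S3: random legal initial sequence
        x = rng.randint(1, n)
        while cnt[x] + 1 >= d:
            x = rng.randint(1, n)
        cnt[x] += 1
        cur.append(x)
    f_cur = bottleneck(cur, bw)
    best, f_best = cur[:], f_cur
    T = T0
    while T >= ET:                              # S7b: stop once T < ET
        for _ in range(L):                      # S7a: Markov chain of length L
            while True:                         # S5: redraw until the move is legal
                i, x = rng.randrange(n - 2), rng.randint(1, n)
                if x != cur[i] and cnt[x] + 1 < d:
                    break
            new = cur[:]
            new[i] = x
            f_new = bottleneck(new, bw)
            # S6: accept if better, else with the ASSUMED probability.
            if f_new > f_cur or rng.random() < math.exp(k * (f_new - f_cur) / T):
                cnt[cur[i]] -= 1
                cnt[x] += 1
                cur, f_cur = new, f_new
                if f_cur > f_best:
                    best, f_best = cur[:], f_cur
        T *= alpha
    return best, f_best


# Hypothetical bandwidth matrix for 5 nodes, for illustration only.
BW = [
    [0, 10, 15, 20, 22],
    [10, 0, 30, 35, 25],
    [15, 30, 0, 40, 18],
    [20, 35, 40, 0, 12],
    [22, 25, 18, 12, 0],
]
best, f_best = anneal(BW, d=2, T0=1000, alpha=0.95, k=10, ET=0.001, L=50, seed=0)
```

The sketch assumes n >= 3, d >= 2 and T0 >= ET. It tracks the best solution seen separately from the current solution, since the current solution may move to a worse tree.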
The decoding in step S8 specifically includes the following steps:
1) denoting the final Prufer sequence as b = [b_1, b_2, ..., b_(n-2)] and creating a new set g = {1, 2, ..., n};
2) taking from the set g the smallest number that does not appear in the sequence b, connecting an edge between the node represented by that number and the node represented by the first item of the Prufer sequence, deleting that number from the set g, and deleting the first item from the Prufer sequence;
3) repeating step 2) n-2 times, and finally connecting an edge between the nodes represented by the two numbers remaining in the set g, to obtain the spanning tree represented by the Prufer sequence and complete the decoding operation.
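Once the unrooted tree is available as an edge list, taking the auxiliary node as the root amounts to orienting every edge toward that node. A breadth-first traversal is one simple way to do this; the function name `root_tree`, the example edge list, and the choice of node 5 as the auxiliary node are assumptions of this sketch.

```python
from collections import deque


def root_tree(edges, root):
    """Orient an unrooted tree toward root; returns {node: parent}, root -> None."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in parent:  # first visit: u is v's parent on the way to root
                parent[v] = u
                queue.append(v)
    return parent


# Example: the path 1-4-3-2-5 with node 5 assumed to be the auxiliary node.
parent = root_tree([(1, 4), (4, 3), (3, 2), (2, 5)], root=5)
```

Each surviving node then sends its data to its parent, and the root receives the aggregated data.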
The application also provides a data recovery method that includes the repair tree construction method based on the simulated annealing algorithm, comprising the following steps:
(1) acquiring the n-1 surviving nodes that need to transmit data and the 1 auxiliary node on which the data is to be recovered;
(2) constructing an approximately optimal repair tree from the nodes obtained in step (1), using the repair tree construction method based on the simulated annealing algorithm;
(3) deriving the data transmission paths from the approximately optimal repair tree obtained in step (2);
(4) transmitting the backup data to the auxiliary node along the data transmission paths obtained in step (3), and completing the recovery of the data by computation.
The application provides a repair tree construction method and a data repair method based on a simulated annealing algorithm. It designs a heuristic construction method for the repair tree that, following the idea of stochastic optimization, runs one round of a Markov process at each temperature so that the optimal solution is found with higher probability, and uses the Monte Carlo criterion so that the search gradually approaches the optimal solution while escaping local optima. The method therefore offers fast encoding and decoding, a flexible algorithm, good search performance, and high reliability.
Drawings
FIG. 1 is a schematic flow chart of the construction method of the present application.
FIG. 2 is a schematic representation of the conversion of trees to Prufer sequences in the construction method of the present application.
FIG. 3 is a schematic representation of the reduction of Prufer sequences to trees in the construction method of the present application.
Fig. 4 is a diagram of bottleneck bandwidth calculation in an embodiment of the construction method of the present application.
FIG. 5 is a schematic diagram of a repair tree in an embodiment of the construction method of the present application.
FIG. 6 is a schematic flow chart of the repairing method of the present application.
Detailed Description
FIG. 1 is a schematic flow chart of the construction method of the present application. The repair tree construction method based on the simulated annealing algorithm provided by the application comprises the following steps:
S1. acquiring the number of nodes and an adjacency matrix representing the transmission bandwidth between all nodes;
S2. setting the initial parameters and control parameters of the simulated annealing algorithm; specifically:
the set parameters include the maximum degree d, the initial temperature T_0, the temperature decay coefficient α, the correction coefficient k, the termination temperature ET, and the Markov chain length L; the current iteration number is set to 0, and the current temperature T is set to the initial temperature T_0;
S3. randomly generating a Prufer sequence that meets the degree limit as the initial sequence; specifically:
randomly generating a Prufer sequence as the initial sequence, wherein the length of the generated Prufer sequence is n-2, n is the total number of nodes involved in the repair, and the repair tree corresponding to the generated Prufer sequence satisfies the set degree limit;
in a specific implementation, an array or a hash table is used to record the number of occurrences of each node; an array of length n-2 is created and its n-2 positions are traversed, generating for each position a random integer i in [1, n]: if assigning i would push the degree of node i (its occurrence count plus one) above the set maximum degree d, i is regenerated; otherwise, i is assigned to the current position of the array and its occurrence count is increased by one; the array obtained after the traversal is the required randomly generated legal Prufer sequence; the application uses an array to record the number of occurrences of each node.
S4. executing the simulated annealing algorithm on the initial sequence generated in step S3, and recording the Prufer sequence corresponding to the maximum bottleneck bandwidth obtained during the algorithm as the current solution; specifically:
when calculating the bottleneck bandwidth of a Prufer sequence, the Prufer sequence is decoded to obtain the spanning tree it represents; the decoding operation includes:
A. denoting the Prufer sequence as a = [a_1, a_2, ..., a_(n-2)] and creating a new set G = {1, 2, ..., n};
B. taking from the set G the smallest number that does not appear in the sequence a, connecting an edge between the node represented by that number and the node represented by the first item of the Prufer sequence, deleting that number from the set G, and deleting the first item from the Prufer sequence;
C. repeating step B n-2 times, and finally connecting an edge between the nodes represented by the two numbers remaining in the set G, which completes the decoding operation;
S5. perturbing the current solution to obtain a legal new solution; specifically:
recording the number of occurrences of each node;
randomly changing the number at one position of the current solution, and judging:
if the new solution does not satisfy the set degree constraint, the change is cancelled and the number at one position of the current solution is randomly changed again, until the new solution satisfies the set degree constraint;
in a specific implementation, if the degree limit requires that the degree of each node not exceed d, the array cnt (which records the number of occurrences of each node) of a Prufer sequence satisfying the degree limit should satisfy the following condition:
max(cnt[i]) + 1 ≤ d
where 1 ≤ i ≤ n; the advantage of this perturbation scheme is that, in theory, any feasible solution can be reached, so the optimal solution of the problem can be found probabilistically;
S6. calculating the bottleneck bandwidths of the new solution obtained in step S5 and of the corresponding current solution, and determining whether to accept the new solution according to the Monte Carlo criterion; specifically:
calculating the bottleneck bandwidth of the new solution and the bottleneck bandwidth of the old solution, and determining whether to accept the new solution according to the Monte Carlo criterion, so that the search can escape local optima and move toward the global optimum:
if the bottleneck bandwidth of the new solution is better, the new solution is accepted directly;
if the bottleneck bandwidth of the new solution is worse, the new solution is accepted with the set probability; specifically:
the following judgment is made:
if [inequality missing from the source text] holds, the new solution is accepted; otherwise, the new solution is not accepted;
where random(0, 1) is a randomly generated decimal between 0 and 1; f_new is the bottleneck bandwidth of the new solution; f_old is the bottleneck bandwidth of the old solution; T is the current temperature; and k is the correction coefficient.
This criterion simulates the physical annealing of a solid: at first the temperature is high, the molecules in the solid move vigorously, and the search is largely random; as annealing proceeds, the temperature gradually falls, the state of the solid gradually stabilizes, and the search gradually converges toward the optimal solution;
S7. repeating steps S5 to S6 until the set condition is met, so as to obtain the Prufer sequence corresponding to the maximum bottleneck bandwidth; specifically:
a. the current iteration number is increased by 1, and the following judgment is made:
if the set Markov chain length has been reached, the current iteration number is reset to 0 and the subsequent step is carried out;
if the set Markov chain length has not been reached, the method returns to step S5 for the next iteration;
b. the current temperature is multiplied by the temperature decay coefficient to give the updated current temperature, and the following judgment is made:
if the current temperature is less than the termination temperature, the iteration ends;
if the current temperature is not less than the termination temperature, the method returns to step S5 for the next iteration;
S8. decoding the Prufer sequence obtained in step S7 to obtain an unrooted tree, and taking the auxiliary node in the unrooted tree as the root node to obtain the final repair tree; the decoding specifically comprises the following steps:
1) denoting the final Prufer sequence as b = [b_1, b_2, ..., b_(n-2)] and creating a new set g = {1, 2, ..., n};
2) taking from the set g the smallest number that does not appear in the sequence b, connecting an edge between the node represented by that number and the node represented by the first item of the Prufer sequence, deleting that number from the set g, and deleting the first item from the Prufer sequence;
3) repeating step 2) n-2 times, and finally connecting an edge between the nodes represented by the two numbers remaining in the set g, to obtain the spanning tree represented by the Prufer sequence and complete the decoding operation.
The repair tree is a spanning tree with n nodes. The auxiliary node is selected as the root node; it receives the data transmitted by the other nodes and processes it to recover the original data, while the other nodes are the surviving nodes that need to transmit data. Each edge of the repair tree has a weight that represents the bandwidth between its two nodes. The transmission link from each leaf node to the root node carries data using the pipelining acceleration algorithm, and its transmission speed is limited by the minimum bandwidth on the link, so the repair speed of the repair tree is limited by its smallest edge, namely the bottleneck bandwidth. In an actual network, each node can process only a limited number of tasks in parallel, so the number of other nodes connected to each node in the repair tree is assumed to be at most d. Conversely, any tree on the n nodes can serve as a repair tree that describes the data transmission path of each node.
In practice, a repair tree containing n nodes can be represented by a Prufer sequence of length n-2, and the two are in one-to-one correspondence. A tree with n nodes is converted into a Prufer sequence as follows: leaves are removed iteratively until only two nodes remain in the graph. Each iteration finds the leaf node (a node of degree one) with the smallest number, appends the node adjacent to it to the Prufer sequence, and deletes the leaf together with the edge connected to it. There are n-2 iterations in total, and the number of occurrences of a node in the Prufer sequence equals the degree of that node in the tree minus one.
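The encoding procedure just described can be sketched as follows. The example tree is hypothetical and was chosen only so that its node degrees are [1, 1, 3, 1, 2]; it is not claimed to be the tree drawn in the figures. The function name `encode_prufer` is likewise an assumption.

```python
import heapq


def encode_prufer(edges, n):
    """Encode a tree on labels 1..n (given as an edge list) as a Prufer sequence."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)      # leaf with the smallest label
        (neighbour,) = adj[leaf]          # a leaf has exactly one neighbour
        seq.append(neighbour)
        adj[neighbour].remove(leaf)       # delete the leaf and its edge
        adj[leaf].clear()
        if len(adj[neighbour]) == 1:      # the neighbour has become a leaf
            heapq.heappush(leaves, neighbour)
    return seq


# Hypothetical tree with edges 1-3, 2-3, 3-5, 4-5 (degrees 1, 1, 3, 1, 2).
tree_edges = [(1, 3), (2, 3), (3, 5), (4, 5)]
sequence = encode_prufer(tree_edges, 5)
```

In the resulting sequence node 3 appears twice and node 5 once, matching the rule that occurrences equal degree minus one.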
FIG. 2 shows the conversion of a tree into the Prufer sequence [3, 5]. It is easy to see that the degrees of the nodes are [1,1,3,1,2] respectively, each equal to the number of times the node appears in the Prufer sequence plus one. The Prufer sequence is therefore well suited to representing spanning trees with degree constraints, and also well suited to modelling the state of the solid in the annealing algorithm.
FIG. 3 shows the process of restoring the Prufer sequence [3, 5] to a tree; the minimum weight among the edges connected during decoding is the bottleneck bandwidth.
The construction method of the present application is further described with reference to one embodiment as follows:
Given 5 nodes (4 surviving nodes that need to transmit data and one auxiliary node that receives the data) and the bandwidth matrix [matrix missing from the source text], the repair tree is generated by the following steps:
(a) Set the parameters n=5, d=2, T_0=1000, α=0.95, k=10, ET=0.001, L=50 and g=0, where g is the current iteration number. Create an array cnt of length n that records the number of occurrences of each node, initialized to 0. Then create an array a of length n-2 to represent the Prufer sequence and traverse its positions: for position i (1 ≤ i ≤ n-2), randomly draw a number x between 1 and n; if cnt[x]+1 < d, set a[i] = x and cnt[x] = cnt[x]+1; otherwise, regenerate x. In this way an initial sequence is obtained, for example a = [4,3,1]. Record the best solution found so far, m = a, and the maximum bottleneck bandwidth, fm = 0, and then start executing the simulated annealing algorithm;
(b) Perturb the current solution a = [4,3,1] to produce a new solution. Randomly select a position, for example i = 3, and generate a random number from 1 to n, for example x = 2. Here cnt[x] = 0, so cnt[x]+1 < d = 2 and the degree constraint is satisfied. Let b = a and b[3] = 2 to obtain the new solution b = [4,3,2];
(c) Compute the bottleneck bandwidths of the two solutions by the decoding operation; as shown in FIG. 4, fa = 15 and fb = 20. Because fb > fa, the new solution is accepted according to the Monte Carlo criterion, i.e. a = b, and the cnt array is updated to [0,1,1,1,0]. Because fb > fm, m = b and fm = fb = 20 are also updated. The operation is repeated in the next cycle: suppose b = [1,3,2] is obtained and the bottleneck bandwidths are computed as fa = 20 and fb = 15. Because fb <= fa, according to the Monte Carlo criterion the new solution is accepted with probability [formula missing from the source text];
(d) Let the iteration number g=g+1; if g equals the Markov chain length L, reset g=0 and execute step (e), otherwise return to step (b);
(e) Multiply the current temperature by the temperature decay coefficient; if the current temperature is less than the termination temperature, end the algorithm, otherwise return to step (b);
(f) Decode the Prufer sequence with the maximum bottleneck bandwidth found by the search to obtain an unrooted tree, and take the auxiliary node as the root node to obtain the approximately optimal repair tree.
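Steps (a) and (b) can be illustrated with a minimal Python sketch. The function names (occurrences, satisfies_degree, random_initial_sequence, perturb) are hypothetical and do not come from the application. The sketch relies on the fact that a node appearing c times in a Prufer sequence has degree c+1 in the decoded tree, and it assumes that a perturbation must actually change the selected position.

```python
import random
from collections import Counter


def occurrences(seq, n):
    """The cnt array: how often each node 1..n appears in the sequence."""
    counts = Counter(seq)
    return [counts[x] for x in range(1, n + 1)]


def satisfies_degree(seq, n, d):
    """A node appearing c times has degree c + 1, which must not exceed d."""
    return all(c + 1 <= d for c in occurrences(seq, n))


def random_initial_sequence(n, d, rng):
    """Step (a): fill n-2 positions, redrawing x whenever the check fails."""
    cnt = [0] * (n + 1)              # index 0 is unused, nodes are 1..n
    seq = []
    while len(seq) < n - 2:
        x = rng.randint(1, n)        # inclusive on both ends
        if cnt[x] + 1 < d:           # current degree cnt[x]+1 may still grow by one
            seq.append(x)
            cnt[x] += 1
    return seq


def perturb(seq, n, d, rng):
    """Step (b): change one random position until the degree limit holds."""
    while True:
        i = rng.randrange(len(seq))
        x = rng.randint(1, n)
        if x == seq[i]:
            continue                 # assumption: the position must really change
        candidate = list(seq)
        candidate[i] = x
        if satisfies_degree(candidate, n, d):
            return candidate


rng = random.Random(0)               # seeded only to make the run repeatable
a = random_initial_sequence(5, 2, rng)
b = perturb(a, 5, 2, rng)
print(a, b, occurrences([4, 3, 2], 5))
```

For the sequence [4,3,2] of the embodiment, occurrences returns [0,1,1,1,0], which is the updated cnt array quoted in step (c).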
FIG. 5 shows an example of the repair tree found by the optimized search of the present construction method; it is not difficult to verify that this tree is optimal. Because the method is based on the simulated annealing algorithm, which is essentially a heuristic search based on probabilistic optimization, it is able to reach the optimal solution.
FIG. 6 is a schematic flow chart of the data recovery method of the present application. The data recovery method, which includes the repair tree construction method based on the simulated annealing algorithm provided by the present application, comprises the following steps:
(1) Acquire the n-1 surviving nodes that are to transmit data and the 1 auxiliary node to be recovered;
(2) Using the nodes obtained in step (1), construct an approximately optimal repair tree with the above repair tree construction method based on the simulated annealing algorithm;
(3) Calculate the data transmission path from the approximately optimal repair tree obtained in step (2);
(4) Transmit the backup data to the auxiliary node to be recovered along the data transmission path obtained in step (3), and complete the recovery of the data through calculation.
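One possible reading of step (3) is sketched below in Python: the unrooted tree is rooted at the auxiliary node by a breadth-first search, and every surviving node sends toward its parent, deepest nodes first. The function name transmission_plan, the choice of node 5 as the auxiliary node, and the leaves-first ordering are assumptions made for illustration; the application does not specify them here.

```python
from collections import deque


def transmission_plan(edges, root):
    """Return (sender, receiver) hops that move data toward the root."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)

    parent = {root: None}
    order = []                       # nodes in breadth-first order from the root
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in adjacency.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)

    # Reverse breadth-first order: the deepest nodes transmit first.
    return [(node, parent[node]) for node in reversed(order)
            if parent[node] is not None]


# Edges of the tree decoded from the Prufer sequence [4, 3, 2] with n = 5;
# node 5 is assumed to be the auxiliary node.
tree_edges = [(1, 4), (4, 3), (3, 2), (2, 5)]
print(transmission_plan(tree_edges, root=5))
# [(1, 4), (4, 3), (3, 2), (2, 5)]
```

Each hop lists the sender first and the receiver second, so the last hop always ends at the auxiliary node.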

Claims (10)

1. A repair tree construction method based on a simulated annealing algorithm, comprising the following steps:
S1. acquiring the number of nodes and an adjacency matrix representing the transmission bandwidth between all nodes;
S2. setting the initial parameters and control parameters of the simulated annealing algorithm;
S3. randomly generating a Prufer sequence that meets the constraint requirements as an initial sequence;
S4. executing the simulated annealing algorithm on the initial sequence generated in step S3, and recording the Prufer sequence corresponding to the maximum bottleneck bandwidth obtained during the algorithm as the current solution;
S5. applying a perturbation to the current solution to obtain a legal new solution;
S6. calculating the bottleneck bandwidths of the new solution obtained in step S5 and of the corresponding current solution, and determining whether to accept the new solution according to the Monte Carlo criterion;
S7. repeating steps S5 to S6 until the set condition is met, thereby obtaining the Prufer sequence corresponding to the maximum bottleneck bandwidth;
S8. decoding the Prufer sequence obtained in step S7 to obtain an unrooted tree, and taking the auxiliary node in the unrooted tree as the root node to obtain the final repair tree.
2. The method for constructing a repair tree based on a simulated annealing algorithm according to claim 1, wherein said step S2 comprises the steps of:
the set parameters include a maximum degree d, an initial temperature T_0, a temperature decay coefficient α, a correction coefficient k, a termination temperature ET and a Markov chain length L; the current iteration number is set to 0, and the current temperature T is set to the initial temperature T_0.
3. The method for constructing a repair tree based on a simulated annealing algorithm according to claim 2, wherein said step S3 comprises the steps of:
randomly generating a Prufer sequence as the initial sequence, wherein the length of the generated Prufer sequence is n-2, n is the total number of nodes involved in the repair, and the repair tree corresponding to the generated Prufer sequence satisfies the set degree limit.
4. The method for constructing a repair tree based on a simulated annealing algorithm as claimed in claim 3, wherein said step S4 comprises the steps of:
when calculating the bottleneck bandwidth of a Prufer sequence, decoding the Prufer sequence to obtain the spanning tree represented by the Prufer sequence; the decoding operation includes:
A. denoting the Prufer sequence as a = [a_1, a_2, ..., a_{n-2}] and creating a new set G = [1, 2, ..., n];
B. obtaining from the set G the minimum number that does not appear in the sequence a, connecting an edge between the node represented by this minimum number and the node represented by the first item of the Prufer sequence, deleting the minimum number from the set G, and deleting the first item from the Prufer sequence;
C. repeating step B n-2 times, and finally connecting an edge between the nodes represented by the two numbers remaining in the set G, thereby completing the decoding operation.
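Decoding steps A to C, followed by the bottleneck bandwidth calculation, can be sketched in Python as follows. The names decode_prufer, bottleneck_bandwidth and BW are hypothetical. The matrix BW is NOT the bandwidth matrix of the application, which is not available in this text; it is an invented symmetric matrix chosen only so that the three sequences of the embodiment yield the bottleneck values quoted there (15, 20 and 15).

```python
def decode_prufer(seq, n):
    """Decode a Prufer sequence over nodes 1..n into the n-1 edges of its tree."""
    remaining = list(seq)                # step A: working copy of the sequence a
    g = set(range(1, n + 1))             # step A: the set G = [1, 2, ..., n]
    edges = []
    for _ in range(n - 2):               # step C: repeat step B n-2 times
        leaf = min(x for x in g if x not in remaining)    # step B
        edges.append((leaf, remaining[0]))
        g.remove(leaf)
        remaining.pop(0)
    edges.append(tuple(sorted(g)))       # connect the two numbers left in G
    return edges


def bottleneck_bandwidth(edges, bw):
    """Smallest link bandwidth on the tree; bw[u-1][v-1] is the u-v bandwidth."""
    return min(bw[u - 1][v - 1] for u, v in edges)


# Hypothetical symmetric bandwidth matrix for 5 nodes (illustration only).
BW = [
    [0, 10, 15, 25, 40],
    [10, 0, 35, 15, 20],
    [15, 35, 0, 45, 5],
    [25, 15, 45, 0, 50],
    [40, 20, 5, 50, 0],
]

print(decode_prufer([4, 3, 2], 5))       # [(1, 4), (4, 3), (3, 2), (2, 5)]
for candidate in ([4, 3, 1], [4, 3, 2], [1, 3, 2]):
    print(candidate, bottleneck_bandwidth(decode_prufer(candidate, 5), BW))
# [4, 3, 1] 15
# [4, 3, 2] 20
# [1, 3, 2] 15
```

With the maximum degree d=2 used in the embodiment, every decoded tree is a path, for example 1-4-3-2-5 for the sequence [4,3,2].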
5. The method for constructing a repair tree based on a simulated annealing algorithm as claimed in claim 4, wherein said step S5 comprises the steps of:
recording the number of occurrences of each node;
randomly changing the number at one position of the current solution, and making the following judgment:
if the new solution obtained does not satisfy the set degree constraint rule, cancelling the current change and again randomly changing the number at one position of the current solution, until the new solution obtained satisfies the set degree constraint rule.
6. The method for constructing a repair tree based on a simulated annealing algorithm as claimed in claim 5, wherein said step S6 comprises the steps of:
calculating the bottleneck bandwidth of the new solution and the bottleneck bandwidth of the old solution, and determining whether to accept the new solution according to the Monte Carlo criterion, so as to escape local optima and move toward the global optimum:
if the bottleneck bandwidth of the new solution is better, directly accepting the new solution;
if the bottleneck bandwidth of the new solution is worse, accepting the new solution with a set probability.
7. The method for constructing a repair tree based on a simulated annealing algorithm as claimed in claim 6, wherein said accepting new solutions with a set probability comprises the steps of:
making the following judgment:
if the acceptance condition, which compares random(0,1) with an acceptance probability determined by f_new, f_old, T and k, is satisfied, accepting the new solution; otherwise, not accepting the new solution; wherein random(0,1) is a randomly generated number between 0 and 1; f_new is the bottleneck bandwidth of the new solution; f_old is the bottleneck bandwidth of the old solution; T is the current temperature; and k is the correction coefficient.
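The acceptance formula itself is not reproduced in this text, so the following Python sketch substitutes a common Metropolis-style rule as an ASSUMPTION: a worse solution is accepted when random(0,1) < exp(k·(f_new − f_old)/T). In particular, the way the correction coefficient k enters the exponent is a guess and may differ from the application. The function names are hypothetical.

```python
import math
import random


def acceptance_probability(f_new, f_old, temp, k):
    """Assumed Metropolis-style probability (not the formula of the source)."""
    if f_new > f_old:
        return 1.0                   # a better bottleneck bandwidth is always kept
    # f_new <= f_old: the exponent is non-positive, so the result lies in [0, 1].
    return math.exp(k * (f_new - f_old) / temp)


def accept(f_new, f_old, temp, k, rng):
    """Draw random(0,1) and compare it with the assumed probability."""
    return rng.random() < acceptance_probability(f_new, f_old, temp, k)


# Values from the embodiment: f_old = 20, f_new = 15, T = 1000, k = 10.
p = acceptance_probability(15, 20, 1000, 10)
print(round(p, 3))                   # 0.951
print(accept(15, 20, 1000, 10, random.Random(0)))
```

Under this assumed form, a worse solution is accepted with a probability of about 0.95 at the initial temperature, and the probability falls toward 0 as the temperature decays, which is the usual behaviour of simulated annealing.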
8. The method for constructing a repair tree based on a simulated annealing algorithm as claimed in claim 7, wherein said step S7 comprises the steps of:
a. increasing the current iteration number by 1, and making the following judgment:
if the set Markov chain length is reached, resetting the current iteration number to 0 and carrying out the subsequent step;
if the set Markov chain length is not reached, returning to step S5 and carrying out the next iteration;
b. multiplying the current temperature by the temperature decay coefficient to obtain the updated current temperature, and making the following judgment:
if the current temperature is less than the termination temperature, ending the iteration;
if the current temperature is not less than the termination temperature, returning to step S5 and carrying out the next iteration.
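The loop control of steps a and b can be sketched as a Python generator that yields the temperature used by each iteration of steps S5 to S6. The name annealing_schedule is hypothetical; the parameter values in the usage example are those of the embodiment (initial temperature 1000, decay coefficient 0.95, termination temperature 0.001, Markov chain length 50).

```python
def annealing_schedule(t0, alpha, end_temp, chain_len):
    """Yield the current temperature once per iteration of steps S5-S6."""
    temp = t0
    g = 0                            # current iteration number
    while True:
        yield temp                   # one perturb-and-accept iteration runs here
        g += 1                       # step a
        if g < chain_len:
            continue                 # chain not finished: next iteration
        g = 0
        temp *= alpha                # step b: cool down
        if temp < end_temp:
            return                   # below the termination temperature: stop


temps = list(annealing_schedule(1000.0, 0.95, 0.001, 50))
print(len(temps), len(set(temps)))   # 13500 270
```

With these parameters the temperature is lowered 270 times before it falls below the termination temperature, so the search performs 270 × 50 = 13500 iterations in total.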
9. The method for constructing a repair tree based on a simulated annealing algorithm as claimed in claim 8, wherein said decoding of step S8 comprises the steps of:
1) denoting the final Prufer sequence as b = [b_1, b_2, ..., b_{n-2}] and creating a new set g = [1, 2, ..., n];
2) obtaining from the set g the minimum number that does not appear in the sequence b, connecting an edge between the node represented by this minimum number and the node represented by the first item of the Prufer sequence, deleting the minimum number from the set g, and deleting the first item from the Prufer sequence;
3) repeating step 2) n-2 times, and finally connecting an edge between the nodes represented by the two numbers remaining in the set g to obtain the spanning tree represented by the Prufer sequence, thereby completing the decoding operation.
10. A data recovery method comprising the repair tree construction method based on the simulated annealing algorithm according to any one of claims 1 to 9, comprising the steps of:
(1) acquiring the n-1 surviving nodes that are to transmit data and the 1 auxiliary node to be recovered;
(2) using the nodes obtained in step (1), constructing an approximately optimal repair tree with the repair tree construction method based on the simulated annealing algorithm according to any one of claims 1 to 9;
(3) calculating the data transmission path from the approximately optimal repair tree obtained in step (2);
(4) transmitting the backup data to the auxiliary node to be recovered along the data transmission path obtained in step (3), and completing the recovery of the data through calculation.
CN202310191885.7A 2023-03-02 2023-03-02 Repair tree construction method and data repair method based on simulated annealing algorithm Pending CN116578439A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310191885.7A CN116578439A (en) 2023-03-02 2023-03-02 Repair tree construction method and data repair method based on simulated annealing algorithm

Publications (1)

Publication Number Publication Date
CN116578439A true CN116578439A (en) 2023-08-11

Family

ID=87540151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310191885.7A Pending CN116578439A (en) 2023-03-02 2023-03-02 Repair tree construction method and data repair method based on simulated annealing algorithm

Country Status (1)

Country Link
CN (1) CN116578439A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination