CN113285985A - RS code node repairing method based on genetic algorithm under multi-data center background - Google Patents


Info

Publication number
CN113285985A
Authority
CN
China
Prior art keywords
node
repair
genetic algorithm
repairing
nodes
Legal status
Pending
Application number
CN202110482403.4A
Other languages
Chinese (zh)
Inventor
王勇
锁欣
叶苗
蔡月
Current Assignee
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date: 2021-04-30
Filing date: 2021-04-30
Publication date: 2021-08-20
Application filed by Guilin University of Electronic Technology
Priority to CN202110482403.4A
Publication of CN113285985A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056 Systems characterized by the type of code used
    • H04L1/0061 Error detection codes

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Physiology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an RS code node repair method based on a genetic algorithm in a multi-data-center setting. It addresses the problem that conventional data repair approaches in multi-data-center environments cannot obtain a repair scheme whose bottleneck bandwidth is globally optimal, and provides a genetic-algorithm-based method for selecting the repair path with the optimal bottleneck bandwidth. The method overcomes the long repair delay and high bandwidth consumption of the traditional star and pipeline repair schemes, reduces redundant data transmission, improves repair efficiency and shortens repair time.

Description

RS code node repairing method based on genetic algorithm under multi-data center background
Technical Field
The invention belongs to the field of distributed erasure-coded storage systems, and relates in particular to an RS code node repair method based on a genetic algorithm in a multi-data-center setting.
Background
Distributed storage systems have become the mainstream choice for large-scale data storage thanks to their strong performance and low construction cost. However, because their underlying equipment generally consists of inexpensive commodity hardware, node failures are the norm rather than the exception.
To prevent service loss caused by data becoming unavailable when nodes fail, distributed storage systems commonly rely on erasure codes or multiple replicas to keep data intact and reliable. Erasure codes are widely adopted because of their low extra storage overhead. However, when some nodes fail, recovering the lost data requires reading other data blocks and decoding them, which generates heavy repair traffic and makes recovery slow.
Disclosure of Invention
To overcome these shortcomings of the prior art, the invention provides an RS code node repair method based on a genetic algorithm in a multi-data-center setting, which reduces repair traffic overhead and speeds up repair.
Taking an $(n, k)$ RS code as an example, the $n$ blocks of one coding stripe are stored across two data centers, whose network topology can be represented as a graph $G = (V, E, W)$. Here $V = \{1, 2, \dots, n_1, n_1+1, \dots, n\}$ is the set of $n$ nodes in the cluster that store the $(n, k)$-coded blocks, where cloud 1 contains $n_1$ nodes, cloud 2 contains $n_2$ nodes, and $n_1 + n_2 = n$; $E = \{e_{11}, e_{12}, \dots, e_{ij}\}$ denotes the links within the cluster, with $e_{ij}$ the optimal link between nodes $i$ and $j$; and $W_{ij}$ is the available bandwidth between nodes $i$ and $j$, while $W_{ii}$ (the case $i = j$) denotes the processing capability of node $i$. When a node in one cloud fails, data centers $C_1$ and $C_2$ provide $k_1$ and $k_2$ data blocks respectively, acting as provider nodes to repair the failed node, with $k_1 + k_2 = k$.
During erasure-code repair, the failed data block can be recovered only after the blocks sent by all provider nodes have reached the node performing the repair. The link with the smallest bandwidth in the tree-shaped repair topology therefore directly limits the efficiency of the whole transfer; it is called the bottleneck link, and its available bandwidth is called the bottleneck bandwidth. The larger the bottleneck bandwidth of a repair tree, the faster it transmits the repair data. Building a repair tree can thus be formulated as finding, in graph $G$, a tree rooted at the reconstruction node that contains $k_1$ provider nodes from $C_1$ and $k_2$ provider nodes from $C_2$ and maximizes the bottleneck bandwidth. This is expressed by the following formula, where $w(i, j)$ is the available bandwidth of link $(i, j)$, $D$ is the size of the data block transmitted over it, and their quotient is the transmission delay of the link:
$$T = \min_{\text{repair trees}} \; \max_{(i,j) \in \text{tree}} \frac{D}{w(i,j)}$$
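A minimal Python sketch of this bottleneck objective follows. It assumes, hypothetically, that a repair tree is stored as a `{child: parent}` map over integer node IDs, that link bandwidths sit in a dict keyed by `frozenset({i, j})`, and that every link carries a block of the same size `D`. Scoring every candidate tree exhaustively like this is only feasible for toy graphs, which is the gap the genetic algorithm is meant to fill.

```python
from typing import Dict, FrozenSet, Iterable

Tree = Dict[int, int]                    # hypothetical encoding: child -> parent
Bandwidth = Dict[FrozenSet[int], float]  # undirected link -> available bandwidth

def bottleneck_delay(tree: Tree, w: Bandwidth, D: float) -> float:
    """max over the tree's links of D / w(i, j): the slowest link dominates."""
    return max(D / w[frozenset((i, j))] for i, j in tree.items())

def best_tree(candidates: Iterable[Tree], w: Bandwidth, D: float) -> Tree:
    """T = min over candidate trees of the bottleneck delay."""
    return min(candidates, key=lambda t: bottleneck_delay(t, w, D))

# Toy example: root 0 is the reconstruction node, 1 and 2 are providers.
w = {frozenset((0, 1)): 100.0, frozenset((0, 2)): 50.0, frozenset((1, 2)): 200.0}
star = {1: 0, 2: 0}    # both providers send directly to the root
chain = {1: 0, 2: 1}   # provider 2 relays through provider 1
print(bottleneck_delay(star, w, 100.0), bottleneck_delay(chain, w, 100.0))  # 2.0 1.0
print(best_tree([star, chain], w, 100.0))  # {1: 0, 2: 1}
```

In this toy case the chain wins because it avoids the 50-unit link that bottlenecks the star.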
Moreover, a plain tree topology would carry more traffic than other topologies, so each node in the repair tree merges the coded blocks received from its child nodes with its own coded block before forwarding the result. This saves transmission traffic, keeps the network load as low as possible, and ensures that the repair process does not significantly disturb the network. The computation delay across a link can then be expressed as follows, where $process_i$ denotes the computing capability of node $i$:
$$t(v_i, v_j) = \frac{D_i}{process_i} + \frac{D_j}{process_j}$$
When node processing capability is added to the traditional repair-tree model, the objective function becomes:
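$$T_{repair} = \min\left(\max\left(\frac{D}{w_{ij}} + \frac{D_i}{process_i} + \frac{D_j}{process_j}\right)\right)$$
$$\text{s.t.}\quad k_1 + k_2 = k,\qquad \mathrm{count}(C_1) = k_1,\quad \mathrm{count}(C_2) = k_2$$
where $\mathrm{count}(C_x)$ is the number of provider nodes that the repair tree takes from data center $C_x$.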
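The constrained objective can be sketched as a fitness function for one candidate tree. All of the following are hypothetical modeling choices, not taken from the source: the tree is a `{child: parent}` map whose non-root nodes are exactly the providers, `data[i]` is the amount of data node i processes (the $D_i$ above), `process[i]` is its processing capability, and a tree that violates the $k_1$/$k_2$ quotas gets an infinite repair time so the GA will discard it.

```python
import math
from typing import Dict, FrozenSet, Set

def repair_time(tree: Dict[int, int], w: Dict[FrozenSet[int], float],
                process: Dict[int, float], data: Dict[int, float], D: float,
                c1: Set[int], c2: Set[int], k1: int, k2: int) -> float:
    """T_repair of one tree: worst link cost D/w_ij + D_i/process_i + D_j/process_j."""
    providers = set(tree)                  # every non-root node acts as a provider
    if len(providers & c1) != k1 or len(providers & c2) != k2:
        return math.inf                    # violates count(C1) = k1 / count(C2) = k2
    worst = 0.0
    for i, j in tree.items():              # link between child i and its parent j
        cost = D / w[frozenset((i, j))] + data[i] / process[i] + data[j] / process[j]
        worst = max(worst, cost)
    return worst

w = {frozenset((0, 1)): 100.0, frozenset((0, 2)): 50.0, frozenset((1, 2)): 200.0}
process = {0: 10.0, 1: 10.0, 2: 20.0}
data = {0: 100.0, 1: 100.0, 2: 100.0}
c1, c2 = {0, 1}, {2}                       # node 0 is the reconstruction node, in C1
print(repair_time({1: 0, 2: 1}, w, process, data, 100.0, c1, c2, 1, 1))  # 21.0
print(repair_time({1: 0}, w, process, data, 100.0, c1, c2, 1, 1))        # inf
```

Returning infinity for infeasible trees is one simple way to fold the constraints into the fitness; a repair step, as in the crossover described later, is another.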
The technical scheme adopted by the invention is set out in the Detailed Description below.
Compared with the prior art, the invention effectively reduces repair bandwidth overhead by taking the heterogeneous processing capability of nodes into account, and effectively reduces the repair delay of the reconstruction tree through a purpose-designed genetic algorithm.
Drawings
FIG. 1 is a flow chart of the algorithm;
FIG. 2 shows the encoding scheme;
FIG. 3 shows the crossover operation;
FIG. 4 shows the mutation operation;
FIG. 5 shows the experimental results.
Detailed Description
The present invention is further described below with reference to the accompanying drawings; the described embodiments are merely illustrative and are not to be construed as limiting the invention.
The invention addresses data recovery in distributed storage systems. A node repair path is selected with the genetic-algorithm-based RS code node repair method for the multi-data-center setting, and the lost data is repaired by transmitting data from surviving nodes along that path. The method mainly comprises the following steps:
Step 1: Construct a multi-data-center cluster containing multiple nodes. The bandwidth between data centers is set to 1G; the connections between nodes inside the data centers follow part of the Internet2 OS3E topology, and the network bandwidths are assigned according to data obtained from PlanetLab. The node performance parameters $x_k$ are I/O, CPU, memory and chip, with corresponding weights $r_k$ of 40%, 30%, 20% and 10% and value ranges $[21, 265]$, $[1, 90]$, $[0.3, 76]$ and $[0.5, 23]$, respectively. The processing capability of each node is:
[Formula image not reproduced in this text version]
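The processing-capability formula itself is an image that is not reproduced here. Purely as an illustration, the sketch below assumes it is a weighted sum of the four parameters after min-max normalization to their stated ranges; the normalization, the parameter keys and the function names are hypothetical and not taken from the source. Only the weights and ranges come from step 1.

```python
import random
from typing import Dict

# Weights r_k and value ranges from step 1; the normalization scheme is an assumption.
WEIGHTS = {"io": 0.4, "cpu": 0.3, "memory": 0.2, "chip": 0.1}
RANGES = {"io": (21.0, 265.0), "cpu": (1.0, 90.0), "memory": (0.3, 76.0), "chip": (0.5, 23.0)}

def processing_capability(x: Dict[str, float]) -> float:
    """Weighted sum of min-max normalized parameters, giving a value in [0, 1]."""
    total = 0.0
    for name, r in WEIGHTS.items():
        lo, hi = RANGES[name]
        total += r * (x[name] - lo) / (hi - lo)
    return total

def random_node(rng: random.Random) -> Dict[str, float]:
    """Draw each parameter uniformly from its range, e.g. to build simulated nodes."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}

rng = random.Random(42)
print(round(processing_capability(random_node(rng)), 3))
```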
Step 2: Take any node in the data center as the failed node to be reconstructed and start running the genetic algorithm; the algorithm flow is shown in FIG. 1.
Step 3: Using the initialization function of the genetic algorithm, construct a repair tree that is rooted at the failed node and satisfies the node repair constraints, and repeat this popsize times to form the population. With data centers C1 and C2, the repair tree is generated as follows:
(1) Divide all candidate nodes into two sets, selected and unselected. The selected nodes form the node set of the reconstruction tree and are split into the subsets C1_provider and C2_provider; the unselected nodes are C1_alive and C2_alive. Initialize the encoding T = [0, 0, 0, …, 0].
(2) Randomly select a node i from the union of C1_provider and C2_provider, and find the set S of nodes directly connected to i. Then select a node j from the intersection of S with C1_alive and C2_alive, make j a child of i, add j to C1_provider or C2_provider, and remove it from C1_alive or C2_alive accordingly.
(3) Check the lengths of C1_provider and C2_provider to see whether both constraints are satisfied. If not, repeat step (2); if so, output the encoding T, as shown in FIG. 2.
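The tree-generation procedure in step 3 can be sketched in Python as follows. Assumptions not stated in the source: node IDs are integers, the topology is an adjacency dict, the encoding T is held as a `{child: parent}` map instead of the zero-initialized array, a node j is only drawn from a data center whose provider quota is not yet full, and a construction that reaches a dead end is simply restarted.

```python
import random
from typing import Dict, List, Set

def random_repair_tree(adj: Dict[int, Set[int]], root: int,
                       c1_alive: Set[int], c2_alive: Set[int],
                       k1: int, k2: int, rng: random.Random,
                       max_tries: int = 100) -> Dict[int, int]:
    """Grow a random tree from `root` holding k1 providers from C1 and k2 from C2."""
    quota = {1: k1, 2: k2}
    for _ in range(max_tries):
        tree: Dict[int, int] = {}                     # encoding T: child -> parent
        alive = {1: set(c1_alive), 2: set(c2_alive)}  # C1_alive, C2_alive
        provider: Dict[int, Set[int]] = {1: set(), 2: set()}  # C1_provider, C2_provider
        selected: List[int] = [root]                  # nodes already in the tree
        while any(len(provider[dc]) < quota[dc] for dc in (1, 2)):
            # Unselected nodes whose data center still needs providers.
            open_alive = set().union(*(alive[dc] for dc in (1, 2)
                                       if len(provider[dc]) < quota[dc]))
            eligible = [i for i in selected if adj[i] & open_alive]
            if not eligible:
                break                                 # dead end: restart
            i = rng.choice(eligible)                  # step (2): pick tree node i
            j = rng.choice(sorted(adj[i] & open_alive))  # neighbor j from S & alive
            dc = 1 if j in alive[1] else 2
            tree[j] = i                               # j becomes a child of i
            provider[dc].add(j)
            alive[dc].discard(j)
            selected.append(j)
        else:
            return tree                               # step (3): both quotas met
    raise ValueError("no feasible repair tree found")

adj = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1}, 3: {0, 4}, 4: {1, 3}}
rng = random.Random(0)
population = [random_repair_tree(adj, 0, {1, 2}, {3, 4}, 1, 2, rng) for _ in range(10)]
print(population[0])
```

The last two lines repeat the construction to build an initial population, mirroring "repeat popsize times" with a toy popsize of 10.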
Step 4: Randomly select two repair trees from the population with crossover probability f1 and cross them to generate new individuals. The crossover operation first compares the encodings of the two parents and keeps their common edges in the offspring's encoding. It then checks whether the retained part is a tree rooted at the reconstruction node; if not, the retained part is extended into such a tree by adding parent and child nodes and reversing edge directions. Finally, the tree is checked against the constraints: if the nodes in the encoding do not satisfy the C1 and C2 node constraints, nodes are randomly drawn from the gene pool (with all nodes of the extended tree removed) and added to the encoding until the constraints are met, producing a new crossover individual, as shown in FIG. 3.
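A simplified Python sketch of this crossover follows, under assumptions that are not the inventors' exact procedure: trees are `{child: parent}` maps over integer node IDs; common edges are compared as undirected pairs and re-oriented towards the reconstruction node with a breadth-first search (one reading of "reversing edge directions"); only the part of the common edges connected to the root is kept; the gene pool is taken to be the nodes of both parents; and if no gene-pool node can be attached within the quotas, the sketch falls back to a copy of the first parent.

```python
import random
from collections import deque
from typing import Dict, Set

def crossover(p1: Dict[int, int], p2: Dict[int, int], adj: Dict[int, Set[int]],
              root: int, c1: Set[int], c2: Set[int], k1: int, k2: int,
              rng: random.Random) -> Dict[int, int]:
    """Keep the parents' common edges, re-root them, then refill quotas from the gene pool."""
    common = {frozenset(e) for e in p1.items()} & {frozenset(e) for e in p2.items()}
    nbrs: Dict[int, Set[int]] = {}
    for edge in common:
        a, b = tuple(edge)
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    # BFS from the root orients every retained edge towards the reconstruction node.
    offspring: Dict[int, int] = {}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(nbrs.get(u, set())):
            if v not in seen:
                seen.add(v)
                offspring[v] = u
                queue.append(v)
    pool = (set(p1) | set(p2)) - seen      # gene pool minus nodes already in the tree
    while True:
        need1 = k1 - len(set(offspring) & c1)
        need2 = k2 - len(set(offspring) & c2)
        if need1 <= 0 and need2 <= 0:
            return offspring               # constraints satisfied
        moves = [(i, j) for j in sorted(pool)
                 if (j in c1 and need1 > 0) or (j in c2 and need2 > 0)
                 for i in sorted(adj[j]) if i in seen]
        if not moves:
            return dict(p1)                # fallback (assumption): keep parent 1
        i, j = rng.choice(moves)
        offspring[j] = i                   # attach gene-pool node j under tree node i
        seen.add(j)
        pool.discard(j)

adj = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1}, 3: {0, 4}, 4: {1, 3}}
p1 = {1: 0, 3: 0, 4: 3}
p2 = {1: 0, 3: 0, 4: 1}
print(crossover(p1, p2, adj, 0, {1, 2}, {3, 4}, 1, 2, random.Random(0)))
```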
Step 5: Randomly select a repair tree from the population with mutation probability f2 and mutate it to generate new individuals. In the mutation operation, the nodes are divided into two sets, a selection pool and a node set. A node is chosen from the selection pool as the mutation point and its subtree is determined; then a node that is in the reconstruction tree, lies outside the mutation point's subtree, and is directly connected to the mutation point is chosen as the mutation point's new parent, producing a new mutated individual, as shown in FIG. 4.
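A minimal sketch of this mutation, assuming the same `{child: parent}` encoding and adjacency dict. The selection pool is taken here to be all non-root tree nodes, tried in random order; the current parent is excluded so the mutation always changes the tree; and if no node can be re-attached, the tree is returned unchanged. These details are assumptions. Because the mutation point is re-attached only to a node outside its own subtree, the result stays a tree, and since the provider set is unchanged, the C1/C2 constraints remain satisfied.

```python
import random
from typing import Dict, Set

def in_subtree(tree: Dict[int, int], n: int, v: int) -> bool:
    """True if node n lies in the subtree rooted at v (walk parent links upwards)."""
    while n in tree:
        if n == v:
            return True
        n = tree[n]
    return n == v

def mutate(tree: Dict[int, int], adj: Dict[int, Set[int]], root: int,
           rng: random.Random) -> Dict[int, int]:
    """Re-attach one mutation point to a new parent outside its own subtree."""
    mutant = dict(tree)
    for v in rng.sample(sorted(mutant), len(mutant)):    # candidate mutation points
        subtree = {n for n in mutant if in_subtree(mutant, n, v)}
        options = sorted(u for u in adj[v]
                         if (u == root or u in mutant)    # u must already be in the tree
                         and u not in subtree and u != mutant[v])
        if options:
            mutant[v] = rng.choice(options)                # new parent of v
            return mutant
    return mutant                                          # no legal move found

adj = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1}, 3: {0, 4}, 4: {1, 3}}
parent = {1: 0, 3: 0, 4: 3}
print(mutate(parent, adj, 0, random.Random(1)))
```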
Step 6: Apply selection to the population after crossover and mutation to produce a new population, then repeat steps 4 and 5 until the specified number of generations has been run, and output the optimal repair tree. Selection combines 10% elitist selection with 90% random selection, which ensures convergence of the algorithm while preserving population diversity.
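The selection rule can be sketched as below. Assumptions: a lower repair time means a fitter tree, the 10% elite share is rounded and kept to at least one individual, and the remaining 90% of slots are filled by sampling without replacement from the non-elite individuals. `fitness` is any callable that returns a tree's repair time; the function and parameter names are hypothetical.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")

def select(population: List[T], fitness: Callable[[T], float], popsize: int,
           rng: random.Random, elite_frac: float = 0.1) -> List[T]:
    """10% elite selection plus 90% random selection (lower fitness = better)."""
    ranked = sorted(population, key=fitness)
    n_elite = max(1, round(elite_frac * popsize))
    elite = ranked[:n_elite]                          # best trees always survive
    rest = ranked[n_elite:]
    n_random = min(popsize - n_elite, len(rest))
    return elite + rng.sample(rest, n_random)         # random picks keep diversity

# Toy usage: 30 individuals whose fitness is their own value, trimmed to popsize 20.
next_gen = select(list(range(30)), lambda x: float(x), 20, random.Random(0))
print(next_gen[:2])
```

Keeping the elite guarantees the best repair tree found so far is never lost, while the random share stops the population from collapsing onto one tree.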
The experimental results are shown in FIG. 5. Once node processing capability is taken into account, the solution produced by the greedy Prim-MST-based algorithm is no longer the globally optimal solution. Moreover, unlike the traditional pipeline repair scheme, the tree repair scheme lets multiple nodes perform repair concurrently, so its repair delay is significantly shorter than that of the pipeline scheme.

Claims (2)

1. An RS code node repair method based on a genetic algorithm in a multi-data-center setting, characterized by comprising the following steps:
Step 1: constructing a multi-data-center cluster containing a plurality of nodes, wherein the bandwidth between data centers is 1G; the connections between nodes inside the data centers follow part of the Internet2 OS3E topology, with network bandwidths assigned according to data obtained from PlanetLab; the node performance parameters $x_k$ are I/O, CPU, memory and chip, with corresponding weights $r_k$ of 40%, 30%, 20% and 10% and value ranges $[21, 265]$, $[1, 90]$, $[0.3, 76]$ and $[0.5, 23]$, respectively; and the processing capability of each node is:
[Formula image not reproduced in this text version]
Step 2: selecting any node in the data center as the failed node to be reconstructed;
Step 3: using the initialization function of the genetic algorithm, constructing a repair tree that is rooted at the failed node and satisfies the node repair constraints, and repeating this popsize times to form a population;
Step 4: randomly selecting two repair trees from the population with crossover probability f1 and crossing them to generate new individuals;
Step 5: randomly selecting one repair tree from the population with mutation probability f2 and mutating it to generate new individuals;
Step 6: applying selection to the population after crossover and mutation to generate a new population, repeating steps 4 and 5 until the specified number of generations has been run, and outputting the optimal repair tree.
2. The node repair method according to claim 1, wherein: in step 4, the links common to both parents are retained, because a link shared by both parents indicates good performance, with low transmission and computation delay; in step 5, one node is selected and its parent node is changed, to prevent the repair tree from becoming trapped in a local optimum that it cannot escape.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110482403.4A CN113285985A (en) 2021-04-30 2021-04-30 RS code node repairing method based on genetic algorithm under multi-data center background

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110482403.4A CN113285985A (en) 2021-04-30 2021-04-30 RS code node repairing method based on genetic algorithm under multi-data center background

Publications (1)

Publication Number Publication Date
CN113285985A true CN113285985A (en) 2021-08-20

Family

ID=77278035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110482403.4A Pending CN113285985A (en) 2021-04-30 2021-04-30 RS code node repairing method based on genetic algorithm under multi-data center background

Country Status (1)

Country Link
CN (1) CN113285985A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060218301A1 (en) * 2000-01-25 2006-09-28 Cisco Technology, Inc. Methods and apparatus for maintaining a map of node relationships for a network
CN104093166A * 2014-07-08 2014-10-08 Nanjing University of Information Science and Technology Wireless sensor network connection recovery method based on minimum movement of nodes
CN105046022A * 2015-08-27 2015-11-11 Anhui University of Technology Self-healing method of smart distribution network on the basis of improved ant colony algorithm
CN105072194A * 2015-08-27 2015-11-18 Nanjing University Structure and method for recovering stored data in distributed file system
CN109889440A * 2019-02-20 2019-06-14 Harbin Engineering University Erasure code failure node reconstruction path selection method based on maximum spanning tree
CN112035059A * 2020-08-04 2020-12-04 FiberHome Telecommunication Technologies Co., Ltd. Single-point failure recovery method for distributed storage system, electronic equipment and storage medium
WO2020247949A1 * 2019-06-07 2020-12-10 The Regents Of The University Of California General form of the tree alternating optimization (tao) for learning decision trees
CN112256471A * 2020-10-19 2021-01-22 Beijing Jinghang Computation and Communication Research Institute Erasure code repairing method based on separation of network data forwarding and control layer

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
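Zheng Jiehui (郑杰辉): "Research on node repair technology for distributed storage systems", Journal of Taiyuan University (Natural Science Edition) *
Guo Wei (郭威): "Research on failed-node repair technology in distributed storage systems", Wanfang Data *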

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113938376A * 2021-11-25 2022-01-14 Guilin University of Electronic Technology Method for repairing fault node in distributed storage system
CN113938376B * 2021-11-25 2023-08-01 Guilin University of Electronic Technology Fault node repairing method in distributed storage system

Similar Documents

Publication Publication Date Title
CN106786546B (en) Power distribution network fault recovery strategy optimization method based on risk assessment
CN109889440B (en) Erasure code failure node reconstruction path selection method based on maximum spanning tree
Altiparmak et al. Optimal design of reliable computer networks: A comparison of metaheuristics
CN112035059A (en) Single-point failure recovery method for distributed storage system, electronic equipment and storage medium
Hu et al. Adaptive slave controller assignment for fault-tolerant control plane in software-defined networking
CN111475953B (en) Energy supply reliability influence analysis method, device equipment and storage medium
CN109617800B (en) Data center network fault-tolerant safe routing method based on balance hypercube
CN113285985A (en) RS code node repairing method based on genetic algorithm under multi-data center background
CN109194444A Balanced binary tree repair method based on network topology
CN112822052A (en) Network fault root cause positioning method based on network topology and alarm
KR20090060910A (en) Availablity prediction method for ha cluster
CN113225395A (en) Data distribution strategy and data restoration algorithm under multi-data center environment
CN109242242B (en) Method and system for determining risk modeling of system protection private network business
CN113938376B (en) Fault node repairing method in distributed storage system
CN115883577B (en) Block chain network clustering and data transmission method
CN108768748B (en) Fault diagnosis method and device for power communication service and storage medium
CN111600752A (en) Power communication service reliability optimization method and related device
JP2014187624A (en) Repair method and repair program of network
CN111412795A (en) Test point setting scheme generation method and device
CN116737787A (en) Block chain data storage query method based on improved cuckoo filter
CN112307607B (en) Edge coupling-based dependent network seepage analysis method and analysis system
Thompson et al. Decentralised data fusion in 2-tree sensor networks
Adhikari et al. On a new interconnection network for large scale parallel systems
CN111385200B (en) Control method and device for data block repair
JP6981232B2 (en) Network design equipment, methods, and programs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210820