CN103885856B - Diagram calculation fault-tolerant method and system based on information regeneration mechanism - Google Patents

Diagram calculation fault-tolerant method and system based on information regeneration mechanism

Info

Publication number
CN103885856B
Authority
CN
China
Prior art keywords
snapshot
superstep
data
compute node
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410085478.9A
Other languages
Chinese (zh)
Other versions
CN103885856A (en)
Inventor
薛继龙
曲直
杨智
代亚非
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201410085478.9A
Publication of CN103885856A
Application granted
Publication of CN103885856B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a fault-tolerance method and system for graph computing based on a message regeneration mechanism. During graph computation, the method saves, for graph structure data, the changes to the graph structure between two adjacent snapshots and, for message data, the set of vertex values. When a failure occurs, the graph computing system is restored from the saved graph-structure changes and vertex value set to the superstep corresponding to a previous valid snapshot, and computation of the next superstep then begins. The system comprises a control node, multiple compute nodes, and a distributed file system. To better handle failures in graph computing systems, the method and system make snapshot data lightweight, greatly shorten snapshot generation and recovery time, and improve the fault tolerance of the graph computing system.

Description

A graph computing fault-tolerance method and system based on a message regeneration mechanism
Technical field
The invention belongs to the field of cloud computing, and in particular relates to an efficient fault-tolerance method and system for graph computing systems based on a message regeneration mechanism.
Background
Enterprises of all kinds increasingly apply graph computing systems to everyday data analysis and computation. Many open-source graph computing frameworks have emerged, such as Apache Giraph and PowerGraph. Inspired by Pregel, these frameworks all adopt a vertex-centric semantic model and the BSP (Bulk Synchronous Parallel) computation model: they read graph data from a distributed storage system, compute iteratively in memory, and finally write the results to a database or file system.
A graph computing task is divided into a number of supersteps. In each superstep, every vertex processes the messages that its neighboring vertices sent in the previous superstep and, according to the specific algorithm logic, sends messages to surrounding vertices for use in the next superstep. This message-based communication between vertices allows the framework to partition vertices easily across multiple machines, achieving distributed and parallel computation.
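The superstep model described above can be sketched in a few lines of Python. This is a minimal single-process illustration, not the code of any of the named frameworks; the function names `run_supersteps` and `max_compute` and the dictionary-based graph layout are assumptions made for the example.

```python
from collections import defaultdict

def run_supersteps(graph, values, compute, num_supersteps):
    """Run a vertex-centric BSP loop (illustrative sketch).

    graph:   dict vertex -> list of out-neighbors
    values:  dict vertex -> current value
    compute: function(vertex, value, incoming, neighbors)
             -> (new_value, {target: message})
    """
    inbox = defaultdict(list)        # messages delivered at the start of a superstep
    for _ in range(num_supersteps):
        outbox = defaultdict(list)   # messages produced for the next superstep
        for vertex, neighbors in graph.items():
            new_value, sent = compute(vertex, values[vertex], inbox[vertex], neighbors)
            values[vertex] = new_value
            for target, message in sent.items():
                outbox[target].append(message)
        inbox = outbox               # barrier: messages become visible only now
    return values

def max_compute(vertex, value, incoming, neighbors):
    """Hypothetical vertex program: propagate the largest value seen so far."""
    new_value = max([value] + incoming)
    return new_value, {n: new_value for n in neighbors}

graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
result = run_supersteps(graph, {"a": 3, "b": 1, "c": 2}, max_compute, 3)
```

Note that a vertex only ever reads messages produced in the previous superstep, which is why the cached messages are part of the state that a failure destroys.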
For a distributed computing framework, fault tolerance is essential. The traditional MapReduce framework usually recomputes failed subtasks, which works because the subtasks of each job have the following properties:
1. Subtasks are stateless, and all data needed for computation can be obtained from the file system;
2. There are no dependencies or communication between subtasks.
For graph computing systems, neither property holds:
1. Subtasks are stateful: the graph's structure data and vertex states are held in memory and cannot be obtained from the file system;
2. Subtasks exchange a large volume of messages, and the subtask on each machine caches the message data sent from other machines in the previous superstep.
As a result, graph computing systems cannot use MapReduce's subtask recomputation. Existing open-source implementations usually use a checkpoint (snapshot) mechanism: after each superstep ends, the graph's structure data, the vertex states, and the messages received in this round for use in the next superstep are saved together to the distributed file system as a snapshot. When a failure occurs, all nodes discard their current computation state and intermediate data and reload the most recent snapshot from the file system. Although this method is simple to implement, its network and disk overhead is large and seriously slows computation, so the feature is often turned off in practice.
Summary of the invention
To better handle failures in graph computing systems, the invention proposes an efficient fault-tolerance method for graph computing systems based on a message regeneration mechanism, and a graph computing system using the method. It makes snapshot data lightweight, greatly shortens snapshot generation and recovery time, and improves the fault tolerance of the graph computing system.
To achieve the above object, the invention adopts the following technical solution:
A graph computing fault-tolerance method based on a message regeneration mechanism, comprising the following steps:
1) During graph computation, for graph structure data, save the changes to the graph structure between two adjacent snapshots; for message data, save the vertex value set;
2) When a failure occurs, use the saved graph-structure changes and the vertex value set to restore the graph computing system to the superstep corresponding to a previous valid snapshot, then start computing the next superstep.
Further, the previous valid snapshot in step 2) is any valid snapshot taken before the failure, preferably the last valid snapshot before the failure.
Further, in step 1) the snapshot at the end of superstep k is generated as follows:
a) Store the graph change information $\delta_{k-1,k}$ relative to the previous snapshot;
b) Store the values of all vertices, forming the vertex value set $v_k$.
Further, let $\delta_{i,j}$ be the incremental change of the j-th superstep relative to the i-th superstep, and let $g_k$ be the graph structure data snapshot that must be read to recover to the k-th superstep. Then, in step 2), recovering to the state at superstep k when a failure occurs proceeds as follows:
a) Read $g\,\delta_{0,1}\,\delta_{1,2}\cdots\delta_{k-2,k-1}$ in order to obtain $g_{k-1}$;
b) Read the vertex value set $v_k$ and recover the message set $m_k$ from $v_k$;
c) Read $\delta_{k-1,k}$ and form $g_{k-1}\,\delta_{k-1,k} = g_k$.
A graph computing system using the above method comprises a control node, multiple compute nodes, and a distributed file system. The control node assigns computing tasks to the compute nodes, synchronizes the compute nodes, detects compute node failures, and controls the progress of the computation. The compute nodes perform the actual computation; the memory of each compute node holds a subset of the graph's vertices and their connections to other vertices. The distributed file system stores the graph's static information and the snapshot data produced at run time, including the graph-structure changes between two adjacent snapshots and the vertex value set for message data. When the control node detects that a compute node has failed, it directs the compute nodes to recover to a previous valid state.
The graph computing fault-tolerance method of the invention, based on incremental snapshots of graph structure data and a message data regeneration mechanism, realizes a lightweight snapshot and a corresponding recovery method. It greatly reduces the amount of data needed to generate and restore snapshots, thereby lowering network bandwidth and disk demands and improving failure recovery efficiency. The method is precise and fine-grained, has low implementation complexity, is easy to maintain, and has high practical value.
Brief description of the drawings
Fig. 1 shows an implementation of the PageRank algorithm and its regenerate function.
Fig. 2 shows an implementation of the SSSP algorithm and its regenerate function.
Fig. 3 is a schematic diagram of the structure of the graph computing system.
Fig. 4 compares the computation and snapshot times of the Giraph system and the system of the invention.
Fig. 5 compares the checkpoint sizes of the Giraph system and the system of the invention for the k-core task.
Fig. 6 compares the PageRank recovery times of the Giraph system and the system of the invention.
Detailed description
The invention is further described below through specific embodiments and the accompanying drawings.
Traditional snapshot data consists of two parts: a) a snapshot of the graph structure data; b) a snapshot of the message data cached for use in the next superstep. The invention reduces the volume of each part by a different method: incremental snapshots of graph structure data and a message data regeneration mechanism. Together they make snapshot data lightweight, so that snapshots can be generated and restored quickly, greatly shortening snapshot generation and recovery time.
1. Incremental graph structure data snapshot
For graph structure data, instead of snapshotting the entire graph structure as in the traditional method, the invention uses incremental snapshots and saves the changes to the graph structure between two snapshots using a log-append approach. Formally, let $\delta_{i,j}$ be the incremental change of the j-th superstep relative to the i-th superstep; then the data saved at the end of the k-th superstep is $\delta_{k-1,k}$. To recover to the x-th superstep, the graph structure data snapshot that must be read is:
$g_x = g\,\delta_{0,1}\,\delta_{1,2}\cdots\delta_{x-1,x}$, where $g_x$ denotes the graph structure data at the end of the x-th superstep and $g$ denotes the original graph structure data.
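As a rough illustration of the log-append idea, the following Python sketch replays a list of change records on the original graph to rebuild the graph at a given superstep. The patent does not specify the format of a change record; the `{"add": [...], "remove": [...]}` edge-list layout and the names `apply_delta` and `rebuild_graph` are assumptions made for the example.

```python
def apply_delta(graph, delta):
    """Return a new adjacency map with one change record applied.

    delta: dict with optional 'add' and 'remove' lists of (source, target)
    edges. This record format is an assumption, not taken from the patent.
    """
    result = {vertex: set(targets) for vertex, targets in graph.items()}
    for source, target in delta.get("add", []):
        result.setdefault(source, set()).add(target)
        result.setdefault(target, set())      # make sure the endpoint exists
    for source, target in delta.get("remove", []):
        result.get(source, set()).discard(target)
    return result

def rebuild_graph(original, delta_log, x):
    """Replay the first x change records on the original graph.

    delta_log[i] plays the role of delta_{i,i+1}, so the result stands for g_x.
    """
    graph = original
    for delta in delta_log[:x]:
        graph = apply_delta(graph, delta)
    return graph

original = {1: {2}, 2: set()}
delta_log = [{"add": [(2, 3)]}, {"remove": [(1, 2)]}]
g1 = rebuild_graph(original, delta_log, 1)
g2 = rebuild_graph(original, delta_log, 2)
```

When the graph changes little between supersteps, each record stays small, which is the case the incremental snapshot is designed for.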
2. Message snapshot based on the message data regeneration mechanism
Let $m_k$ be the set of messages produced in superstep k and $v_k$ the set of all vertex values. For message data, instead of saving all of $m_k$ as in the traditional method, the invention saves only the vertex value set $v_k$; during recovery, the messages that each vertex sent in this superstep are regenerated directly from its vertex value.
If the average degree of the vertices is β, the snapshot size of $v_k$ is 1/β of that of $m_k$. Because computation models differ, the invention extends the computation model semantics by exposing a recovery interface to the upper-layer application, so that the programmer decides how to recover the messages sent in the superstep from the vertex value. A typical programming interface is:
void regenerate(T value), where T value is the value of the current vertex at the end of superstep k.
During recovery, the system calls the regenerate function on each vertex to regenerate the messages sent from that vertex. In testing, the vast majority of algorithms that run on graph computing platforms fit this interface. As examples, Fig. 1 and Fig. 2 show implementations of two typical algorithms, PageRank and SSSP (single-source shortest path), and their corresponding regenerate functions.
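Figs. 1 and 2 are not reproduced in this text, so the following Python sketch only suggests what regenerate functions for PageRank and SSSP might look like. It is not the code from the figures. The function names, the extra neighbor arguments, and the choice to return the messages as a dictionary (rather than sending them, as the `void` interface implies) are all assumptions made so the example is self-contained.

```python
import math

def pagerank_regenerate(value, out_neighbors):
    """Rebuild the PageRank messages of a vertex from its saved rank.

    Assumes each vertex sends rank / out_degree along every out-edge.
    """
    if not out_neighbors:
        return {}
    share = value / len(out_neighbors)
    return {target: share for target in out_neighbors}

def sssp_regenerate(value, weighted_out_edges):
    """Rebuild the SSSP messages of a vertex from its saved distance.

    weighted_out_edges: dict target -> edge weight (assumed layout).
    """
    if math.isinf(value):
        return {}                     # unreached vertices have nothing to announce
    return {target: value + weight for target, weight in weighted_out_edges.items()}

pr_messages = pagerank_regenerate(0.5, ["b", "c"])
sp_messages = sssp_regenerate(4, {"b": 1, "c": 3})
```

In this SSSP sketch a vertex re-announces its distance on every out-edge, which may produce more messages than the original superstep did. That is harmless under the assumption that receivers take the minimum of incoming distances, since repeating a candidate distance cannot change the minimum.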
3. Snapshot process
At the end of superstep k, the snapshot is generated as follows:
a) Store the graph change information $\delta_{k-1,k}$ relative to the previous snapshot;
b) Store the values of all vertices, forming $v_k$.
4. Recovery process
When a failure occurs, recovering to the state at superstep k proceeds as follows:
a) Read $g\,\delta_{0,1}\,\delta_{1,2}\cdots\delta_{k-2,k-1}$ in order to obtain $g_{k-1}$;
b) Read $v_k$ and call the regenerate function on each vertex according to $v_k$ to recover $m_k$;
c) Read $\delta_{k-1,k}$ and form $g_{k-1}\,\delta_{k-1,k} = g_k$.
After these steps, the state of the graph computing system has been restored to superstep k; recovery is complete, and the next superstep can begin.
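The snapshot and recovery steps can be put together in a small single-process Python sketch. An in-memory dictionary stands in for the distributed file system, and the change-record format, the PageRank-style `regenerate`, and all function names are assumptions for illustration. Regenerating the messages on $g_{k-1}$ before applying the last change record follows the step order a), b), c) given above; that reading of the order is an interpretation.

```python
def apply_delta(graph, delta):
    """Apply one change record; assumed format {'add': [(u, v)], 'remove': [(u, v)]}."""
    result = {vertex: set(targets) for vertex, targets in graph.items()}
    for source, target in delta.get("add", []):
        result.setdefault(source, set()).add(target)
        result.setdefault(target, set())
    for source, target in delta.get("remove", []):
        result.get(source, set()).discard(target)
    return result

def regenerate(value, out_neighbors):
    """Hypothetical PageRank-style regeneration: split the value over out-edges."""
    if not out_neighbors:
        return {}
    return {target: value / len(out_neighbors) for target in out_neighbors}

def take_snapshot(store, k, delta, vertex_values):
    """Persist the lightweight snapshot of superstep k: delta_{k-1,k} and v_k only."""
    store["deltas"][k] = delta
    store["values"][k] = dict(vertex_values)

def recover(store, k):
    """Restore (g_k, v_k, m_k) for superstep k from the lightweight snapshots."""
    graph = store["original"]
    for step in range(1, k):                        # a) replay up to g_{k-1}
        graph = apply_delta(graph, store["deltas"][step])
    values = store["values"][k]                     # b) read v_k ...
    inbox = {}
    for vertex, value in values.items():            #    ... and regenerate m_k
        for target, message in regenerate(value, graph.get(vertex, ())).items():
            inbox.setdefault(target, []).append(message)
    graph = apply_delta(graph, store["deltas"][k])  # c) apply delta_{k-1,k} to get g_k
    return graph, values, inbox

store = {"original": {"a": {"b"}, "b": {"a"}}, "deltas": {}, "values": {}}
take_snapshot(store, 1, {"add": [("a", "c")]}, {"a": 1.0, "b": 1.0, "c": 0.0})
take_snapshot(store, 2, {"add": [("c", "a")]}, {"a": 1.0, "b": 0.5, "c": 0.5})
graph, values, inbox = recover(store, 2)
```

The snapshot never stores the messages themselves; `recover` rebuilds them from the saved vertex values, which is the trade of a little recomputation for much less I/O that the method relies on.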
The experiments carried out for the method of the invention are described in detail below. The experiments fully implement a graph computing system together with the snapshot and recovery methods described above, use real graph data and common graph algorithms as the workload, simulate system failures, and compare against the traditional snapshot recovery method in order to measure the performance and effectiveness of the method.
1. Implementation steps
1) First implement the required computing system framework, whose structure is shown in Fig. 3:
The system comprises one control node and multiple compute nodes. The control node assigns computing tasks to the compute nodes, synchronizes the compute nodes, detects compute node failures, and controls the progress of the computation. The compute nodes perform the actual computation; the memory of each compute node holds a subset of the graph's vertices and their connections to other vertices. Nodes exchange messages using the Apache MINA communication framework, in a manner similar to Giraph. The distributed file system stores the graph's static information and the snapshot data produced at run time.
2) Start the system and assign it a computing task. The compute nodes read the graph structure data from the distributed file system and begin computing under the coordination of the control node. After each superstep ends, the compute nodes persist the snapshot data to the distributed file system.
3) When the control node detects that a compute node has failed, it directs the compute nodes to perform the following operations:
a) Terminate the computation of the current superstep, clean up resources, and enter recovery mode;
b) Find the last valid snapshot k and read $g_{k-1}$ from the file system;
c) Read $v_k$ and call the regenerate function on each vertex of the graph to recover $m_k$, exchanging messages between the different compute nodes;
d) Read $\delta_{k-1,k}$ and form $g_{k-1}\,\delta_{k-1,k} = g_k$;
e) Exit recovery mode and start computing superstep k+1.
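Step c) involves a message exchange because a regenerated message may be addressed to a vertex held by another compute node. The Python sketch below simulates that routing in one process. The modulo partitioning rule, the per-worker data layout, and the names `owner` and `exchange_regenerated_messages` are assumptions; the text does not say how vertices are assigned to compute nodes.

```python
def owner(vertex, num_workers):
    """Assumed partitioning rule: integer vertex ids are spread by modulo."""
    return vertex % num_workers

def exchange_regenerated_messages(partitions, regenerate):
    """Regenerate messages on every worker and route each one to its owner.

    partitions: list with one dict per worker, vertex -> (value, out_neighbors)
    regenerate: function(value, out_neighbors) -> {target: message}
    Returns one inbox per worker: dict vertex -> list of messages.
    """
    num_workers = len(partitions)
    inboxes = [dict() for _ in range(num_workers)]
    for local_vertices in partitions:
        for vertex, (value, out_neighbors) in local_vertices.items():
            for target, message in regenerate(value, out_neighbors).items():
                destination = owner(target, num_workers)
                inboxes[destination].setdefault(target, []).append(message)
    return inboxes

def echo_value(value, out_neighbors):
    """Hypothetical regenerate function: send the vertex value to every neighbor."""
    return {target: value for target in out_neighbors}

partitions = [
    {0: (10, [1]), 2: (30, [1, 0])},   # worker 0 holds vertices 0 and 2
    {1: (20, [2])},                    # worker 1 holds vertex 1
]
inboxes = exchange_regenerated_messages(partitions, echo_value)
```

In a real deployment the append into another worker's inbox would be a network send, followed by a barrier before superstep k+1 starts.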
2. Results
The algorithm was tested on 16 servers, each configured with a 12-core AMD Opteron 4180 and 48 GB of memory, connected by Gigabit Ethernet. The graph data used is the relationship data of a domestic social network from 2009, with 25 million vertices and about 1.5 billion edges in total, and an average degree of about 60. For compatibility, the underlying distributed file system is HDFS 1.1.0, and the currently popular Apache Giraph 1.0.0 is used as the performance baseline.
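The 1/β size relation stated earlier can be checked against these dataset figures with a short back-of-the-envelope calculation. The calculation assumes one message per edge per superstep and equal storage size for a vertex value and a message; both are simplifying assumptions, not measurements from the experiments.

```python
vertices = 25_000_000        # vertex count reported for the dataset
edges = 1_500_000_000        # approximate edge count reported for the dataset

average_degree = edges / vertices   # beta in the text

# Assumption: one message per edge, and a vertex value is as large as a message.
messages_per_superstep = edges
values_per_superstep = vertices
size_ratio = values_per_superstep / messages_per_superstep

print(f"average degree: {average_degree:.0f}")
print(f"vertex-value snapshot / message snapshot: {size_ratio:.4f}")
```

Under these assumptions the vertex value set is about 1/60 of the message set for this graph, consistent with the reported average degree of about 60.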
The experiments use five typical graph computing algorithms as workloads: RandomWalk, SSSP, WCC, k-core, and PageRank, and a corresponding regenerate function was implemented for each algorithm as the method requires. All systems and algorithms are implemented in Java 1.7.
Fig. 4 shows the computation and checkpoint times of the Giraph system and the experimental system of the invention under several typical tasks. Because the dataset and algorithm implementations are essentially the same, the computation times of the two systems are essentially identical; however, the lightweight snapshots introduced by this method greatly shorten the experimental system's snapshot generation time and reduce the overall running time.
Fig. 5 shows the cumulative size of the snapshot files of the Giraph system and the experimental system of the invention as the k-core task runs. The Giraph system takes a snapshot every 5 supersteps, whereas the experimental system of the invention takes a snapshot at every superstep. The lightweight snapshot files are far smaller than the traditional snapshots.
For recovery time, PageRank is used as a typical workload, and a failure is injected manually at the 9th superstep. As Fig. 6 shows, thanks to the message data regeneration mechanism, the experimental system of the invention needs to read less data after a failure and recovers far faster than the Giraph system.
The above embodiments only illustrate the technical solution of the invention and do not limit it. A person of ordinary skill in the art may modify the technical solution of the invention or replace it with equivalents without departing from the spirit and scope of the invention; the scope of protection of the invention shall be as defined in the claims.

Claims (6)

1. A graph computing fault-tolerance method based on a message regeneration mechanism, comprising the following steps:
1) During graph computation, for graph structure data, save the changes to the graph structure between two adjacent snapshots; for message data, save the vertex value set;
2) When a failure occurs, use the saved graph-structure changes and the vertex value set to restore the graph computing system to the superstep corresponding to a previous valid snapshot, then start computing the next superstep; let $\delta_{i,j}$ be the incremental change of the j-th superstep relative to the i-th superstep, let $g$ denote the original graph structure data, and let $g_k$ denote the graph structure data at the end of the k-th superstep; then recovering to the state at the k-th superstep when a failure occurs proceeds as follows:
a) Read $g\,\delta_{0,1}\,\delta_{1,2}\cdots\delta_{k-2,k-1}$ in order to obtain $g_{k-1}$;
b) Read the vertex value set $v_k$ and recover the message set $m_k$ from $v_k$;
c) Read $\delta_{k-1,k}$ and form $g_{k-1}\,\delta_{k-1,k} = g_k$.
2. The method of claim 1, characterized in that in step 1) the snapshot at the end of the k-th superstep is generated as follows:
a) Store the graph change information $\delta_{k-1,k}$ relative to the previous snapshot;
b) Store the values of all vertices, forming the vertex value set $v_k$.
3. The method of claim 1, characterized in that the programming interface used in step b) to recover the message set $m_k$ from $v_k$ is: void regenerate(T value), where T value is the value of the current vertex at the end of superstep k.
4. A graph computing system using the method of claim 1, characterized in that it comprises a control node, multiple compute nodes, and a distributed file system, wherein: the control node assigns computing tasks to the compute nodes, synchronizes the compute nodes, detects compute node failures, and controls the progress of the computation; the compute nodes perform the actual computation, and the memory of each compute node holds a subset of the graph's vertices and their connections to other vertices; the distributed file system stores the graph's static information and the snapshot data produced at run time, including the graph-structure changes between two adjacent snapshots and the vertex value set for message data; when the control node detects that a compute node has failed, it directs the compute nodes to recover to a previous valid state, that is, to perform the following operations:
a) Terminate the computation of the current superstep, clean up resources, and enter recovery mode;
b) Find the valid snapshot k before the failure and read $g_{k-1}$ from the file system;
c) Read $v_k$ and recover $m_k$ from $v_k$, exchanging messages between the different compute nodes;
d) Read $\delta_{k-1,k}$ and form $g_{k-1}\,\delta_{k-1,k} = g_k$;
e) Exit recovery mode and start computing the (k+1)-th superstep.
5. The system of claim 4, characterized in that the valid snapshot k before the failure in step b) is the last valid snapshot before the failure.
6. The system of claim 4, characterized in that the nodes exchange messages using the Apache MINA communication framework.
CN201410085478.9A 2014-03-10 2014-03-10 Diagram calculation fault-tolerant method and system based on information regeneration mechanism Expired - Fee Related CN103885856B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410085478.9A CN103885856B (en) 2014-03-10 2014-03-10 Diagram calculation fault-tolerant method and system based on information regeneration mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410085478.9A CN103885856B (en) 2014-03-10 2014-03-10 Diagram calculation fault-tolerant method and system based on information regeneration mechanism

Publications (2)

Publication Number Publication Date
CN103885856A CN103885856A (en) 2014-06-25
CN103885856B true CN103885856B (en) 2017-01-25

Family

ID=50954764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410085478.9A Expired - Fee Related CN103885856B (en) 2014-03-10 2014-03-10 Diagram calculation fault-tolerant method and system based on information regeneration mechanism

Country Status (1)

Country Link
CN (1) CN103885856B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095371B (en) * 2015-06-29 2018-08-10 清华大学 The diagram data management method and its device of sequence diagram
CN106610876B (en) * 2015-10-23 2020-11-03 中兴通讯股份有限公司 Data snapshot recovery method and device
CN111309976B (en) * 2020-02-24 2021-06-25 北京工业大学 GraphX data caching method for convergence graph application
CN113448692B (en) * 2020-03-25 2024-06-14 杭州海康威视数字技术股份有限公司 Method, device, equipment and storage medium for calculating distributed graph
CN115391341A (en) * 2022-08-23 2022-11-25 抖音视界有限公司 Distributed graph data processing system, method, device, equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103294799A (en) * 2013-05-27 2013-09-11 北京大学 Method and system for parallel batch importing of data into read-only query system
CN103336808A (en) * 2013-06-25 2013-10-02 中国科学院信息工程研究所 System and method for real-time graph data processing based on BSP (Board Support Package) model
CN103345508A (en) * 2013-07-04 2013-10-09 北京大学 Data storage method and system suitable for social network graph
WO2013149381A1 (en) * 2012-04-05 2013-10-10 Microsoft Corporation Platform for continuous graph update and computation
CN103488637A (en) * 2012-06-11 2014-01-01 北京大学 Method for carrying out expert search based on dynamic community mining

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
WO2013149381A1 (en) * 2012-04-05 2013-10-10 Microsoft Corporation Platform for continuous graph update and computation
CN103488637A (en) * 2012-06-11 2014-01-01 北京大学 Method for carrying out expert search based on dynamic community mining
CN103294799A (en) * 2013-05-27 2013-09-11 北京大学 Method and system for parallel batch importing of data into read-only query system
CN103336808A (en) * 2013-06-25 2013-10-02 中国科学院信息工程研究所 System and method for real-time graph data processing based on BSP (Board Support Package) model
CN103345508A (en) * 2013-07-04 2013-10-09 北京大学 Data storage method and system suitable for social network graph

Also Published As

Publication number Publication date
CN103885856A (en) 2014-06-25

Similar Documents

Publication Publication Date Title
CN103885856B (en) Diagram calculation fault-tolerant method and system based on information regeneration mechanism
CN110334075B (en) Data migration method based on message middleware and related equipment
CN103218233B (en) Data allocation strategy in Hadoop isomeric group
CN107291550A (en) A kind of Spark platform resources dynamic allocation method and system for iterated application
CN104216782A (en) Dynamic resource management method for high-performance computing and cloud computing hybrid environment
CN106878111A (en) The cloud monitoring system and monitoring method of a kind of High Availabitity
CN115150471B (en) Data processing method, apparatus, device, storage medium, and program product
CN104570081A (en) Pre-stack reverse time migration seismic data processing method and system by integral method
Gurusamy et al. The real time big data processing framework: Advantages and limitations
Gu et al. Chronos: An elastic parallel framework for stream benchmark generation and simulation
CN103885829A (en) Virtual machine cross-data-center dynamic migration optimization method based on statistics
CN102799424A (en) Interface architecture of agile efficient layering server side
CN103441918A (en) Self-organizing cluster server system and self-organizing method thereof
CN104077438A (en) Power grid large-scale topological structure construction method and system
CN106598700A (en) Second-level high availability realization method of virtual machine based on pacemaker
CN104301434A (en) High speed communication architecture and method based on trunking
CN106095335A (en) A kind of electric power big data elastic cloud calculates storage platform architecture method
Zvara et al. Optimizing distributed data stream processing by tracing
CN105069029A (en) Real-time ETL (extraction-transformation-loading) system and method
CN109344009A (en) Mobile cloud system fault-tolerance approach based on classification checkpoint
CN105610879B (en) Data processing method and device
CN104299170B (en) Intermittent energy source mass data processing method
CN111160535A (en) DGCNN model acceleration method based on Hadoop
Sukhwani et al. Largeness avoidance in availability modeling using hierarchical and fixed-point iterative techniques
CN105843706B (en) A kind of Dynamic Packet system based on MPI high-performance calculation layering rollback and recovery agreement

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170125

Termination date: 20200310
