CN107122248A - A storage-optimized distributed graph processing method - Google Patents
A storage-optimized distributed graph processing method — Download PDF Info
- Publication number
- CN107122248A, CN201710301095.4A, CN201710301095A, CN 107122248 A
- Authority
- CN
- China
- Prior art keywords
- node
- partition
- processing
- data
- worker
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a storage-optimized distributed graph processing method, belonging to the field of graph computing. The method comprises: partitioning the data in a preprocessing phase; distributing the graph partitions; starting iterative processing; transmitting update messages; making worker-node scaling decisions; and terminating. The invention partitions and stores graph data with a consistent-hashing algorithm, designs and implements a distributed out-of-core graph processing system, and adopts a dynamic storage-optimization strategy that adjusts the partitioned storage of the graph according to load. This balances the graph processing load, accelerates graph processing, and overcomes the load imbalance and hotspot-induced performance degradation of the prior art, thereby improving overall graph processing performance.
Description
Technical field
The invention belongs to the field of graph computing, and more particularly relates to a storage-optimized distributed graph processing method.
Background art
As a classical data structure, a graph expresses complex data relationships through vertices and edges and is widely applied across fields: social data analysis and mining on the Internet, protein interaction in chemistry, disease-outbreak path prediction in medicine, citation relationships in academic literature, and so on. Many important algorithms are derived from graphs, including PageRank, shortest paths, connected components, and maximal independent sets. Because graph data is both important and computationally demanding, a variety of graph processing systems have emerged.
The first are distributed in-memory graph processing systems, such as Pregel and GraphLab, which load all information of the graph into memory before processing begins. This mode is fast but costly, and as graph applications keep growing in scale the challenge becomes ever more pronounced. Since the memory a single machine can assemble is relatively limited, such systems can only scale out by adding machines, which inevitably increases the number of graph partitions and hence the number of cut edges, raising inter-machine communication pressure and aggravating network I/O latency. This offsets the parallelism gained by scaling out and drags down graph processing performance.
Confronted with this scale-out dilemma, a batch of single-machine out-of-core graph processing systems designed to scale up emerged, including GraphChi and X-Stream. Exploiting the fact that external storage is cheaper and more easily extended than memory, they keep most of the graph on disk and load only a small amount of data into memory when a computation depends on it; the graph's information flows mainly through disk accesses rather than inter-machine communication. This allows graph processing with acceptable performance on commodity machines with limited memory and other resources, but the performance of such systems is severely bounded by disk I/O.
In the big-data era, graph data keeps growing in scale, and the requirements on scalability and parallelism rise accordingly. Whether a graph processing system scales up on a single machine or scales out across a cluster, each architecture faces its own limits. A single machine is short on resources, whether computing capability, memory, or I/O bandwidth. Turning to distributed architectures, reasonable graph partitioning has long been a classical challenge: a good partitioning balances the computational load and reduces communication overhead, thereby accelerating processing, but the partitioning problem itself is NP-hard. Even approximate algorithms often cost substantial time and resources in preprocessing, losing more than they gain. In view of this, the prior art still performs only simple graph partitioning, such as Pregel's hash-based partitioning or Gemini's contiguous range partitioning. During distributed graph processing, such simple partitioning can hardly avoid load imbalance, producing dynamically changing processing hotspots that become the bottleneck dragging down each iteration and hurting overall graph processing performance.
Content of the invention
Addressing the above shortcomings of the prior art, the present invention provides a storage-optimized distributed graph processing method that partitions graph data for storage and balances I/O, balancing the graph processing load, accelerating graph processing, and solving the load imbalance and hotspot-induced overall performance degradation of the prior art.
To achieve the above object, the present invention provides a storage-optimized distributed graph processing method comprising the following steps:
(1) Initialization. The processor nodes of the graph processing system are divided into one master node and multiple worker nodes. Each worker node carries out the basic steps of graph processing and implements the computation model of graph processing; the master node controls the worker nodes.
From the user's initial configuration — which may be a file listing the information of each node — the master node generates the initial message routing table; every worker node keeps a copy of this table and updates it synchronously. The message routing table records the routing information between worker nodes; this routing information is used for transmitting update messages between worker nodes and for communication between the master node and each worker node. The master node controls the execution of the whole graph processing system, while the worker nodes carry out the basic steps of graph processing.
The master node partitions the graph data — the large-scale data whose relationships are represented as a graph structure — according to the message routing table, splitting by vertex id. The number of blocks equals the number of worker nodes in the message routing table, and the blocks divide the total vertex count evenly. All vertices of the graph form a ring: the minimum vertex id is adjacent to the maximum vertex id. After partitioning, each vertex partition takes one of two forms: either one contiguous id range, or two contiguous id segments. Cutting the graph by "contiguous" vertex ids yields one graph partition, where "contiguous" means contiguous on the ring above. This is a block-based partitioning method: the system does not deliberately optimize the initial division and does not guarantee that the partitions are balanced; it merely distributes the vertices evenly over the partitions, and each partition owns all outgoing edges of its vertices.
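The block partitioning described above can be sketched as follows. This is an illustrative sketch only, assuming vertex ids run from 0 to n−1; the function name `make_partitions` is our own, not from the patent.

```python
# Vertices 0..n-1 form a ring (id n-1 is adjacent to id 0); the id range is
# split into as many contiguous blocks as there are worker nodes, with the
# total vertex count divided as evenly as possible.

def make_partitions(num_vertices, num_workers):
    """Evenly split vertex ids into contiguous [start, end] ranges."""
    base, extra = divmod(num_vertices, num_workers)
    partitions, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)
        partitions.append((start, start + size - 1))  # inclusive id range
        start += size
    return partitions

print(make_partitions(10, 3))  # → [(0, 3), (4, 6), (7, 9)]
```

A partition that later wraps around the head and tail of the ring (the "two contiguous segments" case) would simply hold two such ranges.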
(2) Distribution of graph data. The master node sends each vertex partition obtained in step (1), together with the partition's metadata, to the corresponding worker node in the message routing table according to the consistent-hashing algorithm. The metadata comprises the edge count of the whole graph, the vertex count of the whole graph, the type of the graph, the edge count of each partition, the id of each partition, the starting vertex id of each partition, the ending vertex id of each partition, and the vertex ids within each graph partition; the partitions use this information later.
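The metadata fields listed above can be pictured as one record per partition. A minimal sketch, with field names of our own choosing (the patent names the fields but not their representation):

```python
# Illustrative per-partition metadata record for the fields enumerated in
# step (2). The concrete layout is our assumption, not the patent's.
from dataclasses import dataclass

@dataclass
class PartitionMeta:
    total_edges: int       # edge count of the whole graph
    total_vertices: int    # vertex count of the whole graph
    graph_type: str        # e.g. "directed"
    partition_edges: int   # edge count of this partition
    partition_id: int
    start_vertex_id: int   # starting vertex id of this partition
    end_vertex_id: int     # ending vertex id of this partition
    vertex_ids: list       # vertex ids held by this partition

meta = PartitionMeta(20, 10, "directed", 8, 0, 0, 3, list(range(0, 4)))
print(meta.partition_id, meta.vertex_ids)  # → 0 [0, 1, 2, 3]
```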
(3) Iteration-count check. Under the master node's control, each worker node starts iterative processing of the graph data. Before each iteration, the master node checks whether the iteration count has reached the preset value; if so, go to step (6), otherwise go to step (4). Here the master node acts as a barrier: during graph processing, a worker node must wait until all worker nodes have completed the current iteration before the next iteration can begin.
(4) Update-message transmission. Each worker node performs the MGA computation. MGA abbreviates the Map, Gather, and Apply operations below; in one iteration of graph processing, every vertex goes through these three stages.
First, a Map operation is performed on each edge of the graph partition — for example, the PageRank algorithm divides each vertex's weight evenly over its outgoing edges. While the Map operation executes, each vertex produces an update message and sends it to the corresponding destination address. Update messages are produced by the concrete application; the update-message structure contains the data to be passed to an adjacent vertex together with a destination address, i.e. the address of that adjacent vertex. Second, each vertex performs a Gather operation that collects all update messages addressed to it. Third, each vertex performs an Apply operation that modifies the vertex's data with the collected update messages.
Update messages are transmitted only between worker nodes, and each message is sent to the corresponding worker node according to the message routing table. Every worker node and the master node keep a copy of this distributed message routing table; each worker's copy mirrors the one kept by the master node, and whenever the master node updates the table it synchronizes the update to all worker nodes' copies.
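One MGA iteration can be sketched with the PageRank example from the text: Map streams over the edges and emits each vertex's weight divided by its out-degree, Gather collects the messages addressed to each vertex, and Apply folds them into the new vertex value. A minimal single-machine sketch; function names and the damping constant are our own choices, not the patent's.

```python
from collections import defaultdict

def mga_iteration(edges, rank, damping=0.85):
    out_deg = defaultdict(int)
    for src, _ in edges:
        out_deg[src] += 1
    # Map: one update message per edge — (payload, destination address)
    messages = [(rank[src] / out_deg[src], dst) for src, dst in edges]
    # Gather: collect all update messages addressed to each vertex
    inbox = defaultdict(float)
    for payload, dst in messages:
        inbox[dst] += payload
    # Apply: modify each vertex's value using its collected messages
    return {v: (1 - damping) + damping * inbox[v] for v in rank}

edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
rank = {0: 1.0, 1: 1.0, 2: 1.0}
rank = mga_iteration(edges, rank)
```

In the distributed setting, the messages produced in the Map phase would be routed to the owning worker node via the message routing table rather than collected locally.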
(5) Scaling. From the collected worker-node running status, the master node determines whether the load across worker nodes is balanced:
If it is, no split or scale-out is needed; go to step (3) for the next iteration.
Otherwise the graph partition of the most heavily loaded worker node — i.e. the data being processed by the slowest node — is split, and the system scales out to eliminate the hotspot. The consistent-hashing algorithm is then used to distribute the graph partitions over the worker nodes, achieving load regulation; the message routing table is rebuilt, and execution returns to step (3) for the next iteration. Splitting means dividing a partition's vertices into two parts; scaling out means adding one worker node to process the split-off graph data. Consistent hashing is the operation that assigns partition data to worker nodes: every time a worker node is added, the assignment is recomputed. Scaling is the key method for balancing load and an important means of speeding up each iteration of graph processing.
(6) Graph data processing ends, and the computed results are output.
Further, in step (4) the streaming read of the graph data during the MGA computation guarantees sequential access to storage, ensuring maximum utilization of external-memory I/O.
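The streaming scan this refers to can be sketched as reading fixed-size edge records from disk in a single sequential pass, so the Map phase touches external storage only with sequential I/O. The 8-byte (src, dst) record layout below is our assumption; the patent does not specify an on-disk format.

```python
import struct, io

RECORD = struct.Struct("<II")  # (src_id, dst_id), 4 bytes each

def stream_edges(fileobj):
    """Yield (src, dst) pairs from a binary edge file in one sequential pass."""
    while chunk := fileobj.read(RECORD.size):
        yield RECORD.unpack(chunk)

# In-memory stand-in for an on-disk edge file:
buf = io.BytesIO(b"".join(RECORD.pack(s, d) for s, d in [(0, 1), (1, 2)]))
print(list(stream_edges(buf)))  # → [(0, 1), (1, 2)]
```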
Further, the worker-node running status collected in step (5) comprises disk I/O, network I/O, and computation cost.
Further, the hotspot in step (5) refers to the worker node that runs slowest in an iteration. The principle for splitting a hotspot's graph partition is to make the load costs of the two resulting partitions as close as possible:
COST = α|V| + |E|
where, for graph data processing, α is the average in-degree of the graph, α|V| represents the update messages a graph partition is to receive, and |E| represents the update messages a graph partition is to send; the formula COST measures a partition's load. The purpose of the split is to find a vertex on the partition to be divided such that the two sub-partitions it splits into carry roughly the same load cost. COST in this formula measures a partition's load well; clearly, when making the scaling decision, such a vertex can always be found on the partition to be split so that the two sub-partitions' load costs are roughly equal.
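The split-point search implied by the COST formula can be sketched as walking the hot partition's vertices in ring order and choosing the cut whose two halves have the closest COST = α|V| + |E| values (α being the average in-degree). The helper names are our own.

```python
def best_split(vertices, out_deg, alpha):
    """vertices: ids in ring order; out_deg[v]: out-edges of v.
    Returns the index i such that cutting before i best balances COST."""
    def cost(vs):
        return alpha * len(vs) + sum(out_deg[v] for v in vs)
    total = cost(vertices)
    best_i, best_gap = 1, float("inf")
    for i in range(1, len(vertices)):
        left = cost(vertices[:i])
        gap = abs(left - (total - left))  # |COST(left) - COST(right)|
        if gap < best_gap:
            best_i, best_gap = i, gap
    return best_i

split = best_split([10, 11, 12, 13], {10: 5, 11: 1, 12: 1, 13: 1}, alpha=2.0)
print(split)  # → 1: vertex 10 alone already carries about half the load
```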
In the present invention, the data in step 1 is divided with the storage-optimized partitioning method. The basic idea of this partitioning strategy is to perform a simple range division initially; later, during graph processing, the system re-divides the partitions by range according to the load of the worker nodes. All vertices of the graph form a ring in which the minimum vertex id is adjacent to the maximum vertex id; cutting the graph by "contiguous" vertex ids yields a graph partition, where "contiguous" means contiguous on the ring above.
In the present invention, the specifics of the MGA computation model in step 4 are shown in Fig. 4. The streaming read of the graph data in the MGA process guarantees sequential disk access, ensuring maximum utilization of external-memory I/O.
In general, compared with the prior art, the technical scheme conceived above has the following beneficial effects: when processing graph data, the present invention stores the graph with consistent hashing and adopts a dynamic storage-optimization strategy, adjusting the partitioned storage of the graph according to load. This realizes dynamic scaling of graph partitions, eliminates "hotspot" worker nodes, balances I/O, improves graph processing performance, and greatly increases the system's data processing speed.
Brief description of the drawings
Fig. 1 is the execution flow chart of the storage-optimized distributed graph processing system of the present invention;
Fig. 2 is the overall architecture diagram of the storage-optimized distributed graph processing system of the present invention;
Fig. 3 illustrates the partitioning strategy of the storage-optimized distributed graph processing system of the present invention;
Fig. 4 is a schematic diagram of the MGA computation model of the storage-optimized distributed graph processing system of the present invention;
Figs. 5, 6, and 7 are schematic diagrams of dynamic partition scaling in the storage-optimized distributed graph processing system of the present invention.
Detailed description of the embodiments
To make the objects, technical schemes, and advantages of the present invention clearer, the invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the invention described below can be combined with one another as long as they do not conflict.
Fig. 2 is the system architecture diagram of the storage-optimized distributed graph processing system of the present invention. The system consists of two parts, the master node and the worker nodes: the master node controls the execution of the whole graph processing system, and the worker nodes carry out the basic steps of graph processing and implement its computation model. Fig. 1 is the flow chart of the system's data processing, which specifically includes the following steps:
Step 1: Partitioning the graph.
First, the master node generates the initial message routing table from the user's configuration and partitions the graph data according to this table. A partition takes one of two forms: either one contiguous id range, or two contiguous id segments. This is a block-based partitioning method; the system does not deliberately optimize the initial division and does not guarantee that the partitions are balanced, but merely distributes the vertices evenly over the graph partitions.
Step 2: Distributing the graph data.
The master node sends each graph partition and its metadata to the corresponding worker node. The metadata comprises the edge count of the whole graph, the vertex count of the whole graph, the type of the graph, the partition's edge count, the partition's id, the partition's starting vertex id, the partition's ending vertex id, and the vertex ids within the partition; the partition uses this information later.
Step 3: Starting iterative data processing.
The master node controls the start and end of every iteration of graph processing and limits the number of iterations by the preset value. Here the master node acts as a barrier: during graph processing, a worker node must wait until all worker nodes have completed the current iteration before the next iteration can begin. When all iterations are judged complete, step 6 is executed.
Step 4: Transmitting update messages.
In each iteration, following the MGA (Map-Gather-Apply) computation model, every vertex goes through the three stages. The graph data is streamed from disk for parallel processing: a Map operation on each edge produces an update whose structure carries a destination address, and each update is sent to its corresponding destination; each vertex then runs a Gather operation, and once all updates have been collected, each vertex runs an Apply operation. A worker node must send the updates it produces to the corresponding worker nodes; this transmission happens only between worker nodes, and updates are routed to the corresponding worker node according to the distributed message routing table.
Step 5: Scaling decision.
The scaling decision is the key method for balancing load and an important means of speeding up each iteration of graph processing. During the iterations, based on the worker-node running status it has collected, the master node decides whether to scale the system out (scale-out) and which worker node's partition to "split" to eliminate the hotspot, achieving load regulation.
Step 6: Graph data processing ends, and the computed results are output.
The present invention provides an embodiment, illustrated with a ring hash space of [0, 2^32 - 1], four vertices Vid1, Vid2, Vid3, Vid4, and 3 graph partitions, with Vid2 and Vid4 in the same partition, as shown in Fig. 5. The specific steps are as follows:
Step 1: Partitioning the graph.
First, the master node generates the initial message routing table from the user's configuration and partitions the graph data according to this table. A partition takes one of two forms: either one contiguous id range, or two contiguous id segments. This is a block-based partitioning method; the system does not deliberately optimize the initial division and does not guarantee that the partitions are balanced, but merely distributes the vertices evenly over the graph partitions.
As shown in Fig. 3, the vertices form a ring, head joined to tail, divided into four partitions. The vertex ids in partitions one, two, and three are strictly contiguous, while partition four is contiguous only on the ring: it spans the head and tail of the ring and actually consists of two strictly contiguous segments of vertex ids. In distributed graph processing, this block-based method reduces the cost of mapping global vertex ids to a block's local vertex ids: each graph partition only needs to maintain its boundary information to translate quickly. When a graph partition is re-divided during load transfer, it can be proven that, by searching along the ring, a vertex can always be found at which the partition can be bisected so that the loads of the two resulting parts are equal or minimally different.
Step 2: Distributing the graph data.
The master node sends the graph partitions and their metadata to the corresponding worker nodes. First, the worker nodes are mapped onto the ring formed by the vertices: suppose the three worker nodes obtain their key (KEY) values — their positions on the ring — through the consistent-hashing function:
Hash (W1) = KEY1
Hash (W2) = KEY2
Hash (W3) = KEY3
These key values are then located on the ring, as shown in Fig. 6. Following the consistent-hashing algorithm, all vertices (Vid) of each graph partition are then mapped, clockwise, to the worker node nearest to that partition — that is, the partition is assigned to that worker node for computation.
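The clockwise assignment above can be sketched with a small hash ring. This is a hedged illustration: the md5-based hash stands in for the unspecified hash function, and keying partitions by a string id is our assumption. Adding a worker (the scale-out of step 5) moves only the partitions whose nearest clockwise worker changed.

```python
import hashlib, bisect

RING_SIZE = 2 ** 32

def ring_hash(key):
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % RING_SIZE

class HashRing:
    def __init__(self, workers):
        self.points = sorted((ring_hash(w), w) for w in workers)

    def owner(self, partition_key):
        """First worker clockwise from the partition's position on the ring."""
        h = ring_hash(partition_key)
        keys = [p for p, _ in self.points]
        i = bisect.bisect_right(keys, h) % len(self.points)
        return self.points[i][1]

ring = HashRing(["W1", "W2", "W3"])
before = {p: ring.owner(p) for p in ["P1", "P2", "P3"]}
ring = HashRing(["W1", "W2", "W3", "W4"])   # scale out: add worker W4
after = {p: ring.owner(p) for p in ["P1", "P2", "P3"]}
moved = {p for p in before if before[p] != after[p]}
# Any partition that changed owner must have moved to the new worker:
print(all(after[p] == "W4" for p in moved))  # → True
```

This locality is what makes consistent hashing suitable for the redistribution in step 5: existing assignments are largely preserved when a worker is added.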
Step 3: Starting one round of iterative data processing.
The master node controls the start and end of every iteration of graph processing and limits the number of iterations by the preset value. Here the master node acts as a barrier: during graph processing, a worker node must wait until all worker nodes have completed the current iteration before the next iteration can begin. When all iterations are judged complete, step 6 is executed.
Step 4: Transmitting update messages.
In each iteration, following the MGA (Map-Gather-Apply) computation model, every vertex goes through the three stages. The graph data is streamed from disk for parallel processing: a Map operation on each edge produces an update whose structure carries a destination address, and each update is sent to its corresponding destination; each vertex then runs a Gather operation, and once all updates have been collected, each vertex runs an Apply operation. A worker node must send the updates it produces to the corresponding worker nodes; this transmission happens only between worker nodes, and updates are routed to the corresponding worker node according to the distributed message routing table.
Step 5: Scaling decision.
The scaling decision is the key method for balancing load and an important means of speeding up each iteration of graph processing. In each iteration, based on the worker-node running status it has collected, the master node decides whether to scale the system out (scale-out) and which worker node's partition to "split" to eliminate the hotspot, and then redistributes the graph data with the consistent-hashing algorithm described in step 2, achieving load regulation. Suppose the hotspot appears in partition two, processed by the third worker node W3: partition two is then "split" into two child partitions, and a fourth worker node W4 is added to process one of the two child partitions, as shown in Fig. 7.
Step 6: Graph data processing ends, and the computed results are output.
When executing graph data processing, the present invention quantitatively evaluates the load of each worker node, bisects the partition data of "hotspot" worker nodes, and realizes dynamic partition scaling through the consistent-hashing algorithm. Experiments on both synthetic graph data and real social-network graph data confirm that the storage optimization reduces imbalance in graph processing and accelerates it.
Those skilled in the art will readily understand that the above are only preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the invention shall all be included within its scope of protection.
Claims (5)
1. A storage-optimized distributed graph processing method, characterized by comprising the following steps:
(1) Initialization: the processor nodes of the graph processing system are divided into one master node and multiple worker nodes; each worker node completes the basic graph-processing procedure and implements the graph-processing computation model, and the master node controls the worker nodes; according to the user's initial configuration, the master node generates an initial message routing table, and every worker node keeps a copy of this routing table and updates it synchronously; the message routing table records the routing information between the worker nodes; the master node controls the execution of the whole graph processing system, while the worker nodes complete the basic graph processing; the routing information is used for transmitting new messages between worker nodes and for communication with the master node;
The master node partitions the graph data according to the message routing table, dividing by graph vertex id; the number of partitions equals the number of worker nodes in the message routing table, and the partitions divide the total number of graph vertices evenly; all vertices of the graph form a ring-shaped space in which the minimum vertex id and the maximum vertex id are adjacent; after partitioning, each graph partition takes one of two forms: one kind of partition holds a single contiguous id range, and the other kind holds two contiguous id ranges;
(2) Distribution of the graph data: according to the consistent hashing algorithm, the master node sends each graph partition obtained in step (1), together with the metadata of that partition, to the corresponding worker node in the message routing table; the metadata includes the number of edges of the whole graph, the number of vertices of the whole graph, the type of the graph, the number of edges of each partition, the id of each partition, the starting vertex id of each partition, the ending vertex id of each partition, and the vertex ids within each partition;
(3) Iteration-count check: under the control of the master node, each worker node starts the iterative processing of the graph data; before each iteration, the master node checks whether the iteration count has reached the preset value; if so, go to step (6); otherwise, go to step (4);
(4) New-message transmission: each worker node performs the MGA computation, which includes:
First, a Map operation is performed on every edge of the graph partition; while the Map operations are performed, each graph vertex produces a new message, which is sent to the corresponding destination address;
Second, each graph vertex performs a Gather operation to collect all the new messages delivered to it;
Third, each graph vertex performs an Apply operation, modifying its own data with the collected new messages; the transmission of the new messages occurs only between worker nodes, and each new message is sent to the corresponding worker node according to the message routing table;
The MGA computation is named after the above Map, Gather, and Apply operations; within one iteration of graph processing, every graph vertex goes through these three stages;
(5) Scaling: the master node determines from the collected worker-node running status whether the load of the worker nodes is balanced:
If it is, no splitting or scaling is performed; go to step (3) for the next iteration;
Otherwise, the graph partition of the most heavily loaded worker node, i.e., the node that takes the longest to process its data, is split, and the system is then scaled out to eliminate the hotspot; the consistent hashing algorithm is used to redistribute the graph partition data to the worker nodes so as to regulate the load; the message routing table is then regenerated, and the method goes to step (3) for the next iteration;
(6) Graph data processing terminates, and the computation results are output.
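One round of the MGA computation from step (4) can be sketched as a single-machine message-passing loop. The min-based update rule below (BFS-level propagation) is an assumed example of a vertex program; the claim itself only prescribes the Map, Gather, and Apply phases, not this particular update.

```python
# Sketch of one MGA iteration from claim 1, step (4): Map emits a
# message along every edge, Gather collects messages per vertex,
# Apply updates vertex data. The BFS-style min-update is an assumed
# example vertex program, not taken from the patent.
INF = float("inf")

def mga_iteration(edges, value):
    # Map: every edge of the partition emits one message to its target,
    # computed from a snapshot of the source vertex's current value.
    messages = [(dst, value[src] + 1) for src, dst in edges
                if value[src] != INF]
    # Gather: each vertex collects all messages addressed to it.
    inbox = {}
    for dst, msg in messages:
        inbox.setdefault(dst, []).append(msg)
    # Apply: each vertex updates its data with the collected messages.
    for v, msgs in inbox.items():
        value[v] = min(value[v], min(msgs))
    return value

# BFS levels from vertex 0 on a small graph; one iteration reaches
# the direct neighbours of vertex 0.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
value = {0: 0, 1: INF, 2: INF, 3: INF}
value = mga_iteration(edges, value)
print(value)
```

In the distributed setting of the claim, the Map-phase messages whose destination vertex lives on another partition would be routed to that worker via the message routing table instead of being kept in a local inbox.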
2. The method according to claim 1, characterized in that the MGA computation in step (4) reads the graph data in a streaming manner, which guarantees sequential access to storage and thereby maximizes the utilization of external-memory I/O.
3. The method according to claim 1, characterized in that the worker-node running status collected in step (5) includes disk I/O, network I/O, and computation cost.
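The master's balance check over these collected quantities can be sketched as follows. The claims do not specify how the three costs are combined or what counts as "imbalanced"; the unweighted sum and the 1.5x-of-mean threshold below are assumptions for illustration.

```python
# Sketch of the master node's balance check in step (5), using the
# running status named in claim 3 (disk I/O, network I/O, computation
# cost). The unweighted sum and the 1.5x threshold are assumptions.

def is_balanced(status, threshold=1.5):
    """status: {worker: (disk_io, net_io, compute_cost)}.
    Balanced if the costliest worker stays within `threshold` times
    the mean cost over all workers; also returns that worker, which
    is the hotspot candidate when imbalanced."""
    costs = {w: disk + net + cpu for w, (disk, net, cpu) in status.items()}
    mean = sum(costs.values()) / len(costs)
    hotspot = max(costs, key=costs.get)
    return costs[hotspot] <= threshold * mean, hotspot

# W3 is far more loaded than W1/W2, so the check flags it.
status = {"W1": (10, 5, 20), "W2": (12, 6, 22), "W3": (40, 30, 90)}
balanced, hotspot = is_balanced(status)
print(balanced, hotspot)
```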
4. The method according to claim 1, characterized in that the hotspot in step (5) refers to the worker node that runs slowest in one iteration, and the principle for splitting the hotspot's graph partition is to make the load costs of the two resulting partitions as close as possible:
COST = α|V| + |E|
where, for graph data processing, α is the average in-degree of the graph, α|V| represents the update messages a graph partition is to receive, and |E| represents the new messages a graph partition is to send; the COST formula measures the load of a graph partition; the goal of the split is to find a vertex in the partition to be divided such that the load costs of the two child partitions into which the partition splits are approximately equal.
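The split-point search implied by claim 4 can be sketched as a scan over candidate split vertices that minimizes the gap between the two child partitions' COST values. The linear scan, the per-vertex out-edge counts, and the example α are assumptions for illustration.

```python
# Sketch of the claim-4 split: scan candidate split points in a
# partition's contiguous vertex-id list and pick the one that makes
# COST = alpha*|V| + |E| of the two child partitions as close as
# possible. The scan strategy and example data are assumptions.

def cost(alpha, num_vertices, num_edges):
    # COST = alpha*|V| + |E| from claim 4.
    return alpha * num_vertices + num_edges

def best_split(vertex_ids, out_edges, alpha):
    """Return the split index i (left = vertex_ids[:i]) minimizing
    |COST(left) - COST(right)|."""
    best_i, best_gap = None, float("inf")
    for i in range(1, len(vertex_ids)):
        left, right = vertex_ids[:i], vertex_ids[i:]
        e_left = sum(out_edges[v] for v in left)
        e_right = sum(out_edges[v] for v in right)
        gap = abs(cost(alpha, len(left), e_left) -
                  cost(alpha, len(right), e_right))
        if gap < best_gap:
            best_i, best_gap = i, gap
    return best_i

# Partition with vertices 0..3; vertex 0 is a heavy sender (6 edges),
# so the balanced split isolates it rather than halving the id range.
out_edges = {0: 6, 1: 1, 2: 1, 3: 2}
i = best_split([0, 1, 2, 3], out_edges, alpha=2.0)
print(i)
```

A prefix-sum over edge counts would make this search linear rather than quadratic; the naive scan is kept here for clarity.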
5. A storage-optimized distributed graph processing method, characterized in that the system consists of two parts, a master node and worker nodes; the master node controls the execution of the whole graph processing system, and the worker nodes complete the basic graph processing and implement the graph-processing computation model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710301095.4A CN107122248B (en) | 2017-05-02 | 2017-05-02 | Storage optimization distributed graph processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710301095.4A CN107122248B (en) | 2017-05-02 | 2017-05-02 | Storage optimization distributed graph processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107122248A true CN107122248A (en) | 2017-09-01 |
CN107122248B CN107122248B (en) | 2020-01-21 |
Family
ID=59726565
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710301095.4A Active CN107122248B (en) | 2017-05-02 | 2017-05-02 | Storage optimization distributed graph processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107122248B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101645039A (en) * | 2009-06-02 | 2010-02-10 | 中国科学院声学研究所 | Method for storing and reading data based on Peterson graph |
US8429150B2 (en) * | 2010-03-14 | 2013-04-23 | Intellidimension, Inc. | Distributed query compilation and evaluation system and method |
CN104780213A (en) * | 2015-04-17 | 2015-07-15 | 华中科技大学 | Load dynamic optimization method for principal and subordinate distributed graph manipulation system |
CN105590321A (en) * | 2015-12-24 | 2016-05-18 | 华中科技大学 | Block-based subgraph construction and distributed graph processing method |
CN105653204A (en) * | 2015-12-24 | 2016-06-08 | 华中科技大学 | Distributed graph calculation method based on disk |
- 2017-05-02: CN CN201710301095.4A patent/CN107122248B/en active Active
Non-Patent Citations (1)
Title |
---|
ZHAN SHI ET AL.: "Partitioning dynamic graph asynchronously with distributed FENNEL", 《FUTURE GENERATION COMPUTER SYSTEMS》 * |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108241472A (en) * | 2017-12-01 | 2018-07-03 | 北京大学 | A kind of big data processing method and system for supporting locality expression function |
CN108241472B (en) * | 2017-12-01 | 2021-03-12 | 北京大学 | Big data processing method and system supporting locality expression function |
CN110309367A (en) * | 2018-03-05 | 2019-10-08 | 腾讯科技(深圳)有限公司 | Method, the method and apparatus of information processing of information classification |
CN110309367B (en) * | 2018-03-05 | 2022-11-08 | 腾讯科技(深圳)有限公司 | Information classification method, information processing method and device |
CN109522102A (en) * | 2018-09-11 | 2019-03-26 | 华中科技大学 | A kind of multitask external memory ideograph processing method based on I/O scheduling |
CN109522102B (en) * | 2018-09-11 | 2022-12-02 | 华中科技大学 | Multitask external memory mode graph processing method based on I/O scheduling |
CN109522428B (en) * | 2018-09-17 | 2020-11-24 | 华中科技大学 | External memory access method of graph computing system based on index positioning |
CN109522428A (en) * | 2018-09-17 | 2019-03-26 | 华中科技大学 | A kind of external memory access method of the figure computing system based on index positioning |
CN109710774A (en) * | 2018-12-21 | 2019-05-03 | 福州大学 | It is divided and distributed storage algorithm in conjunction with the diagram data of equilibrium strategy |
CN109710774B (en) * | 2018-12-21 | 2022-06-21 | 福州大学 | Graph data partitioning and distributed storage method combining balance strategy |
CN110647406A (en) * | 2019-08-29 | 2020-01-03 | 湖北工业大学 | Coarse-grained graph data asynchronous iterative updating method |
CN110647406B (en) * | 2019-08-29 | 2022-11-29 | 湖北工业大学 | Coarse-grained graph data asynchronous iterative updating method |
CN110737804B (en) * | 2019-09-20 | 2022-04-22 | 华中科技大学 | Graph processing access optimization method and system based on activity degree layout |
CN110737804A (en) * | 2019-09-20 | 2020-01-31 | 华中科技大学 | graph processing memory access optimization method and system based on activity level layout |
CN110764824A (en) * | 2019-10-25 | 2020-02-07 | 湖南大学 | Graph calculation data partitioning method on GPU |
CN111737540A (en) * | 2020-05-27 | 2020-10-02 | 中国科学院计算技术研究所 | Graph data processing method and medium applied to distributed computing node cluster |
CN111737540B (en) * | 2020-05-27 | 2022-11-29 | 中国科学院计算技术研究所 | Graph data processing method and medium applied to distributed computing node cluster |
CN111737531B (en) * | 2020-06-12 | 2021-05-28 | 深圳计算科学研究院 | Application-driven graph division adjusting method and system |
CN111737531A (en) * | 2020-06-12 | 2020-10-02 | 深圳计算科学研究院 | Application-driven graph division adjusting method and system |
CN112181288A (en) * | 2020-08-17 | 2021-01-05 | 厦门大学 | Data processing method of nonvolatile storage medium and computer storage medium |
CN113065035A (en) * | 2021-03-29 | 2021-07-02 | 武汉大学 | Single-machine out-of-core attribute graph calculation method |
CN113777877A (en) * | 2021-09-03 | 2021-12-10 | 珠海市睿晶聚源科技有限公司 | Method and system for integrated circuit optical proximity correction parallel processing |
Also Published As
Publication number | Publication date |
---|---|
CN107122248B (en) | 2020-01-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107122248A (en) | A kind of distributed figure processing method of storage optimization | |
CN107710696B (en) | Method and network component for path determination | |
CN107122490B (en) | Data processing method and system for aggregation function in packet query | |
CN110737804B (en) | Graph processing access optimization method and system based on activity degree layout | |
Espegren et al. | The static bicycle repositioning problem-literature survey and new formulation | |
CN103631878B (en) | A kind of massive data of graph structure processing method, device and system | |
KR102163209B1 (en) | Method and reconfigurable interconnect topology for multi-dimensional parallel training of convolutional neural network | |
CN110222029A (en) | A kind of big data multidimensional analysis computational efficiency method for improving and system | |
CN107122244A (en) | A kind of diagram data processing system and method based on many GPU | |
CN105677755B (en) | A kind of method and device handling diagram data | |
CN104104621A (en) | Dynamic adaptive adjustment method of virtual network resources based on nonlinear dimensionality reduction | |
CN106846236A (en) | A kind of expansible distributed GPU accelerating method and devices | |
CN109376151A (en) | Data divide library processing method, system, device and storage medium | |
JPWO2018158819A1 (en) | Distributed database system and resource management method for distributed database system | |
CN113191029B (en) | Traffic simulation method, program, and medium based on cluster computing | |
Kumar et al. | Graphsteal: Dynamic re-partitioning for efficient graph processing in heterogeneous clusters | |
CN104426774A (en) | High-speed routing lookup method and device simultaneously supporting IPv4 and IPv6 | |
CN110245271B (en) | Large-scale associated data partitioning method and system based on attribute graph | |
CN104598600B (en) | A kind of parallel analysis of digital terrain optimization method based on distributed memory | |
CN108334532A (en) | A kind of Eclat parallel methods, system and device based on Spark | |
CN108197186B (en) | Dynamic graph matching query method applied to social network | |
US11886934B2 (en) | Control of data transfer between processing nodes | |
CN111737347B (en) | Method and device for sequentially segmenting data on Spark platform | |
CN114610501A (en) | Resource allocation method for parallel training of task planning model | |
CN108737130B (en) | Network flow prediction device and method based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||