CN105787020A - Graph data partitioning method and device - Google Patents


Info

Publication number
CN105787020A
Authority
CN
China
Prior art keywords
data
number of layers
partitioning
dimension
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610101409.1A
Other languages
Chinese (zh)
Other versions
CN105787020B (en)
Inventor
武永卫
章明星
陈康
郑纬民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze Delta Region Institute of Tsinghua University Zhejiang
Original Assignee
Innovation Center Of Yin Zhou Qinghua Changsanjiao Research Inst Zhejiang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation Center Of Yin Zhou Qinghua Changsanjiao Research Inst Zhejiang
Priority to CN201610101409.1A
Publication of CN105787020A
Application granted
Publication of CN105787020B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465: Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a graph data partitioning method and device. The method comprises: modeling an algorithm according to graph-defined data and the computation model UPPS; partitioning the modeled data with a two-dimensional data partitioning method and obtaining the partitioned data with the smallest redundancy rate; estimating and determining the optimal number of layers in the third dimension according to an estimation formula; and partitioning each layer of the third dimension according to the data with the smallest redundancy rate to obtain the data distribution. By adding an appropriate amount of inter-layer communication, the method reduces the number of data partitions within each layer, which reduces the communication volume of distributed graph data processing and improves computational efficiency.

Description

Graph data partitioning method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a graph data partitioning method and device.
Background
As graph data keeps growing in both volume and importance, distributed graph processing engines are becoming a mainstream solution. This demand has driven developers and researchers in industry and academia to build a variety of parallel graph processing frameworks, including Pregel, PowerGraph and GraphX. Because these systems provide a simple and effective programming interface that lets programmers scale their programs to multi-machine environments seamlessly, systems originally designed for graph computation are now used not only for traditional graph analysis but also for many machine learning and data mining tasks that can be modeled as graphs.
For example, the goal of a data mining problem such as Collaborative Filtering is to predict the unknown ratings from a known set of ratings given by users to items. The problem was first described with matrices. As shown in Fig. 1, briefly, given a sparse matrix R of size N*M, the goal is to decompose R into the product of two low-dimensional dense matrices P and Q (P and Q have sizes N*D and M*D respectively, R is approximately P*Q^T, and D is much smaller than N and M). Correspondingly, the problem can also be described with a graph model, in which every row of P and of Q corresponds to a vertex in a bipartite graph. The attribute of each vertex is a vector of length D, and the rating matrix R corresponds to the labels on the edges. In other words, if user u gives item v the rating Ruv, then the weight of the edge between u and v is Ruv.
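The correspondence between the matrix view and the graph view can be illustrated with a minimal Python sketch. The function names, vertex labels and toy numbers below are illustrative assumptions and do not come from the patent.

```python
# Sketch only: names and data are illustrative assumptions, not from the patent.

def ratings_to_bipartite_edges(ratings):
    """Turn a sparse rating matrix {(u, v): R_uv} into weighted bipartite edges.

    User vertices are labelled ("user", u) and item vertices ("item", v),
    so the two sides of the bipartite graph never collide.
    """
    return [(("user", u), ("item", v), r) for (u, v), r in sorted(ratings.items())]


def predict(P, Q, u, v):
    """Predicted rating: dot product of row u of P and row v of Q (R ~ P * Q^T)."""
    return sum(p * q for p, q in zip(P[u], Q[v]))


# Toy data: N = 2 users, M = 3 items, D = 2 latent dimensions.
ratings = {(0, 0): 5.0, (0, 2): 1.0, (1, 1): 4.0}
P = [[1.0, 2.0], [0.5, 0.0]]              # one length-D vector per user vertex
Q = [[1.0, 2.0], [2.0, 1.0], [0.0, 0.5]]  # one length-D vector per item vertex

edges = ratings_to_bipartite_edges(ratings)
```

Each known rating becomes one weighted edge, and each row of P or Q becomes the length-D attribute vector of one vertex.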
As demand grows for analyzing large graph data in machine learning and data mining tasks, many problems that were not considered before have emerged. According to some recent studies, a key research question for processing graph data efficiently is how to minimize the communication volume during graph processing. A distributed graph processing engine therefore needs to choose its task partitioning algorithm with great care. According to investigation, however, existing graph processing engines largely ignore the characteristics of machine learning and data mining tasks and assume that the attribute of each vertex in the graph is indivisible, so they simply treat the partitioning of a graph computation task as equivalent to the general graph partitioning problem. In many problems, such as Collaborative Filtering, the attribute of each vertex is actually a vector. Although the length of this vector is generally not large, a new dimension along which to partition the graph is needed to address the communication volume problem.
Summary of the invention
The present invention aims to solve at least one of the above technical problems, at least to some extent.
To this end, a first object of the present invention is to propose a graph data partitioning method. The method reduces the number of data partitions within each layer by adding an appropriate amount of inter-layer communication, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
A second object of the present invention is to propose a graph data partitioning device.
To achieve the above objects, the graph data partitioning method of an embodiment of the first aspect of the present invention includes: modeling an algorithm according to the graph-defined data and the computation model UPPS (Update, Push, Pull, Sink); partitioning the modeled data with a two-dimensional data partitioning method and obtaining the partitioned data with the smallest redundancy rate; estimating and determining the optimal number of layers in the third dimension according to an estimation formula; and partitioning each layer of the third dimension according to the data with the smallest redundancy rate to obtain the data distribution.
In the graph data partitioning method of the embodiment of the present invention, an algorithm is modeled according to the graph-defined data and the computation model; the modeled data is partitioned with a two-dimensional data partitioning method and the partitioned data with the smallest redundancy rate is obtained; the optimal number of layers in the third dimension is then estimated and determined according to an estimation formula; and finally each layer of the third dimension is partitioned according to the data with the smallest redundancy rate to obtain the data distribution. The method reduces the number of data partitions within each layer by adding an appropriate amount of inter-layer communication, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
In one embodiment of the present invention, the computation model UPPS specifically includes: taking a data graph as the basic unit of operation and distinguishing the data DShare that cannot be partitioned from the data DColle that can be partitioned, where DColle is a vector of length SC; and providing four types of operation, namely Update, Push, Pull and Sink, where the Update operation obtains each vertex or edge of the graph and updates all of its data, the Push operation updates the destination vertex with the data on the source vertex and the edge, the Pull operation updates the source vertex with the data on the destination vertex and the edge, and the Sink operation updates the edge with the data of the source vertex and the destination vertex.
In one embodiment of the present invention, the two-dimensional data partitioning method includes vertex partitioning, edge partitioning and hybrid partitioning.
In one embodiment of the present invention, estimating and determining the optimal number of layers in the third dimension according to the estimation formula specifically includes: determining an estimation formula for the communication volume of each layer according to the four types of operation in the computation model UPPS; and determining the final communication volume of each layer according to the estimation formula and selecting the one with the smallest communication volume as the number of layers in the third dimension.
In one embodiment of the present invention, partitioning each layer of the third dimension according to the data with the smallest redundancy rate to obtain the data distribution specifically includes: dividing the compute nodes corresponding to the data with the smallest redundancy rate into as many groups as there are layers in the third dimension; and having each group of compute nodes maintain one layer of the third dimension.
To achieve the above objects, the graph data partitioning device of an embodiment of the second aspect of the present invention includes: a processing module, configured to model an algorithm according to the graph-defined data and the computation model; a first partitioning module, configured to partition the modeled data with a two-dimensional data partitioning method; an acquisition module, configured to obtain the data with the smallest redundancy rate after partitioning by the first partitioning module; a determination module, configured to estimate and determine the optimal number of layers in the third dimension according to an estimation formula; and a second partitioning module, configured to partition each layer of the third dimension according to the data with the smallest redundancy rate to obtain the data distribution.
In the graph data partitioning device of the embodiment of the present invention, the processing module models an algorithm according to the graph-defined data and the computation model; the first partitioning module partitions the modeled data with a two-dimensional data partitioning method; the acquisition module obtains the partitioned data with the smallest redundancy rate; the determination module estimates and determines the optimal number of layers in the third dimension according to an estimation formula; and finally the second partitioning module partitions each layer of the third dimension according to the data with the smallest redundancy rate to obtain the data distribution. The device reduces the number of data partitions within each layer by adding an appropriate amount of inter-layer communication, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
In one embodiment of the present invention, the computation model UPPS specifically includes: taking a data graph as the basic unit of operation and distinguishing the data DShare that cannot be partitioned from the data DColle that can be partitioned, where DColle is a vector of length SC; and providing four types of operation, namely Update, Push, Pull and Sink, where the Update operation obtains each vertex or edge of the graph and updates all of its data, the Push operation updates the destination vertex with the data on the source vertex and the edge, the Pull operation updates the source vertex with the data on the destination vertex and the edge, and the Sink operation updates the edge with the data of the source vertex and the destination vertex.
In one embodiment of the present invention, the two-dimensional data partitioning method includes vertex partitioning, edge partitioning and hybrid partitioning.
In one embodiment of the present invention, the determination module specifically includes: a first determination unit, configured to determine an estimation formula for the communication volume of each layer according to the four types of operation in the computation model; a second determination unit, configured to determine the final communication volume of each layer according to the estimation formula; and a selection unit, configured to select the one with the smallest communication volume as the number of layers in the third dimension.
In one embodiment of the present invention, the second partitioning module specifically includes: a partitioning unit, configured to divide the compute nodes corresponding to the data with the smallest redundancy rate into as many groups as there are layers in the third dimension; and a maintenance unit, configured to have each group of compute nodes maintain one layer of the third dimension.
Additional aspects and advantages of the present invention will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic diagram of a collaborative filtering problem model according to an embodiment of the present invention;
Fig. 2 is a flowchart of a graph data partitioning method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the UPPS model according to an embodiment of the present invention;
Fig. 4 is a diagram of the UPPS model applied to the SGD algorithm according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of partitioning methods of different dimensions according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of speedup data of the graph data partitioning method according to an embodiment of the present invention;
Fig. 7 is a flowchart of a graph data partitioning method according to another embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a graph data partitioning device according to an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
The graph data partitioning method and device of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 2 is a flowchart of a graph data partitioning method according to an embodiment of the present invention. As shown in Fig. 2, the graph data partitioning method may include:
S1: model an algorithm according to the graph-defined data and the computation model.
Specifically, users are regarded as vertices in the graph, relationships between users are regarded as edges in the graph, and the whole network is then regarded as a graph; data is defined according to the graph, and the algorithm is then modeled according to the defined data and the computation model.
It should be noted that, in one embodiment of the present invention, as shown in Fig. 3, the computation model UPPS specifically includes: taking a data graph as the basic unit of operation and distinguishing the data DShare that cannot be partitioned from the data DColle that can be partitioned, where DColle is a vector of length SC; and providing four types of operation, namely Update, Push, Pull and Sink, where the Update operation obtains each vertex or edge of the graph and updates all of its data, the Push operation updates the destination vertex with the data on the source vertex and the edge, the Pull operation updates the source vertex with the data on the destination vertex and the edge, and the Sink operation updates the edge with the data of the source vertex and the destination vertex.
More specifically, a data graph G is used as the basic unit of operation, and the user can attach arbitrary data to each vertex and edge of the graph. The computation model may be UPPS (Update, Push, Pull, Sink), which provides four types of operation. U stands for Update, which is an UpdateVertex or UpdateEdge operation in which each vertex or edge can take all of its own data and update it. For an edge (u, v), the Push operation updates the destination vertex v with the data on the source vertex u and the edge; the Pull operation does the opposite, updating the source vertex u with the data on the destination vertex v and the edge; and the Sink operation updates the edge with the data of its two vertices.
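The four operation types can be sketched as a small single-machine Python class. This is only an illustration of which data each operation reads and writes; the class name, method names and callback signatures are assumptions, and the sketch applies edges one after another instead of defining the accumulation semantics a real distributed engine would need.

```python
# Sketch only: the class and method names below are illustrative assumptions.

class UPPSGraph:
    """Toy in-memory data graph with the four UPPS operation types."""

    def __init__(self, vertex_data, edge_data):
        self.vertex_data = dict(vertex_data)  # vertex id -> data
        self.edge_data = dict(edge_data)      # (source u, destination v) -> data

    def update_vertex(self, fn):
        """Update: each vertex reads all of its own data and rewrites it."""
        for x in self.vertex_data:
            self.vertex_data[x] = fn(self.vertex_data[x])

    def update_edge(self, fn):
        """Update: each edge reads all of its own data and rewrites it."""
        for e in self.edge_data:
            self.edge_data[e] = fn(self.edge_data[e])

    def push(self, fn):
        """Push: update destination v from the data on source u and the edge."""
        for (u, v), e in self.edge_data.items():
            self.vertex_data[v] = fn(self.vertex_data[v], self.vertex_data[u], e)

    def pull(self, fn):
        """Pull: update source u from the data on destination v and the edge."""
        for (u, v), e in self.edge_data.items():
            self.vertex_data[u] = fn(self.vertex_data[u], self.vertex_data[v], e)

    def sink(self, fn):
        """Sink: update the edge from the data of its two end vertices."""
        for (u, v), e in self.edge_data.items():
            self.edge_data[(u, v)] = fn(e, self.vertex_data[u], self.vertex_data[v])


g = UPPSGraph({"u": 1.0, "v": 10.0}, {("u", "v"): 2.0})
g.push(lambda dst, src, e: dst + src * e)  # v becomes 10 + 1 * 2
g.pull(lambda src, dst, e: src + dst * e)  # u becomes 1 + 12 * 2
g.sink(lambda e, src, dst: src + dst)      # edge becomes 25 + 12
g.update_vertex(lambda d: d * 2)           # every vertex value is doubled
```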
It should be noted that most machine learning and data mining algorithms can be described with the above computation model. Take the SGD algorithm as an example: SGD is commonly used to solve the Collaborative Filtering problem. Its principle is to initialize the target matrices P and Q and then update them repeatedly by descending along the gradient. The update formulas are as follows:
$$
\begin{aligned}
P_i^{new} &= P_i + \alpha \, (Err_{i,j} \, Q_j - \alpha \, P_i) \\
Q_j^{new} &= Q_j + \alpha \, (Err_{i,j} \, P_i - \alpha \, Q_j)
\end{aligned}
\qquad (1)
$$
Here, P_i and Q_j denote the i-th row of P and the j-th row of Q respectively, α is the update step size, and Err_{i,j} is the corresponding error function.
As shown in Fig. 4, SC is set to D, i.e. the length of the smaller dimension of the target matrices, and the DShare data on each edge contains, in addition to the corresponding rating Rate, a corresponding error Err. Each computation of the error function can be implemented with one Push function and one Pull function.
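A single update of formula (1) for one rated pair (i, j) can be written as the following Python sketch. The patent text does not spell out the error function, so the sketch assumes Err is the rating minus the dot product of P_i and Q_j; the function name and the toy values are also assumptions.

```python
# Sketch only: the error definition is an assumption (Err = R_ij - P_i . Q_j).

def sgd_step(p_i, q_j, rating, alpha):
    """Apply update formula (1) once to the vectors P_i and Q_j."""
    err = rating - sum(p * q for p, q in zip(p_i, q_j))
    # Both updates read the old P_i and Q_j, as formula (1) does.
    p_new = [p + alpha * (err * q - alpha * p) for p, q in zip(p_i, q_j)]
    q_new = [q + alpha * (err * p - alpha * q) for p, q in zip(p_i, q_j)]
    return p_new, q_new, err


p_new, q_new, err = sgd_step([1.0, 0.0], [1.0, 1.0], rating=3.0, alpha=0.5)
```

Every element of the length-D vectors is updated independently given Err, which is what makes the vector attribute divisible along a third dimension.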
S2: partition the modeled data with a two-dimensional data partitioning method, and obtain the partitioned data with the smallest redundancy rate.
In one embodiment of the present invention, the two-dimensional data partitioning method includes vertex partitioning, edge partitioning and hybrid partitioning.
It should be noted that a good task partitioning algorithm needs, first, to distribute the edges of the graph as evenly as possible across the compute nodes, because the number of edges is proportional to the amount of computation; and second, to reduce as far as possible the total size of the replicated edges or vertices.
Vertex partitioning. This method assigns each compute node an independent set of vertices together with all the edges related to the vertices in that set. After partitioning, each node computes by reading the values on its incoming edges, performing the computation, and updating its own weights and the values on all its outgoing edges. Because this class of methods cuts edges while the unit of distribution is the vertex, it is often called "edge-cut" or "vertex-based" partitioning.
Edge partitioning. This class of schemes distributes the edges of the graph evenly across the compute nodes and sets up replicas, used for synchronization, for the vertices that are shared by multiple compute nodes. Because it performs markedly better than 1D partitioning on the power-law graph data common in real-world problems, this class of 2D algorithms has been adopted by most later graph computing engines, including PowerGraph and CombBLAS. Similarly, the two-dimensional algorithms are also often called "vertex-cut" or "edge-based" partitioning.
Hybrid partitioning. The recent PowerLyra system proposes a new partitioning method called "hybrid partitioning", which is said to combine the advantages of one-dimensional and two-dimensional partitioning. Fundamentally, however, hybrid partitioning can still be regarded as a special two-dimensional partitioning method that merely uses specific heuristics to reduce the communication volume.
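Choosing the candidate partition with the least redundancy can be sketched in Python as follows. The sketch assumes that the redundancy rate can be measured as a replication factor, i.e. the average number of compute nodes that hold a copy of each vertex; that measure, the two simple edge assignment strategies and all names are illustrative assumptions.

```python
# Sketch only: "redundancy rate" is modelled here as a replication factor,
# which is an assumption; the patent text does not define the measure.
from collections import defaultdict


def replication_factor(edge_to_node):
    """Average number of compute nodes holding a copy of each vertex."""
    holders = defaultdict(set)
    for (u, v), node in edge_to_node.items():
        holders[u].add(node)
        holders[v].add(node)
    return sum(len(nodes) for nodes in holders.values()) / len(holders)


def round_robin_partition(edges, num_nodes):
    """Edge partitioning that deals edges out to the nodes in turn."""
    return {e: i % num_nodes for i, e in enumerate(edges)}


def contiguous_partition(edges, num_nodes):
    """Edge partitioning that gives each node one contiguous run of edges."""
    return {e: i * num_nodes // len(edges) for i, e in enumerate(edges)}


# A ring of four vertices, partitioned over two compute nodes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
candidates = {
    "round_robin": round_robin_partition(edges, 2),
    "contiguous": contiguous_partition(edges, 2),
}
best = min(candidates, key=lambda name: replication_factor(candidates[name]))
```

Both candidates place two edges on each node, so they are equally balanced, but the contiguous assignment replicates fewer vertices.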
S3: estimate and determine the optimal number of layers in the third dimension according to an estimation formula.
The estimation formula is used to estimate and determine the optimal number of layers in the third dimension so as to minimize the data communication volume. Specifically, for each of the four operations in the UPPS model, an estimation formula is given for its communication volume under a specific number of layers.
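The selection step can be sketched in Python as a search over candidate layer counts. The estimation formulas themselves are not reproduced in this text, so the cost function below is a purely hypothetical placeholder; restricting the candidates to divisors of the node count, so that all groups have equal size, is likewise an assumption.

```python
# Sketch only: toy_estimate is a hypothetical stand-in, NOT the patent's formula.

def candidate_layer_counts(num_nodes):
    """Layer counts K that split num_nodes into equally sized groups."""
    return [k for k in range(1, num_nodes + 1) if num_nodes % k == 0]


def choose_layer_count(num_nodes, estimate_traffic):
    """Return the candidate K with the smallest estimated communication volume."""
    return min(candidate_layer_counts(num_nodes), key=estimate_traffic)


def toy_estimate(k, num_nodes=16, intra_cost=1.0, inter_cost=1.5):
    """Hypothetical trade-off: fewer nodes per layer lowers intra-layer
    traffic, while more layers raises inter-layer traffic."""
    return intra_cost * (num_nodes // k - 1) + inter_cost * (k - 1)


best_k = choose_layer_count(16, toy_estimate)
```

Any real estimator with the same signature, one that sums the per-operation estimates for Update, Push, Pull and Sink, could be passed in place of `toy_estimate`.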
S4: partition each layer of the third dimension according to the data with the smallest redundancy rate to obtain the data distribution.
As a concrete example, each vertex in the graph can be split into K sub-vertices and, likewise, the compute nodes in the cluster are divided into K groups. Each group of compute nodes is responsible for maintaining one layer of the K-layer graph, i.e. the subgraph formed by the corresponding sub-vertices. In the extreme case, for a cluster with N compute nodes, if K = N then every node holds the structure of the whole graph. Conversely, if K = 1 the method falls back to two-dimensional partitioning. Fig. 5 shows example partitions for the one-dimensional, two-dimensional and three-dimensional partitioning methods. Because the cluster is divided into groups, the graph in each layer needs to be partitioned into fewer parts, which reduces the number of replicas and the communication volume. Fig. 6 is a schematic diagram of speedup data of the graph data partitioning method according to an embodiment of the present invention. As Fig. 6 shows, the graph data partitioning method improves computational efficiency.
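Splitting along the third dimension can be sketched in Python as follows. The helper names and the choice of contiguous slices are assumptions made for illustration; the text only states that each vertex is split into K sub-vertices and the nodes into K groups.

```python
# Sketch only: helper names and the contiguous slicing are assumptions.

def split_vector(vec, k):
    """Split a length-SC vertex vector into k contiguous sub-vectors."""
    bounds = [len(vec) * i // k for i in range(k + 1)]
    return [vec[bounds[i]:bounds[i + 1]] for i in range(k)]


def group_nodes(node_ids, k):
    """Divide the compute nodes into k equally sized groups, one per layer."""
    if len(node_ids) % k != 0:
        raise ValueError("number of nodes must be a multiple of k")
    size = len(node_ids) // k
    return [node_ids[i * size:(i + 1) * size] for i in range(k)]


sub_vertices = split_vector([1, 2, 3, 4, 5, 6], 3)  # K = 3 sub-vertices
groups = group_nodes(list(range(6)), 3)             # K = 3 groups of 2 nodes
```

Group i then partitions only layer i of the graph among its own nodes, here 2 nodes per layer instead of 6. With k = 1 there is a single group holding full vectors, which is the two-dimensional case; with k equal to the number of nodes, every group is a single node that holds the whole graph structure for its slice.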
In the graph data partitioning method of the embodiment of the present invention, an algorithm is modeled according to the graph-defined data and the computation model; the modeled data is partitioned with a two-dimensional data partitioning method and the partitioned data with the smallest redundancy rate is obtained; the optimal number of layers in the third dimension is then estimated and determined according to an estimation formula; and finally each layer of the third dimension is partitioned according to the data with the smallest redundancy rate to obtain the data distribution. The method reduces the number of data partitions within each layer by adding an appropriate amount of inter-layer communication, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
Fig. 7 is a flowchart of a graph data partitioning method according to another embodiment of the present invention.
As shown in Fig. 7, the graph data partitioning method may include:
S71: model an algorithm according to the graph-defined data and the computation model.
Specifically, users are regarded as vertices in the graph, relationships between users are regarded as edges in the graph, and the whole network is then regarded as a graph; data is defined according to the graph, and the algorithm is then modeled according to the defined data and the computation model.
S72: partition the modeled data with a two-dimensional data partitioning method, and obtain the partitioned data with the smallest redundancy rate.
In one embodiment of the present invention, the two-dimensional data partitioning method includes vertex partitioning, edge partitioning and hybrid partitioning.
S73: determine an estimation formula for the communication volume of each layer according to the four types of operation in the computation model UPPS.
The computation provides four types of operation. U stands for Update, which is an UpdateVertex or UpdateEdge operation in which each vertex or edge can take all of its own data and update it. For an edge (u, v), the Push operation updates the destination vertex v with the data on the source vertex u and the edge; the Pull operation does the opposite, updating the source vertex u with the data on the destination vertex v and the edge; and the Sink operation updates the edge with the data of its two vertices. The estimation formula for the communication volume of each layer is determined from these operations.
S74: determine the final communication volume of each layer according to the estimation formula, and select the one with the smallest communication volume as the number of layers in the third dimension.
S75: divide the compute nodes corresponding to the data with the smallest redundancy rate into as many groups as there are layers in the third dimension.
S76: each group of compute nodes is responsible for maintaining one layer of the third dimension.
An algorithm is modeled according to the graph-defined data and the computation model; the modeled data is partitioned with a two-dimensional data partitioning method and the partitioned data with the smallest redundancy rate is obtained; the optimal number of layers in the third dimension is then estimated and determined according to an estimation formula; and finally each layer of the third dimension is partitioned according to the data with the smallest redundancy rate to obtain the data distribution. The method reduces the number of data partitions within each layer by adding an appropriate amount of inter-layer communication, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
Corresponding to the graph data partitioning methods provided by the above embodiments, an embodiment of the present invention also provides a graph data partitioning device. Because the device provided by this embodiment corresponds to the methods provided by the above embodiments, the implementations of the foregoing graph data partitioning method also apply to the device provided by this embodiment and are not described in detail here. Fig. 8 is a schematic structural diagram of a graph data partitioning device according to an embodiment of the present invention. As shown in Fig. 8, the device may include a processing module 10, a first partitioning module 20, an acquisition module 30, a determination module 40 and a second partitioning module 50.
The processing module 10 is configured to model an algorithm according to the graph-defined data and the computation model. The first partitioning module 20 is configured to partition the modeled data with a two-dimensional data partitioning method. The acquisition module 30 is configured to obtain the data with the smallest redundancy rate after partitioning by the first partitioning module. The determination module 40 is configured to estimate and determine the optimal number of layers in the third dimension according to an estimation formula. The second partitioning module 50 is configured to partition each layer of the third dimension according to the data with the smallest redundancy rate to obtain the data distribution.
In one embodiment of the present invention, the computation model specifically includes: taking a data graph as the basic unit of operation and distinguishing the data DShare that cannot be partitioned from the data DColle that can be partitioned, where DColle is a vector of length SC; and providing four types of operation, of which one type obtains each vertex or edge of the graph and updates all of its data, while the other three types operate only on the data within each layer.
In one embodiment of the present invention, the two-dimensional data partitioning method includes vertex partitioning, edge partitioning and hybrid partitioning.
In one embodiment of the present invention, the determination module 40 specifically includes a first determination unit 401, configured to determine an estimation formula for the communication volume of each layer according to the four types of operation in the computation model; a second determination unit 402, configured to determine the final communication volume of each layer according to the estimation formula; and a selection unit 403, configured to select the one with the smallest communication volume as the number of layers in the third dimension.
In one embodiment of the present invention, the second partitioning module 50 specifically includes a partitioning unit 501, configured to divide the compute nodes corresponding to the data with the smallest redundancy rate into as many groups as there are layers in the third dimension; and a maintenance unit 502, configured to have each group of compute nodes maintain one layer of the third dimension.
In the graph data partitioning device of the embodiment of the present invention, the processing module models an algorithm according to the graph-defined data and the computation model; the first partitioning module partitions the modeled data with a two-dimensional data partitioning method; the acquisition module obtains the partitioned data with the smallest redundancy rate; the determination module estimates and determines the optimal number of layers in the third dimension according to an estimation formula; and finally the second partitioning module partitions each layer of the third dimension according to the data with the smallest redundancy rate to obtain the data distribution. The device reduces the number of data partitions within each layer by adding an appropriate amount of inter-layer communication, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
In describing the invention, it is to be understood that term " first ", " second " only for descriptive purposes, and it is not intended that instruction or hint relative importance or the implicit quantity indicating indicated technical characteristic.Thus, define " first ", the feature of " second " can express or implicitly include at least one this feature.In describing the invention, " multiple " are meant that at least two, for instance two, three etc., unless otherwise expressly limited specifically.
In the description of this specification, specific features, structure, material or feature that the description of reference term " embodiment ", " some embodiments ", " example ", " concrete example " or " some examples " etc. means in conjunction with this embodiment or example describe are contained at least one embodiment or the example of the present invention.In this manual, the schematic representation of above-mentioned term is necessarily directed to identical embodiment or example.And, the specific features of description, structure, material or feature can combine in one or more embodiments in office or example in an appropriate manner.Additionally, when not conflicting, the feature of the different embodiments described in this specification or example and different embodiment or example can be carried out combining and combining by those skilled in the art.
Describe in flow chart or in this any process described otherwise above or method and be construed as, represent and include the module of code of executable instruction of one or more step for realizing specific logical function or process, fragment or part, and the scope of the preferred embodiment of the present invention includes other realization, wherein can not press order that is shown or that discuss, including according to involved function by basic mode simultaneously or in the opposite order, performing function, this should be understood by embodiments of the invention person of ordinary skill in the field.
It should be understood that each part of the invention may be implemented in hardware, software, firmware, or a combination thereof. In the embodiments above, multiple steps or methods may be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those skilled in the art will appreciate that all or part of the steps of the methods in the embodiments above may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the invention may be integrated into one processing module, may each exist physically on their own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the invention have been shown and described above, it is to be understood that these embodiments are exemplary and are not to be construed as limiting the invention; those of ordinary skill in the art may change, modify, replace, and vary the embodiments within the scope of the invention.

Claims (10)

1. A graph data partitioning method, characterized by comprising the following steps:
Modeling an algorithm according to the data defined on the graph and the computation model UPPS;
Partitioning the data in the model by two-dimensional data partitioning methods, and obtaining the partitioned data with the least redundancy;
Estimating the optimal number of third-dimension layers according to an estimation formula; and
Partitioning each of the third-dimension layers according to the data with the least redundancy, to obtain the distribution of the data.
2. The graph data partitioning method as claimed in claim 1, characterized in that the computation model UPPS specifically comprises:
Taking a data graph as the basic operating unit, and distinguishing the indivisible data DShare from the divisible data DColle, where DColle is a vector of length SC;
Providing four operation types, namely Update, Push, Pull, and Sink, where the Update operation takes each vertex or edge of the graph and updates all data of that vertex or edge, the Push operation updates the destination vertex using the data of the source vertex and the edge, the Pull operation updates the source vertex using the data of the destination vertex and the edge, and the Sink operation updates the edge using the data of the source vertex and the destination vertex.
3. The graph data partitioning method as claimed in claim 1, characterized in that the two-dimensional data partitioning methods include: vertex partitioning, edge partitioning, and hybrid partitioning.
4. The graph data partitioning method as claimed in claim 1, characterized in that estimating the optimal number of third-dimension layers according to the estimation formula specifically comprises:
Determining an estimation formula for the communication volume at each layer count according to the four operation types in the computation model UPPS;
Determining the final communication volume at each layer count according to the estimation formula, and selecting the layer count with the smallest communication volume as the number of third-dimension layers.
5. The graph data partitioning method as claimed in claim 1, characterized in that partitioning each of the third-dimension layers according to the data with the least redundancy, to obtain the distribution of the data, specifically comprises:
Dividing the computing nodes corresponding to the data with the least redundancy into as many groups as there are third-dimension layers;
Making each group of computing nodes responsible for maintaining one of the third-dimension layers.
6. A graph data partitioning device, characterized by comprising:
A processing module, for modeling an algorithm according to the data defined on the graph and the computation model UPPS;
A first partitioning module, for partitioning the data in the model by two-dimensional data partitioning methods;
An acquisition module, for obtaining the data with the least redundancy after partitioning by the first partitioning module;
A determining module, for estimating the optimal number of third-dimension layers according to an estimation formula; and
A second partitioning module, for partitioning each of the third-dimension layers according to the data with the least redundancy, to obtain the distribution of the data.
7. The graph data partitioning device as claimed in claim 6, characterized in that the computation model UPPS specifically comprises:
Taking a data graph as the basic operating unit, and distinguishing the indivisible data DShare from the divisible data DColle, where DColle is a vector of length SC;
Providing four operation types, namely Update, Push, Pull, and Sink, where the Update operation takes each vertex or edge of the graph and updates all data of that vertex or edge, the Push operation updates the destination vertex using the data of the source vertex and the edge, the Pull operation updates the source vertex using the data of the destination vertex and the edge, and the Sink operation updates the edge using the data of the source vertex and the destination vertex.
8. The graph data partitioning device as claimed in claim 6, characterized in that the two-dimensional data partitioning methods include: vertex partitioning, edge partitioning, and hybrid partitioning.
9. The graph data partitioning device as claimed in claim 6, characterized in that the determining module specifically comprises:
A first determining unit, for determining an estimation formula for the communication volume at each layer count according to the four operation types in the computation model UPPS;
A second determining unit, for determining the final communication volume at each layer count according to the estimation formula;
A selecting unit, for selecting the layer count with the smallest communication volume as the number of third-dimension layers.
10. The graph data partitioning device as claimed in claim 6, characterized in that the second partitioning module specifically comprises:
A dividing unit, for dividing the computing nodes corresponding to the data with the least redundancy into as many groups as there are third-dimension layers;
A maintenance unit, for making each group of computing nodes responsible for maintaining one of the third-dimension layers.
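As an illustration of the layer-selection and node-grouping steps recited in claims 4, 5, 9, and 10, the following Python sketch picks the layer count with the smallest estimated communication volume and then splits the computing nodes into that many groups. It is a minimal sketch under stated assumptions, not the patented implementation: the function names, the `toy_traffic` cost function, and the even split of the DColle index range across layers are all hypothetical, because the claims do not state the estimation formula or how the DColle vector maps to layers.

```python
from typing import Callable, List, Sequence


def choose_layer_count(candidates: Sequence[int],
                       estimate_traffic: Callable[[int], float]) -> int:
    """Return the candidate layer count with the smallest estimated traffic."""
    return min(candidates, key=estimate_traffic)


def group_nodes(nodes: Sequence[str], layers: int) -> List[List[str]]:
    """Split computing nodes into `layers` equal groups; group i maintains layer i."""
    if layers <= 0 or len(nodes) % layers != 0:
        raise ValueError("layers must be positive and divide the node count")
    size = len(nodes) // layers
    return [list(nodes[i * size:(i + 1) * size]) for i in range(layers)]


def layer_slices(sc: int, layers: int) -> List[range]:
    """Assumption: each layer holds one contiguous slice of the DColle
    index range [0, sc); earlier layers absorb any remainder."""
    base, extra = divmod(sc, layers)
    slices, start = [], 0
    for i in range(layers):
        length = base + (1 if i < extra else 0)
        slices.append(range(start, start + length))
        start += length
    return slices


def toy_traffic(layers: int) -> float:
    """Hypothetical stand-in for the estimation formula: one term grows
    with the layer count and one term shrinks with it."""
    return 4.0 * layers + 64.0 / layers


if __name__ == "__main__":
    nodes = [f"n{i}" for i in range(8)]
    best = choose_layer_count([1, 2, 4, 8], toy_traffic)
    print(best)
    print(group_nodes(nodes, best))
    print(layer_slices(10, best))
```

With the toy cost above, the candidate counts 1, 2, 4, and 8 cost 68, 40, 32, and 40 respectively, so four layers are selected and the eight nodes fall into four groups of two. A real cost function would be derived from the Update, Push, Pull, and Sink operations of the modeled algorithm.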
CN201610101409.1A 2016-02-24 2016-02-24 Graph data partitioning method and device Active CN105787020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610101409.1A CN105787020B (en) 2016-02-24 2016-02-24 Graph data partitioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610101409.1A CN105787020B (en) 2016-02-24 2016-02-24 Graph data partitioning method and device

Publications (2)

Publication Number Publication Date
CN105787020A true CN105787020A (en) 2016-07-20
CN105787020B CN105787020B (en) 2019-05-21

Family

ID=56402354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610101409.1A Active CN105787020B (en) 2016-02-24 2016-02-24 Graph data partitioning method and device

Country Status (1)

Country Link
CN (1) CN105787020B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013009503A3 (en) * 2011-07-08 2014-05-30 Yale University Query execution systems and methods
CN102968475A (en) * 2012-11-16 2013-03-13 上海交通大学 Secure nearest neighbor query method and system based on minimum redundant data partition
CN104820705A (en) * 2015-05-13 2015-08-05 华中科技大学 Extensible partition method for associated flow graph data
CN105117488A (en) * 2015-09-19 2015-12-02 大连理工大学 RDF data balance partitioning algorithm based on mixed hierarchical clustering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
武永卫等: "一种分布式吴方法计算模型", 《软件学报》 *
王志刚等: "一种分布式吴方法计算模型", 《计算机学报》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304586A (en) * 2018-03-07 2018-07-20 南京审计大学 A kind of availability of data improvement method of task orientation
CN113326125A (en) * 2021-05-20 2021-08-31 清华大学 Large-scale distributed graph calculation end-to-end acceleration method and device
CN113792170A (en) * 2021-11-15 2021-12-14 支付宝(杭州)信息技术有限公司 Graph data dividing method and device and computer equipment
CN113792170B (en) * 2021-11-15 2022-03-15 支付宝(杭州)信息技术有限公司 Graph data dividing method and device and computer equipment
WO2023083241A1 (en) * 2021-11-15 2023-05-19 支付宝(杭州)信息技术有限公司 Graph data division

Also Published As

Publication number Publication date
CN105787020B (en) 2019-05-21

Similar Documents

Publication Publication Date Title
AU2020203909B2 (en) Data lineage summarization
Pal et al. Optimizing multi-GPU parallelization strategies for deep learning training
US8605092B2 (en) Method and apparatus of animation planning for a dynamic graph
Kim et al. DBCURE-MR: An efficient density-based clustering algorithm for large data using MapReduce
CN102792271B (en) Cross over many-core systems DYNAMIC DISTRIBUTION multidimensional working set
CN102915347B (en) A kind of distributed traffic clustering method and system
CN103971417A (en) Geometrical elements transformed by rigid motions
Doraiswamy et al. Efficient algorithms for computing Reeb graphs
CN102360494B (en) Interactive image segmentation method for multiple foreground targets
US20130151535A1 (en) Distributed indexing of data
Fišer et al. Growing neural gas efficiently
JP2010033561A (en) Method and apparatus for partitioning and sorting data set on multiprocessor system
CN113763700B (en) Information processing method, information processing device, computer equipment and storage medium
CN114327844A (en) Memory allocation method, related device and computer readable storage medium
CN105787020A (en) Graph data partitioning method and device
CN106164795B (en) Optimization method for classified alarm
CN112541584B (en) Deep neural network model parallel mode selection method
KR102284532B1 (en) Method for predicting molecular activity and apparatus therefor
Meyerhenke et al. Drawing large graphs by multilevel maxent-stress optimization
CN110375759A (en) Multi-robots Path Planning Method based on ant group algorithm
CN103366401B (en) Quick display method for multi-level virtual clothes fitting
Krumke et al. Robust absolute single machine makespan scheduling-location problem on trees
Boukhdhir et al. An improved MapReduce Design of Kmeans for clustering very large datasets
US20190005169A1 (en) Dynamic Design of Complex System-of-Systems for Planning and Adaptation to Unplanned Scenarios
Baldo et al. Performance models for master/slave parallel programs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20191227

Address after: 9F Asia Pacific Road No. 705 Jiaxing Nanhu District of Zhejiang Province in 314006

Patentee after: Qinghua Changsanjiao Research Inst., Zhejiang

Address before: 315105 Zhejiang city of Ningbo province Yinzhou District Qiming Road No. 818 building 14, No. 108

Patentee before: Innovation center of Yin Zhou Qinghua Changsanjiao Research Inst., Zhejiang
