CN105787020B - Graph data partitioning method and device - Google Patents
Graph data partitioning method and device
- Publication number
- CN105787020B (application number CN201610101409.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- number of layers
- division
- layer
- divided
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a graph data partitioning method and device. The method includes: modeling an algorithm according to graph-defined data and the computation model UPPS; partitioning the modeled data with two-dimensional data partitioning methods and obtaining the partitioned data with the least redundancy; determining the optimal number of third-dimension layers by estimation with an estimation formula; and partitioning each of the third-dimension layers according to the least-redundancy data to obtain the data distribution. By adding a suitable amount of inter-layer communication, the method reduces the number of data partitions within each layer, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a graph data partitioning method and device.
Background technique
As graph data keeps growing in volume and importance, distributed graph processing engines are becoming a mainstream solution. This demand has driven developers and researchers in industry and academia to build many different parallel graph processing frameworks, including Pregel, PowerGraph and GraphX. Because these systems provide a simple and effective programming interface that lets programmers scale their programs seamlessly to multi-machine environments, systems originally designed for graph computation are now used not only for traditional graph analysis but also for many machine learning and data mining tasks that can be modeled as graphs.
For example, the goal of data mining problems such as Collaborative Filtering is to predict unknown ratings from a known set of ratings that users have given to items. The problem was first described with matrices. As shown in Fig. 1, put simply, given a sparse matrix R of size N*M, the goal is to decompose R into the product of two low-dimensional dense matrices P and Q (P and Q have sizes N*D and M*D respectively, and R is approximately equal to P*Q^T, where D is much smaller than N and M). Equivalently, the problem can be described with a graph model in which each row of P and each row of Q corresponds to a vertex of a bipartite graph. The attribute of each vertex is a vector of length D, and the rating matrix R corresponds to the weights on the edges. In other words, if user u gives item v the rating Ruv, then the weight of the edge between u and v is Ruv.
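The graph view of the rating matrix described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy ratings, the sizes `N`, `M`, `D` and the helper `predict` are hypothetical and do not come from the patent.

```python
import random

# Hypothetical toy ratings: (user, item) -> score, i.e. the known entries of R.
ratings = {(0, 0): 5.0, (0, 2): 3.0, (1, 1): 4.0, (2, 0): 1.0, (2, 2): 2.0}
N, M, D = 3, 3, 2  # N users, M items, vector length D (D << N, M in practice)

rng = random.Random(0)
# Each user vertex u carries row P[u]; each item vertex v carries row Q[v].
P = [[rng.random() for _ in range(D)] for _ in range(N)]
Q = [[rng.random() for _ in range(D)] for _ in range(M)]

# Bipartite graph: one edge (u, v) with weight Ruv per known rating.
edges = [(u, v, r) for (u, v), r in sorted(ratings.items())]

def predict(u, v):
    """Approximate Ruv by the dot product of the two vertex attribute vectors."""
    return sum(p * q for p, q in zip(P[u], Q[v]))
```

Only the known ratings become edges, so the graph stays as sparse as R, while every vertex holds a dense vector of length D.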
As the demand for analyzing large graph data in machine learning and data mining tasks grows, many problems that were not previously considered have emerged. According to recent research, a key issue in processing graph data efficiently is how to reduce the communication volume during graph processing. A distributed graph processing engine therefore has to choose its task partitioning algorithm very carefully. According to our survey, however, existing graph processing engines largely ignore the characteristics of machine learning and data mining tasks and treat the attribute of each vertex as indivisible, so they simply equate task partitioning for graph computation with the general graph partitioning problem. Yet, as in data mining problems such as Collaborative Filtering, the attribute of each vertex is very often a vector. Even though this vector is usually not long, a new dimension along which to partition the graph is needed to solve the communication problem.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the above technical problems.
To this end, the first object of the present invention is to propose a graph data partitioning method. By adding a suitable amount of inter-layer communication, the method reduces the number of data partitions within each layer, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
The second object of the present invention is to propose a graph data partitioning device.
To achieve the above objects, the graph data partitioning method of an embodiment of the first aspect of the present invention includes: modeling an algorithm according to graph-defined data and the computation model UPPS (Update Push Pull Sink); partitioning the modeled data with two-dimensional data partitioning methods and obtaining the partitioned data with the least redundancy; determining the optimal number of third-dimension layers by estimation with an estimation formula; and partitioning each of the third-dimension layers according to the least-redundancy data to obtain the data distribution.
In the graph data partitioning method of the embodiment of the present invention, an algorithm is modeled according to graph-defined data and the computation model; the modeled data are partitioned with two-dimensional data partitioning methods and the partitioned data with the least redundancy are obtained; the optimal number of third-dimension layers is then determined by estimation with an estimation formula; and finally each of the third-dimension layers is partitioned according to the least-redundancy data to obtain the data distribution. By adding a suitable amount of inter-layer communication, the method reduces the number of data partitions within each layer, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
In one embodiment of the present invention, the computation model UPPS specifically includes: taking a data graph as the basic unit of operation, and distinguishing the indivisible data DShare from the divisible data DColle, where DColle is a vector of length SC; and providing 4 operation types, namely Update, Push, Pull and Sink, where the Update operation takes each vertex or edge of the graph and updates all of its data, the Push operation updates the destination vertex with the data of the source vertex and the edge, the Pull operation updates the source vertex with the data of the destination vertex and the edge, and the Sink operation updates the edge with the data of the source vertex and the destination vertex.
In one embodiment of the present invention, the two-dimensional data partitioning methods include vertex partitioning, edge partitioning and hybrid partitioning.
In one embodiment of the present invention, determining the optimal number of third-dimension layers by estimation with an estimation formula specifically includes: determining the estimation formula for the communication volume of each layer according to the 4 operation types in the computation model UPPS; and determining the final communication volume of each layer according to the estimation formula, and selecting the layer count with the smallest communication volume as the number of third-dimension layers.
In one embodiment of the present invention, partitioning each of the third-dimension layers according to the least-redundancy data to obtain the data distribution specifically includes: dividing the compute nodes corresponding to the least-redundancy data into as many groups as there are third-dimension layers; and making each group of compute nodes responsible for maintaining one of the third-dimension layers.
To achieve the above objects, the graph data partitioning device of an embodiment of the second aspect of the present invention includes: a processing module, configured to model an algorithm according to graph-defined data and the computation model; a first partitioning module, configured to partition the modeled data with two-dimensional data partitioning methods; an obtaining module, configured to obtain the data with the least redundancy after the first partitioning module has partitioned the data; a determining module, configured to determine the optimal number of third-dimension layers by estimation with an estimation formula; and a second partitioning module, configured to partition each of the third-dimension layers according to the least-redundancy data to obtain the data distribution.
In the graph data partitioning device of the embodiment of the present invention, the processing module models an algorithm according to graph-defined data and the computation model; the first partitioning module partitions the modeled data with two-dimensional data partitioning methods; the obtaining module obtains the partitioned data with the least redundancy; the determining module determines the optimal number of third-dimension layers by estimation with an estimation formula; and finally the second partitioning module partitions each of the third-dimension layers according to the least-redundancy data to obtain the data distribution. By adding a suitable amount of inter-layer communication, the device reduces the number of data partitions within each layer, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
In one embodiment of the present invention, the computation model UPPS specifically includes: taking a data graph as the basic unit of operation, and distinguishing the indivisible data DShare from the divisible data DColle, where DColle is a vector of length SC; and providing 4 operation types, namely Update, Push, Pull and Sink, where the Update operation takes each vertex or edge of the graph and updates all of its data, the Push operation updates the destination vertex with the data of the source vertex and the edge, the Pull operation updates the source vertex with the data of the destination vertex and the edge, and the Sink operation updates the edge with the data of the source vertex and the destination vertex.
In one embodiment of the present invention, the two-dimensional data partitioning methods include vertex partitioning, edge partitioning and hybrid partitioning.
In one embodiment of the present invention, the determining module specifically includes: a first determining unit, configured to determine the estimation formula for the communication volume of each layer according to the 4 operation types in the computation model; a second determining unit, configured to determine the final communication volume of each layer according to the estimation formula; and a selecting unit, configured to select the layer count with the smallest communication volume as the number of third-dimension layers.
In one embodiment of the present invention, the second partitioning module specifically includes: a dividing unit, configured to divide the compute nodes corresponding to the least-redundancy data into as many groups as there are third-dimension layers; and a maintaining unit, configured so that each group of compute nodes is responsible for maintaining one of the third-dimension layers.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the collaborative filtering problem model according to one embodiment of the present invention;
Fig. 2 is a flowchart of the graph data partitioning method according to one embodiment of the present invention;
Fig. 3 is a schematic diagram of the UPPS model according to one embodiment of the present invention;
Fig. 4 is a diagram of the UPPS model applied to the SGD algorithm according to one embodiment of the present invention;
Fig. 5 is a schematic diagram of partitioning methods of different dimensions according to one embodiment of the present invention;
Fig. 6 is a schematic diagram of speed-up ratio data of the graph data partitioning method according to one embodiment of the present invention;
Fig. 7 is a flowchart of the graph data partitioning method according to another embodiment of the present invention;
Fig. 8 is a structural schematic diagram of the graph data partitioning device according to one embodiment of the present invention.
Specific embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
The graph data partitioning method and device of the embodiments of the present invention are described below with reference to the drawings.
Fig. 2 is a flowchart of the graph data partitioning method according to one embodiment of the present invention. As shown in Fig. 2, the graph data partitioning method may include:
S1, model an algorithm according to graph-defined data and the computation model.
Specifically, users are regarded as the vertices of a graph and the relationships between users as its edges, so the whole network can be regarded as a graph. Data are defined on this graph, and the algorithm is then modeled according to the defined data and the computation model.
It should be noted that, in one embodiment of the present invention, as shown in Fig. 3, the computation model UPPS specifically includes: taking a data graph as the basic unit of operation, and distinguishing the indivisible data DShare from the divisible data DColle, where DColle is a vector of length SC; and providing 4 operation types, namely Update, Push, Pull and Sink, where the Update operation takes each vertex or edge of the graph and updates all of its data, the Push operation updates the destination vertex with the data of the source vertex and the edge, the Pull operation updates the source vertex with the data of the destination vertex and the edge, and the Sink operation updates the edge with the data of the source vertex and the destination vertex.
More specifically, a data graph G may be used as the basic unit of operation, and the user can attach arbitrary data to each vertex and each edge of the graph. The computation model may be UPPS (Update Push Pull Sink), which provides 4 types of operations. U stands for Update: for an UpdateVertex or UpdateEdge operation, each vertex or edge can access all of its own data and update it. For an edge (u, v), the Push operation updates the destination vertex v with the data of the source vertex u and the edge; the Pull operation does the opposite, updating the source vertex u with the data of the destination vertex v and the edge; and the Sink operation updates the edge with the data of its two endpoints.
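The four operation types described above can be sketched as a small single-machine Python class. This is a minimal illustration, not the patent's interface: the class `DataGraph`, its method names and the callback signatures are all assumptions chosen for readability.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    dshare: dict = field(default_factory=dict)  # indivisible data (DShare)
    dcolle: list = field(default_factory=list)  # divisible vector of length SC (DColle)

@dataclass
class Edge:
    src: int
    dst: int
    dshare: dict = field(default_factory=dict)
    dcolle: list = field(default_factory=list)

class DataGraph:
    """Hypothetical container; method names are illustrative only."""

    def __init__(self, num_vertices, edge_list):
        self.vertices = [Vertex() for _ in range(num_vertices)]
        self.edges = [Edge(u, v) for u, v in edge_list]

    def update_vertex(self, fn):
        # Update: every vertex sees all of its own data and may change it.
        for vertex in self.vertices:
            fn(vertex)

    def update_edge(self, fn):
        # Update: every edge sees all of its own data and may change it.
        for edge in self.edges:
            fn(edge)

    def push(self, fn):
        # Push: fn reads the source vertex and the edge, and updates the destination.
        for e in self.edges:
            fn(self.vertices[e.src], e, self.vertices[e.dst])

    def pull(self, fn):
        # Pull: fn reads the destination vertex and the edge, and updates the source.
        for e in self.edges:
            fn(self.vertices[e.dst], e, self.vertices[e.src])

    def sink(self, fn):
        # Sink: fn reads both endpoints and updates the edge.
        for e in self.edges:
            fn(self.vertices[e.src], self.vertices[e.dst], e)

def count_in_edge(src, edge, dst):
    # Example Push callback: each edge adds one to its destination's in-degree.
    dst.dshare["in_degree"] = dst.dshare.get("in_degree", 0) + 1

g = DataGraph(3, [(0, 1), (0, 2), (1, 2)])
g.update_vertex(lambda v: v.dshare.update(in_degree=0))
g.push(count_in_edge)
in_degrees = [v.dshare["in_degree"] for v in g.vertices]
```

In this sketch Push, Pull and Sink differ only in which of the three objects the callback is expected to modify, which is what lets a partitioner reason about the direction of data flow for each operation.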
It should be noted that most machine learning and data mining algorithms can be described with the above computation model. Take the SGD algorithm as an example: SGD is commonly used to solve the Collaborative Filtering problem. Its principle is to initialize the target matrices P and Q and then update them repeatedly in the direction of gradient descent. The specific update formula is as follows:
where P_i and Q_j denote the i-th row of P and the j-th row of Q respectively, α is the update step size, and Err is the corresponding error function.
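For orientation, a commonly used textbook form of the SGD update for matrix factorization is sketched below. It is an assumption, not necessarily the formula of this patent: the error is taken as Err = Rij - P_i·Q_j, and the regularization weight `lam` is a hypothetical extra parameter that defaults to zero.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgd_step(P, Q, i, j, r_ij, alpha=0.1, lam=0.0):
    """One SGD update for the known rating Rij; returns the error before the update."""
    err = r_ij - dot(P[i], Q[j])
    p_old, q_old = list(P[i]), list(Q[j])
    # Move P_i and Q_j along the negative gradient of the squared error.
    P[i] = [p + alpha * (err * q - lam * p) for p, q in zip(p_old, q_old)]
    Q[j] = [q + alpha * (err * p - lam * q) for p, q in zip(p_old, q_old)]
    return err

# Toy run: a single user, a single item and one known rating of 1.0.
P = [[0.1, 0.1]]
Q = [[0.1, 0.1]]
errors = [sgd_step(P, Q, 0, 0, 1.0) for _ in range(200)]
```

Every element of P_i is updated using only the matching element of Q_j plus the scalar error, which is why the vertex vectors can be split element-wise across layers once the error is known.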
As shown in Fig. 4, SC is set to D, that is, the length of the smaller dimension of the target matrices. In addition to the corresponding rating Rate, the DShare data of each edge also holds a corresponding error Err. Each evaluation of the error function can be implemented with a Push and a Pull function.
S2, partition the modeled data with two-dimensional data partitioning methods, and obtain the partitioned data with the least redundancy.
In one embodiment of the present invention, the two-dimensional data partitioning methods include vertex partitioning, edge partitioning and hybrid partitioning.
It should be noted that a good task partitioning algorithm needs, first, to distribute the edges of the graph as evenly as possible among the compute nodes, because the number of edges is proportional to the amount of computation; and second, to minimize the total size of the replicated edges or vertices.
Vertex partitioning. This method assigns each compute node an independent set of vertices together with all edges related to the vertices in that set. After partitioning, each node computes by reading the values on its edges and then updating its own weight and the values on all of its outgoing edges. Because this kind of method cuts edges and its unit of distribution is the vertex, it is often called an "edge-cut" or "vertex-based" partitioning method.
Edge partitioning. This kind of scheme distributes the edges of the graph evenly among the compute nodes and, for vertices shared by multiple compute nodes, establishes replica relationships used for synchronization. Because its results on the power-law graph data common in real problems are clearly better than those of one-dimensional partitioning schemes, this kind of two-dimensional algorithm has been adopted by most later graph computing engines, including PowerGraph and CombBALS. Similarly, two-dimensional algorithms are often called "vertex-cut" or "edge-based" methods.
Hybrid partitioning. The recent PowerLyra system proposes a new partitioning method named "hybrid partitioning", which is said to combine the advantages of the one-dimensional and two-dimensional partitioning methods. In essence, however, hybrid partitioning can still be regarded as a special two-dimensional partitioning method that merely uses specific heuristics to reduce the communication volume.
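One way to compare candidate partitions by redundancy is to count how many compute nodes hold a copy of each vertex. The sketch below does this for a source-based vertex partition and a grid-style edge partition; the replication-factor metric and both placement rules are assumptions for illustration, since the text does not fix a specific redundancy measure.

```python
from collections import defaultdict

def replication_factor(edges, owner_of_edge):
    """Average number of compute nodes that hold a copy of each vertex."""
    holders = defaultdict(set)
    for u, v in edges:
        node = owner_of_edge(u, v)
        holders[u].add(node)
        holders[v].add(node)
    return sum(len(nodes) for nodes in holders.values()) / len(holders)

def vertex_partition(num_nodes):
    """One-dimensional rule: an edge lives on the node that owns its source vertex."""
    return lambda u, v: u % num_nodes

def grid_edge_partition(rows, cols):
    """Two-dimensional rule: edges are spread over a rows x cols grid of nodes."""
    return lambda u, v: (u % rows) * cols + (v % cols)

# A small star graph: eight leaves all pointing at hub vertex 0.
star = [(leaf, 0) for leaf in range(1, 9)]
rf_1d = replication_factor(star, vertex_partition(4))
rf_2d = replication_factor(star, grid_edge_partition(2, 2))
```

On this star graph the hub is copied to all four nodes under the one-dimensional rule but to only two under the grid rule, so the grid rule has the lower replication factor, in line with the remark about power-law graphs above.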
S3, determine the optimal number of third-dimension layers by estimation with an estimation formula.
The optimal number of third-dimension layers is estimated with the estimation formula so as to minimize the data communication volume. Specifically, for each of the four different operations in the UPPS model, an estimation formula is given for its communication volume under a specific number of layers.
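The selection step can be sketched as a search over candidate layer counts. The text does not reproduce the estimation formulas, so `toy_estimate` below is a placeholder cost model invented for illustration, and restricting candidates to divisors of the node count is likewise an assumption.

```python
def divisors(n):
    return [k for k in range(1, n + 1) if n % k == 0]

def choose_layers(num_nodes, estimate_traffic):
    """Return the layer count K (a divisor of num_nodes) with the least estimated traffic."""
    return min(divisors(num_nodes), key=estimate_traffic)

def toy_estimate(num_nodes, num_vertices, sc, dshare_size):
    """Placeholder cost model (an assumption, not the patent's estimation formula)."""
    def estimate(k):
        nodes_per_layer = num_nodes // k
        # Intra-layer term: shrinks as each layer is shared by fewer nodes.
        intra = num_vertices * sc * (nodes_per_layer ** 0.5 - 1)
        # Inter-layer term: grows with the number of layers that must be synchronized.
        inter = num_vertices * dshare_size * (k - 1)
        return intra + inter
    return estimate

best_k = choose_layers(16, toy_estimate(16, num_vertices=1000, sc=64, dshare_size=8))
```

With these made-up numbers the two terms balance at an intermediate layer count rather than at either extreme, which is the trade-off the estimation step is meant to resolve.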
S4, partition each of the third-dimension layers according to the least-redundancy data to obtain the data distribution.
As a concrete example, each vertex of the graph is split into K sub-vertices, and likewise the compute nodes in the cluster are divided into K groups. Each group of compute nodes is responsible for maintaining one layer of the K-layer graph, that is, a subgraph made up of the corresponding sub-vertices. At one extreme, for a cluster with N compute nodes, setting K=N means that every node holds the structure of the entire graph. Conversely, setting K=1 falls back to the two-dimensional partitioning methods. Fig. 5 shows examples of the three kinds of partitioning methods: one-dimensional, two-dimensional and three-dimensional. Because the cluster is grouped, the number of parts into which the graph must be partitioned within each layer decreases, which in turn reduces the number of replicas and the communication volume. Fig. 6 is a schematic diagram of speed-up ratio data of the graph data partitioning method according to one embodiment of the present invention. As Fig. 6 shows, the graph data partitioning method improves computational efficiency.
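The layering described above can be sketched as two small helpers: one splits a divisible vector into K sub-vectors, and the same helper groups the compute nodes into K groups. The function names and the contiguous slicing rule are assumptions; the text only requires K sub-vertices and K node groups.

```python
def split_layers(items, k):
    """Split a list into k contiguous parts whose sizes differ by at most one."""
    base, extra = divmod(len(items), k)
    parts, start = [], 0
    for layer in range(k):
        size = base + (1 if layer < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts

def group_nodes(num_nodes, k):
    """Divide compute nodes 0..num_nodes-1 into k groups, one group per layer."""
    return split_layers(list(range(num_nodes)), k)

# A vertex attribute of length SC = 6 on a cluster of N = 4 nodes with K = 2 layers.
attribute = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
sub_vertices = split_layers(attribute, 2)  # one sub-vector per layer
node_groups = group_nodes(4, 2)            # group g maintains layer g
```

With K=1 the single group is the whole cluster and each vertex keeps its full vector, which is the two-dimensional case; with K equal to the number of nodes each group is a single node that holds the whole graph structure but only one slice of every vector.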
In the graph data partitioning method of the embodiment of the present invention, an algorithm is modeled according to graph-defined data and the computation model; the modeled data are partitioned with two-dimensional data partitioning methods and the partitioned data with the least redundancy are obtained; the optimal number of third-dimension layers is then determined by estimation with an estimation formula; and finally each of the third-dimension layers is partitioned according to the least-redundancy data to obtain the data distribution. By adding a suitable amount of inter-layer communication, the method reduces the number of data partitions within each layer, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
Fig. 7 is a flowchart of the graph data partitioning method according to another embodiment of the present invention.
As shown in Fig. 7, the graph data partitioning method may include:
S71, model an algorithm according to graph-defined data and the computation model.
Specifically, users are regarded as the vertices of a graph and the relationships between users as its edges, so the whole network can be regarded as a graph. Data are defined on this graph, and the algorithm is then modeled according to the defined data and the computation model.
S72, partition the modeled data with two-dimensional data partitioning methods, and obtain the partitioned data with the least redundancy.
In one embodiment of the present invention, the two-dimensional data partitioning methods include vertex partitioning, edge partitioning and hybrid partitioning.
S73, determine the estimation formula for the communication volume of each layer according to the 4 operation types in the computation model UPPS.
The model provides 4 types of operations. U stands for Update: for an UpdateVertex or UpdateEdge operation, each vertex or edge can access all of its own data and update it. For an edge (u, v), the Push operation updates the destination vertex v with the data of the source vertex u and the edge; the Pull operation does the opposite, updating the source vertex u with the data of the destination vertex v and the edge; and the Sink operation updates the edge with the data of its two endpoints. The estimation formula for the communication volume of each layer is determined on the basis of these operations.
S74, determine the final communication volume of each layer according to the estimation formula, and select the layer count with the smallest communication volume as the number of third-dimension layers.
S75, divide the compute nodes corresponding to the least-redundancy data into as many groups as there are third-dimension layers.
S76, each group of compute nodes is responsible for maintaining one of the third-dimension layers.
In this embodiment, an algorithm is modeled according to graph-defined data and the computation model; the modeled data are partitioned with two-dimensional data partitioning methods and the partitioned data with the least redundancy are obtained; the optimal number of third-dimension layers is then determined by estimation with an estimation formula; and finally each of the third-dimension layers is partitioned according to the least-redundancy data to obtain the data distribution. By adding a suitable amount of inter-layer communication, the method reduces the number of data partitions within each layer, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
Corresponding to the graph data partitioning methods provided by the above embodiments, an embodiment of the present invention also provides a graph data partitioning device. Because the device corresponds to the methods provided by the above embodiments, the implementations of the foregoing graph data partitioning methods also apply to the graph data partitioning device provided by this embodiment and are not described in detail here. Fig. 8 is a structural schematic diagram of the graph data partitioning device according to one embodiment of the present invention. As shown in Fig. 8, the graph data partitioning device may include: a processing module 10, a first partitioning module 20, an obtaining module 30, a determining module 40 and a second partitioning module 50.
The processing module 10 is configured to model an algorithm according to graph-defined data and the computation model. The first partitioning module 20 is configured to partition the modeled data with two-dimensional data partitioning methods. The obtaining module 30 is configured to obtain the data with the least redundancy after the first partitioning module has partitioned the data. The determining module 40 is configured to determine the optimal number of third-dimension layers by estimation with an estimation formula. The second partitioning module 50 is configured to partition each of the third-dimension layers according to the least-redundancy data to obtain the data distribution.
In one embodiment of the present invention, the computation model specifically includes: taking a data graph as the basic unit of operation, and distinguishing the indivisible data DShare from the divisible data DColle, where DColle is a vector of length SC; and providing 4 operation types, of which one takes each vertex or edge of the graph and updates all of its data, while the other three operate only on the data within each layer.
In one embodiment of the present invention, the two-dimensional data partitioning methods include vertex partitioning, edge partitioning and hybrid partitioning.
In one embodiment of the present invention, the determining module 40 specifically includes: a first determining unit 401, configured to determine the estimation formula for the communication volume of each layer according to the 4 operation types in the computation model; a second determining unit 402, configured to determine the final communication volume of each layer according to the estimation formula; and a selecting unit 403, configured to select the layer count with the smallest communication volume as the number of third-dimension layers.
In one embodiment of the present invention, the second partitioning module 50 specifically includes: a dividing unit 501, configured to divide the compute nodes corresponding to the least-redundancy data into as many groups as there are third-dimension layers; and a maintaining unit 502, configured so that each group of compute nodes is responsible for maintaining one of the third-dimension layers.
In the graph data partitioning device of the embodiment of the present invention, the processing module models an algorithm according to graph-defined data and the computation model; the first partitioning module partitions the modeled data with two-dimensional data partitioning methods; the obtaining module obtains the partitioned data with the least redundancy; the determining module determines the optimal number of third-dimension layers by estimation with an estimation formula; and finally the second partitioning module partitions each of the third-dimension layers according to the least-redundancy data to obtain the data distribution. By adding a suitable amount of inter-layer communication, the device reduces the number of data partitions within each layer, which reduces the communication volume of distributed graph data processing and improves computational efficiency.
In the description of the present invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless specifically defined otherwise.
In the description of this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiment methods may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and should not be construed as limiting the invention. Those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the invention.
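As a non-limiting illustration, the four operation types of the computation model UPPS recited in claim 1 (Update, Push, Pull and Sink) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `Element`, `Graph` and `accumulate` are hypothetical, the graph is held in a single process, and the operations are applied sequentially and in place.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Element:
    """Data attached to one vertex or one edge (hypothetical container)."""
    share: float        # DShare: data that cannot be partitioned
    colle: List[float]  # DColle: partitionable vector of length SC


class Graph:
    """Single-process toy graph; a distributed system would shard this."""

    def __init__(self, sc: int) -> None:
        self.sc = sc
        self.vertices: Dict[int, Element] = {}
        self.edges: Dict[Tuple[int, int], Element] = {}

    def add_vertex(self, v: int, fill: float = 0.0, share: float = 0.0) -> None:
        self.vertices[v] = Element(share, [fill] * self.sc)

    def add_edge(self, src: int, dst: int, fill: float = 0.0, share: float = 0.0) -> None:
        self.edges[(src, dst)] = Element(share, [fill] * self.sc)

    def update(self, f: Callable[[Element], None], on: str = "vertices") -> None:
        """Update: visit every vertex (or every edge) and update its own data."""
        targets = self.vertices if on == "vertices" else self.edges
        for element in targets.values():
            f(element)

    def push(self, f: Callable[[Element, Element, Element], None]) -> None:
        """Push: update the destination vertex from the source vertex and the edge."""
        for (src, dst), edge in self.edges.items():
            f(self.vertices[src], edge, self.vertices[dst])

    def pull(self, f: Callable[[Element, Element, Element], None]) -> None:
        """Pull: update the source vertex from the destination vertex and the edge."""
        for (src, dst), edge in self.edges.items():
            f(self.vertices[dst], edge, self.vertices[src])

    def sink(self, f: Callable[[Element, Element, Element], None]) -> None:
        """Sink: update the edge from the source and destination vertices."""
        for (src, dst), edge in self.edges.items():
            f(self.vertices[src], self.vertices[dst], edge)


def accumulate(a: Element, b: Element, target: Element) -> None:
    """Example user function: target.colle += a.colle * b.colle, element-wise."""
    target.colle = [t + x * y for t, x, y in zip(target.colle, a.colle, b.colle)]
```

In each of Push, Pull and Sink the last argument passed to the user function is the element being written, so one function such as `accumulate` can be reused with all three; only the roles of source vertex, destination vertex and edge change.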
Claims (6)
1. A graph data partitioning method, characterized by comprising the following steps:
modeling an algorithm according to the data defined on a graph and the computation model UPPS;
partitioning the data in the model by a two-dimensional data partitioning method, and obtaining the data with minimal redundancy after partitioning;
determining the optimal number of third-dimension layers by estimation according to an estimation formula; and
partitioning each of the third-dimension layers according to the minimal-redundancy data to obtain the data distribution;
wherein the computation model UPPS specifically comprises:
in a data graph serving as the basic unit of operation, distinguishing data DShare that cannot be partitioned from data DColle that can be partitioned, wherein DColle is a vector of length SC;
providing four operation types, namely Update, Push, Pull and Sink, wherein the Update operation obtains the vertices or edges of each graph and updates all data of each graph, the Push operation updates the destination vertex with the data of the source vertex and the edge, the Pull operation updates the source vertex with the data of the destination vertex and the edge, and the Sink operation updates the edge with the data of the source vertex and the destination vertex;
wherein determining the optimal number of third-dimension layers by estimation according to the estimation formula specifically comprises:
determining an estimation formula for the communication volume of each layer according to the four operation types in the computation model UPPS;
determining the final communication volume of each layer according to the estimation formula, and selecting the one with the smallest communication volume as the number of third-dimension layers.
2. The graph data partitioning method of claim 1, characterized in that the two-dimensional data partitioning method comprises: vertex partitioning, edge partitioning, and hybrid partitioning.
3. The graph data partitioning method of claim 1, characterized in that partitioning each of the third-dimension layers according to the minimal-redundancy data to obtain the data distribution specifically comprises:
dividing the compute nodes corresponding to the minimal-redundancy data into as many groups as there are third-dimension layers;
each group of compute nodes being responsible for maintaining one of the third-dimension layers.
4. A graph data partitioning device, characterized by comprising:
a processing module, configured to model an algorithm according to the data defined on a graph and the computation model UPPS;
a first partitioning module, configured to partition the data in the model by a two-dimensional data partitioning method;
an obtaining module, configured to obtain the data with minimal redundancy after partitioning by the first partitioning module;
a determining module, configured to determine the optimal number of third-dimension layers by estimation according to an estimation formula; and
a second partitioning module, configured to partition each of the third-dimension layers according to the minimal-redundancy data to obtain the data distribution;
wherein the computation model UPPS specifically comprises:
in a data graph serving as the basic unit of operation, distinguishing data DShare that cannot be partitioned from data DColle that can be partitioned, wherein DColle is a vector of length SC;
providing four operation types, namely Update, Push, Pull and Sink, wherein the Update operation obtains the vertices or edges of each graph and updates all data of each graph, the Push operation updates the destination vertex with the data of the source vertex and the edge, the Pull operation updates the source vertex with the data of the destination vertex and the edge, and the Sink operation updates the edge with the data of the source vertex and the destination vertex;
wherein the determining module specifically comprises:
a first determining unit, configured to determine an estimation formula for the communication volume of each layer according to the four operation types in the computation model UPPS;
a second determining unit, configured to determine the final communication volume of each layer according to the estimation formula;
a selecting unit, configured to select the one with the smallest communication volume as the number of third-dimension layers.
5. The graph data partitioning device of claim 4, characterized in that the two-dimensional data partitioning method comprises: vertex partitioning, edge partitioning, and hybrid partitioning.
6. The graph data partitioning device of claim 4, characterized in that the second partitioning module specifically comprises:
a dividing unit, configured to divide the compute nodes corresponding to the minimal-redundancy data into as many groups as there are third-dimension layers;
a maintaining unit, configured to make each group of compute nodes responsible for maintaining one of the third-dimension layers.
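The layer-count selection of claims 1 and 4 and the node grouping of claims 3 and 6 can be sketched as follows. This is a hedged illustration only. The claims do not reproduce the estimation formula, so `toy_cost` below is a made-up placeholder rather than the patent's formula, and the rule that candidate layer counts must divide the number of compute nodes evenly and not exceed SC is likewise an assumption of this sketch. All function names are hypothetical.

```python
from typing import Callable, List


def candidate_layer_counts(num_nodes: int, sc: int) -> List[int]:
    """Layer counts considered by this sketch.

    Assumption (not from the claims): a layer count must divide the number of
    compute nodes evenly and must not exceed SC, the length of DColle.
    """
    return [n for n in range(1, num_nodes + 1) if num_nodes % n == 0 and n <= sc]


def choose_layer_count(num_nodes: int, sc: int,
                       estimate_comm: Callable[[int], float]) -> int:
    """Pick the layer count whose estimated communication volume is smallest."""
    return min(candidate_layer_counts(num_nodes, sc), key=estimate_comm)


def group_nodes(node_ids: List[int], layers: int) -> List[List[int]]:
    """Split the compute nodes into `layers` equal groups; group i maintains layer i."""
    per_group = len(node_ids) // layers
    return [node_ids[i * per_group:(i + 1) * per_group] for i in range(layers)]


def toy_cost(layers: int) -> float:
    """Placeholder estimate, NOT the patent's formula.

    The first term stands for communication inside a layer, which shrinks as
    layers are added; the second stands for communication between layers,
    which grows with the number of layers.
    """
    return 100.0 / layers + 10.0 * (layers - 1)


if __name__ == "__main__":
    layers = choose_layer_count(num_nodes=8, sc=16, estimate_comm=toy_cost)
    print(layers, group_nodes(list(range(8)), layers))
```

Passing the estimate in as a function keeps the selection step independent of any particular formula, so a formula derived from the Update, Push, Pull and Sink operations of a given algorithm can be substituted for `toy_cost` without changing the rest of the sketch.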
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610101409.1A CN105787020B (en) | 2016-02-24 | 2016-02-24 | Diagram data partitioning method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105787020A (en) | 2016-07-20 |
CN105787020B (en) | 2019-05-21 |
Family
ID=56402354
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610101409.1A Active CN105787020B (en) | 2016-02-24 | 2016-02-24 | Diagram data partitioning method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105787020B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304586A (en) * | 2018-03-07 | 2018-07-20 | 南京审计大学 | A kind of availability of data improvement method of task orientation |
CN113326125B (en) * | 2021-05-20 | 2023-03-24 | 清华大学 | Large-scale distributed graph calculation end-to-end acceleration method and device |
CN113792170B (en) * | 2021-11-15 | 2022-03-15 | 支付宝(杭州)信息技术有限公司 | Graph data dividing method and device and computer equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102968475A (en) * | 2012-11-16 | 2013-03-13 | 上海交通大学 | Secure nearest neighbor query method and system based on minimum redundant data partition |
WO2013009503A3 (en) * | 2011-07-08 | 2014-05-30 | Yale University | Query execution systems and methods |
CN104820705A (en) * | 2015-05-13 | 2015-08-05 | 华中科技大学 | Extensible partition method for associated flow graph data |
CN105117488A (en) * | 2015-09-19 | 2015-12-02 | 大连理工大学 | RDF data balance partitioning algorithm based on mixed hierarchical clustering |
Non-Patent Citations (2)
Title |
---|
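A distributed computing model for Wu's method; Wu Yongwei et al.; Journal of Software; 20051231; Vol. 16 (No. 3); full text |
A distributed computing model for Wu's method; Wang Zhigang et al.; Chinese Journal of Computers; 20151231; Vol. 38 (No. 9); full text |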
Also Published As
Publication number | Publication date |
---|---|
CN105787020A (en) | 2016-07-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8605092B2 (en) | Method and apparatus of animation planning for a dynamic graph | |
Gyulassy et al. | A practical approach to Morse-Smale complex computation: Scalability and generality | |
CN103914865B (en) | Form the group in the face of geometrical pattern | |
Batagelj et al. | Visual analysis of large graphs using (x, y)-clustering and hybrid visualizations | |
US20120136641A1 (en) | Machine, computer program product and method to carry out parallel reservoir simulation | |
CN105787020B (en) | Diagram data partitioning method and device | |
Fišer et al. | Growing neural gas efficiently | |
CN103678671A (en) | Dynamic community detection method in social network | |
Lee et al. | Simultaneous and incremental feature-based multiresolution modeling with feature operations in part design | |
US20220156430A1 (en) | Topological message passing for three dimensional models in boundary representation format | |
Yuan et al. | Feature preserving multiresolution subdivision and simplification of point clouds: A conformal geometric algebra approach | |
EP4374325A1 (en) | Pointcloud processing, especially for use with building intelligence modelling (bim) | |
CN103366401B (en) | Quick display method for multi-level virtual clothes fitting | |
CN108960335A (en) | One kind carrying out efficient clustering method based on large scale network | |
CN109508389B (en) | Visual accelerating method for personnel social relationship map | |
Yao et al. | Shape estimation for elongated deformable object using B-spline chained multiple random matrices model | |
CN107679127A (en) | Point cloud information parallel extraction method and its system based on geographical position | |
Rekleitis et al. | Efficient topological exploration | |
CN107908696A (en) | A kind of parallel efficiently multidimensional space data clustering algorithm GRIDEN based on grid and density | |
Wang et al. | Synchronization between the spatial Julia sets of complex Lorenz system and complex Henon map | |
JP2019526111A (en) | Direct boolean operation using geometric facets | |
CN110325984B (en) | System and method for hierarchical community detection in graphics | |
CN107730586B (en) | Method and system for modeling stratum | |
Camata et al. | Parallel linear octree meshing with immersed surfaces | |
Saye | An algorithm to mesh interconnected surfaces via the Voronoi interface |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20191227. Address after: 9F Asia Pacific Road No. 705 Jiaxing Nanhu District of Zhejiang Province in 314006. Patentee after: Qinghua Changsanjiao Research Inst., Zhejiang. Address before: 315105 Zhejiang city of Ningbo province Yinzhou District Qiming Road No. 818 building 14, No. 108. Patentee before: Innovation center of Yin Zhou Qinghua Changsanjiao Research Inst., Zhejiang |