CN105787020B - Graph data partitioning method and device - Google Patents

Publication number
CN105787020B
Authority
CN
China
Prior art keywords
data
plies
division
layer
divided
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610101409.1A
Other languages
Chinese (zh)
Other versions
CN105787020A (en)
Inventor
武永卫
章明星
陈康
郑纬民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze Delta Region Institute of Tsinghua University Zhejiang
Original Assignee
Innovation Center Of Yin Zhou Qinghua Changsanjiao Research Inst Zhejiang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation Center Of Yin Zhou Qinghua Changsanjiao Research Inst Zhejiang
Priority to CN201610101409.1A
Publication of CN105787020A
Application granted
Publication of CN105787020B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a graph data partitioning method and device. The method includes: modeling an algorithm according to graph-defined data and the computation model UPPS; partitioning the modeled data with a two-dimensional data partitioning method and obtaining the partitioned data with the least redundancy; determining the optimal number of third-dimension layers by evaluating an estimation formula; and partitioning each of the third-dimension layers according to the least-redundant data to obtain the data distribution. By accepting a moderate amount of inter-layer traffic, the method reduces the number of data partitions within each layer, which lowers the traffic of distributed graph data processing and improves computational efficiency.

Description

Graph data partitioning method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a graph data partitioning method and device.
Background
As graph data keeps growing in both volume and importance, distributed graph processing engines are becoming a mainstream solution. This demand has driven developers and researchers in industry and academia to build a wide variety of parallel graph processing frameworks, including Pregel, PowerGraph, and GraphX. Because these systems offer a simple and effective programming interface that lets programmers scale their programs seamlessly to multi-machine environments, systems originally designed for graph computation are now applied not only to traditional graph analysis but also to many machine learning and data mining tasks that can be modeled as graphs.
For example, the goal of a data mining problem such as Collaborative Filtering is to predict the unknown ratings from a known set of user ratings of items. The problem was first described with matrices. As shown in Fig. 1, put simply, given a sparse matrix R of size N*M, the goal is to factor R into the product of two low-dimensional dense matrices P and Q (P and Q have sizes N*D and M*D respectively, and R is approximately equal to P*Q^T, where D is much smaller than N and M). Equivalently, the problem can be described with a graph model in which each row of P and each row of Q corresponds to a vertex of a bipartite graph. The attribute of each vertex is a vector of length D, and the rating matrix R corresponds to the values on the edges. In other words, if user u gives item v the rating Ruv, then the weight of the edge between u and v is Ruv.
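The graph view of the rating matrix can be illustrated with a minimal Python sketch. The function name build_bipartite_graph, the dictionary layout, and the random initialization range are assumptions chosen for illustration; the source only states that each row of P and Q becomes a vertex carrying a vector of length D and that each rating Ruv becomes an edge weight.

```python
import random

def build_bipartite_graph(ratings, dim, seed=0):
    """Turn a sparse rating matrix into a bipartite graph.

    ratings: dict mapping (user, item) -> rating R_uv (sparse matrix R).
    dim:     length D of the attribute vector on every vertex.
    Returns (vertex_attrs, edges). The layout is hypothetical.
    """
    rng = random.Random(seed)
    vertex_attrs = {}  # ("user", u) or ("item", v) -> vector of length D
    edges = {}         # (("user", u), ("item", v)) -> R_uv
    for (u, v), score in ratings.items():
        for key in (("user", u), ("item", v)):
            if key not in vertex_attrs:
                # One row of P (users) or Q (items), randomly initialized.
                vertex_attrs[key] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        edges[(("user", u), ("item", v))] = score
    return vertex_attrs, edges

# Three known ratings from two users on two items.
ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 1): 4.0}
attrs, edges = build_bipartite_graph(ratings, dim=4)
print(len(attrs), "vertices,", len(edges), "edges")
```

Only the known ratings produce edges, so the graph stays as sparse as R itself.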
As the demand for analyzing large graph data in machine learning and data mining tasks grows, many problems that were not considered before have emerged. According to recent research, a key question for processing graph data efficiently is how to reduce the traffic incurred during graph processing. A distributed graph processing engine therefore needs to choose its task partitioning algorithm very carefully. According to our survey, however, existing graph processing engines largely ignore the characteristics of machine learning and data mining tasks and assume that the attribute of each vertex is indivisible, so they simply equate the partitioning of a graph computation task with the general graph partitioning problem. Yet, as in data mining problems such as Collaborative Filtering, the attribute of each vertex is in many cases actually a vector. Although the length of this vector is generally not large, a new dimension along which to partition the graph is needed to solve the traffic problem.
Summary of the invention
The present invention aims to solve at least one of the above technical problems, at least to some extent.
To this end, the first object of the present invention is to propose a graph data partitioning method. By accepting a moderate amount of inter-layer traffic, the method reduces the number of data partitions within each layer, which lowers the traffic of distributed graph data processing and improves computational efficiency.
The second object of the present invention is to propose a graph data partitioning device.
To achieve the above objects, the graph data partitioning method of an embodiment of the first aspect of the present invention includes: modeling an algorithm according to graph-defined data and the computation model UPPS (Update Push Pull Sink); partitioning the modeled data with a two-dimensional data partitioning method and obtaining the partitioned data with the least redundancy; determining the optimal number of third-dimension layers by evaluating an estimation formula; and partitioning each of the third-dimension layers according to the least-redundant data to obtain the data distribution.
In the graph data partitioning method of the embodiment of the present invention, an algorithm is modeled according to graph-defined data and the computation model, the modeled data is partitioned with a two-dimensional data partitioning method and the partitioned data with the least redundancy is obtained, the optimal number of third-dimension layers is then determined by evaluating an estimation formula, and finally each of the third-dimension layers is partitioned according to the least-redundant data to obtain the data distribution. By accepting a moderate amount of inter-layer traffic, the method reduces the number of data partitions within each layer, which lowers the traffic of distributed graph data processing and improves computational efficiency.
In one embodiment of the present invention, the computation model UPPS specifically includes: taking a data graph as the basic unit of operation and distinguishing the data DShare that cannot be partitioned from the data DColle that can be partitioned, where DColle is a vector of length SC; and providing four operation types, namely Update, Push, Pull, and Sink, where Update obtains each vertex or edge of the graph and updates all of its data, Push updates the destination vertex with the data of the source vertex and the edge, Pull updates the source vertex with the data of the destination vertex and the edge, and Sink updates the edge with the data of the source and destination vertices.
In one embodiment of the present invention, the two-dimensional data partitioning method includes vertex partitioning, edge partitioning, and hybrid partitioning.
In one embodiment of the present invention, determining the optimal number of third-dimension layers by evaluating an estimation formula specifically includes: determining an estimation formula for the traffic of each layer according to the four operation types of the computation model UPPS; and determining the final traffic of each layer according to the estimation formula and selecting the option with the least traffic as the number of third-dimension layers.
In one embodiment of the present invention, partitioning each of the third-dimension layers according to the least-redundant data to obtain the data distribution specifically includes: dividing the compute nodes corresponding to the least-redundant data into as many groups as there are third-dimension layers; and making each of these groups of compute nodes responsible for maintaining one of the third-dimension layers.
To achieve the above objects, the graph data partitioning device of an embodiment of the second aspect of the present invention includes: a processing module for modeling an algorithm according to graph-defined data and the computation model; a first partitioning module for partitioning the modeled data with a two-dimensional data partitioning method; an obtaining module for obtaining the data with the least redundancy after the first partitioning module has partitioned the data; a determining module for determining the optimal number of third-dimension layers by evaluating an estimation formula; and a second partitioning module for partitioning each of the third-dimension layers according to the least-redundant data to obtain the data distribution.
In the graph data partitioning device of the embodiment of the present invention, the processing module models an algorithm according to graph-defined data and the computation model, the first partitioning module partitions the modeled data with a two-dimensional data partitioning method, the obtaining module obtains the partitioned data with the least redundancy, the determining module determines the optimal number of third-dimension layers by evaluating an estimation formula, and finally the second partitioning module partitions each of the third-dimension layers according to the least-redundant data to obtain the data distribution. By accepting a moderate amount of inter-layer traffic, the device reduces the number of data partitions within each layer, which lowers the traffic of distributed graph data processing and improves computational efficiency.
In one embodiment of the present invention, the computation model UPPS specifically includes: taking a data graph as the basic unit of operation and distinguishing the data DShare that cannot be partitioned from the data DColle that can be partitioned, where DColle is a vector of length SC; and providing four operation types, namely Update, Push, Pull, and Sink, where Update obtains each vertex or edge of the graph and updates all of its data, Push updates the destination vertex with the data of the source vertex and the edge, Pull updates the source vertex with the data of the destination vertex and the edge, and Sink updates the edge with the data of the source and destination vertices.
In one embodiment of the present invention, the two-dimensional data partitioning method includes vertex partitioning, edge partitioning, and hybrid partitioning.
In one embodiment of the present invention, the determining module specifically includes: a first determining unit for determining an estimation formula for the traffic of each layer according to the four operation types of the computation model; a second determining unit for determining the final traffic of each layer according to the estimation formula; and a selecting unit for selecting the option with the least traffic as the number of third-dimension layers.
In one embodiment of the present invention, the second partitioning module specifically includes: a partitioning unit for dividing the compute nodes corresponding to the least-redundant data into as many groups as there are third-dimension layers; and a maintenance unit by which each of these groups of compute nodes is made responsible for maintaining one of the third-dimension layers.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the collaborative filtering problem model according to an embodiment of the present invention;
Fig. 2 is a flowchart of the graph data partitioning method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the UPPS model according to an embodiment of the present invention;
Fig. 4 is a diagram of the UPPS model applied to the SGD algorithm according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of partitioning methods of different dimensions according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of speed-up data for the graph data partitioning method according to an embodiment of the present invention;
Fig. 7 is a flowchart of the graph data partitioning method according to another embodiment of the present invention;
Fig. 8 is a structural schematic diagram of the graph data partitioning device according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
The graph data partitioning method and device of the embodiments of the present invention are described below with reference to the drawings.
Fig. 2 is a flowchart of the graph data partitioning method according to an embodiment of the present invention. As shown in Fig. 2, the graph data partitioning method may include:
S1: model an algorithm according to graph-defined data and the computation model.
Specifically, users are treated as the vertices of a graph, the relationships between users are treated as its edges, and the whole network can be regarded as a graph. Data is defined on the graph, and the algorithm is then modeled according to the defined data and the computation model.
It should be noted that, in one embodiment of the present invention, as shown in Fig. 3, the computation model UPPS specifically includes: taking a data graph as the basic unit of operation and distinguishing the data DShare that cannot be partitioned from the data DColle that can be partitioned, where DColle is a vector of length SC; and providing four operation types, namely Update, Push, Pull, and Sink, where Update obtains each vertex or edge of the graph and updates all of its data, Push updates the destination vertex with the data of the source vertex and the edge, Pull updates the source vertex with the data of the destination vertex and the edge, and Sink updates the edge with the data of the source and destination vertices.
More specifically, a data graph G serves as the basic unit of operation, and the user can attach arbitrary data to every vertex and edge of the graph. The computation model may be UPPS (Update Push Pull Sink), which provides four types of operation. U stands for Update: in an UpdateVertex or UpdateEdge operation, each vertex or edge can read all of its own data and update it. For an edge (u, v), Push updates the destination vertex v with the data of the source vertex u and the edge; Pull does the opposite, updating the source vertex u with the data of the destination vertex v and the edge; and Sink updates the edge with the data of its two vertices.
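The four operation types can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not the patented system: the class name Graph, the method names, and the argument order of the user callbacks are hypothetical, and the split between DShare and DColle data is omitted for brevity.

```python
class Graph:
    """Toy single-machine model of the four UPPS operation types."""

    def __init__(self):
        self.vertex_data = {}  # vertex id -> data
        self.edge_data = {}    # (source, destination) -> data

    def update_vertex(self, fn):
        # Update: every vertex reads its own data and rewrites it.
        for v in self.vertex_data:
            self.vertex_data[v] = fn(self.vertex_data[v])

    def update_edge(self, fn):
        # Update: every edge reads its own data and rewrites it.
        for e in self.edge_data:
            self.edge_data[e] = fn(self.edge_data[e])

    def push(self, fn):
        # Push: source vertex data + edge data -> destination vertex.
        for (u, v), e in self.edge_data.items():
            self.vertex_data[v] = fn(self.vertex_data[u], e, self.vertex_data[v])

    def pull(self, fn):
        # Pull: destination vertex data + edge data -> source vertex.
        for (u, v), e in self.edge_data.items():
            self.vertex_data[u] = fn(self.vertex_data[v], e, self.vertex_data[u])

    def sink(self, fn):
        # Sink: data of both endpoints -> edge.
        for (u, v), e in self.edge_data.items():
            self.edge_data[(u, v)] = fn(self.vertex_data[u], self.vertex_data[v], e)


g = Graph()
g.vertex_data = {"a": 1.0, "b": 2.0}
g.edge_data = {("a", "b"): 10.0}
g.push(lambda src, edge, dst: dst + src * edge)  # b becomes 2 + 1 * 10
g.pull(lambda dst, edge, src: src + dst)         # a becomes 1 + 12
g.sink(lambda src, dst, edge: src + dst)         # edge becomes 13 + 12
g.update_vertex(lambda x: x * 2)                 # both vertices are doubled
```

Keeping the callbacks as plain functions mirrors the idea that the user supplies only the per-vertex or per-edge logic, while the engine decides where the data lives.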
It should be noted that most machine learning and data mining algorithms can be described with the above computation model. Take the SGD algorithm as an example, which is commonly used to solve the Collaborative Filtering problem. Its principle is to initialize the target matrices P and Q and then update them repeatedly by gradient descent. The specific update formulas are as follows:
where Pi and Qj denote the i-th row of P and the j-th row of Q respectively, α is the update step size, and Err is the corresponding error function.
As shown in Fig. 4, SC is set to D, that is, the length of the smaller dimension of the target matrices. Besides the corresponding rating, the DShare data of each edge also holds a corresponding error Err. Each computation of the error function can be implemented with one Push function and one Pull function.
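For orientation, one SGD step on a single edge can be sketched as below. The update rule used here is the common unregularized form for matrix factorization (Err = Rij - Pi·Qj, Pi += α·Err·Qj, Qj += α·Err·Pi); it is an assumption and not necessarily the exact formula of this patent. The names dot and sgd_edge_step are likewise hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sgd_edge_step(p_i, q_j, rating, alpha):
    """One SGD step for a single edge (i, j) with rating R_ij.

    Assumed update rule (standard, unregularized):
        err = R_ij - P_i . Q_j
        P_i <- P_i + alpha * err * Q_j
        Q_j <- Q_j + alpha * err * P_i
    Both updates use the old values of P_i and Q_j.
    """
    err = rating - dot(p_i, q_j)  # stored on the edge next to the rating
    new_p = [p + alpha * err * q for p, q in zip(p_i, q_j)]  # Pull-style update of the source
    new_q = [q + alpha * err * p for p, q in zip(p_i, q_j)]  # Push-style update of the destination
    return new_p, new_q, err

new_p, new_q, err = sgd_edge_step([1.0, 0.0], [0.5, 0.5], rating=1.0, alpha=0.1)
print(err, 1.0 - dot(new_p, new_q))  # the error shrinks after one step
```

In this sketch the vectors Pi and Qj play the role of the partitionable DColle data of length SC = D, while the rating and the error are scalar values stored on the edge.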
S2: partition the modeled data with a two-dimensional data partitioning method, and obtain the partitioned data with the least redundancy.
In one embodiment of the present invention, the two-dimensional data partitioning method includes vertex partitioning, edge partitioning, and hybrid partitioning.
It should be noted that a good task partitioning algorithm must first distribute the edges of the graph across the compute nodes as evenly as possible, because the number of edges is proportional to the amount of computation, and second keep the total size of the duplicated edges or vertices as small as possible.
Vertex partitioning. This method assigns each compute node an independent set of vertices together with all edges related to the vertices in that set. After the split, each node computes by reading the values on its edges and updating its own weights and the values on all of its outgoing edges. Because this kind of method cuts edges and its unit of distribution is the vertex, it is often called an "edge-cut" or "vertex-based" partitioning method.
Edge partitioning. This kind of scheme distributes the edges of the graph evenly across the compute nodes and, for vertices shared by several compute nodes, establishes replica relationships for synchronization. Because it performs clearly better than 1D partitioning on graph data that follows a power law, which is common in real problems, this kind of 2D algorithm has been adopted by most later graph computation engines, including PowerGraph and CombBALS. Similarly, two-dimensional algorithms are often called "vertex-cut" or "edge-based" methods.
Hybrid partitioning. The recent PowerLyra system proposes a new partitioning method called "hybrid partitioning", which claims to combine the advantages of the one-dimensional and two-dimensional partitioning methods. In essence, however, hybrid partitioning can still be regarded as a special two-dimensional partitioning method that merely uses specific heuristics to reduce traffic.
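The difference between vertex-based and edge-based partitioning can be made concrete with two small measures: the number of edges cut when vertices are assigned to nodes, and the average number of replicas per vertex when edges are assigned to nodes. The sketch below is illustrative only; the modulo and round-robin assignment rules and both function names are assumptions, and real engines use hashing or heuristics instead.

```python
def count_cut_edges(edges, num_nodes):
    """Vertex-based (1D) partitioning: vertex v lives on node v % num_nodes.

    Returns the number of edges whose endpoints sit on different nodes.
    """
    def owner(vertex):
        return vertex % num_nodes
    return sum(1 for u, v in edges if owner(u) != owner(v))

def replication_factor(edges, num_nodes):
    """Edge-based (2D) partitioning: edge number i goes to node i % num_nodes.

    Returns the average number of nodes that hold a copy of each vertex.
    """
    replicas = {}  # vertex -> set of nodes holding a copy
    for index, (u, v) in enumerate(edges):
        node = index % num_nodes
        replicas.setdefault(u, set()).add(node)
        replicas.setdefault(v, set()).add(node)
    return sum(len(nodes) for nodes in replicas.values()) / len(replicas)

# A small star-like graph: vertex 0 is a hub.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
cut = count_cut_edges(edges, num_nodes=2)
factor = replication_factor(edges, num_nodes=2)
print(cut, factor)
```

Both numbers are proxies for synchronization traffic: cut edges require messages between nodes, and every extra replica must be kept consistent with its master copy.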
S3: determine the optimal number of third-dimension layers by evaluating an estimation formula.
The optimal number of third-dimension layers is estimated with the estimation formula so as to minimize data traffic. Specifically, for each of the four operations in the UPPS model, an estimation formula gives its traffic under a given number of layers.
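The selection step can be sketched as a search over candidate layer counts. The estimation formulas themselves are not reproduced in this text, so toy_estimate below is a purely hypothetical cost model (replica synchronization inside a layer shrinks as layers are added, inter-layer traffic grows); restricting K to divisors of the node count is also an assumption.

```python
def candidate_layer_counts(num_nodes):
    """Layer counts K that split num_nodes into equally sized groups (assumed)."""
    return [k for k in range(1, num_nodes + 1) if num_nodes % k == 0]

def choose_layer_count(num_nodes, estimate_traffic):
    """Return the candidate K with the smallest estimated traffic."""
    return min(candidate_layer_counts(num_nodes), key=estimate_traffic)

def toy_estimate(k, num_nodes=16, intra_weight=8.0, inter_weight=2.0):
    """Hypothetical cost model, NOT the formula of the patent.

    intra: replica synchronization inside a layer, grows with nodes per layer.
    inter: traffic between layers, grows with the number of layers.
    """
    nodes_per_layer = num_nodes / k
    intra = intra_weight * (nodes_per_layer - 1)
    inter = inter_weight * (k - 1)
    return intra + inter

best_k = choose_layer_count(16, toy_estimate)
print(best_k)
```

Passing the estimator in as a function keeps the search independent of the cost model, so a real per-operation formula for Update, Push, Pull, and Sink could be substituted without changing the selection logic.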
S4: partition each of the third-dimension layers according to the least-redundant data to obtain the data distribution.
As a concrete example, each vertex of the graph is split into K sub-vertices, and the compute nodes of the cluster are likewise divided into K groups. Each group of compute nodes is responsible for maintaining one of the K layers of the graph, that is, the subgraph formed by the corresponding sub-vertices. At one extreme, for a cluster of N compute nodes, setting K=N makes every node hold the structure of the whole graph. At the other extreme, setting K=1 falls back to the two-dimensional partitioning method. Fig. 5 shows examples of the one-dimensional, two-dimensional, and three-dimensional partitioning methods. Because the cluster is divided into groups, the graph in each layer needs to be split into fewer parts, which reduces both the number of replicas and the traffic. Fig. 6 is a schematic diagram of speed-up data for the graph data partitioning method according to an embodiment of the present invention. As Fig. 6 shows, the graph data partitioning method improves computational efficiency.
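The layering described here can be sketched as two helpers: one that splits a vertex's attribute vector into K slices (one sub-vertex per layer), and one that divides the compute nodes into K groups. The names split_vector and group_nodes, the contiguous slicing, and the requirement that K divides the node count are assumptions made for illustration.

```python
def split_vector(vec, k):
    """Split a vertex attribute vector into k contiguous slices.

    Slice number l is the data of the sub-vertex that belongs to layer l.
    Slice sizes differ by at most one element.
    """
    base, extra = divmod(len(vec), k)
    slices, start = [], 0
    for layer in range(k):
        size = base + (1 if layer < extra else 0)
        slices.append(vec[start:start + size])
        start += size
    return slices

def group_nodes(num_nodes, k):
    """Divide compute nodes 0..num_nodes-1 into k equally sized groups.

    Group number l maintains layer l of the graph.
    """
    if num_nodes % k != 0:
        raise ValueError("k must divide the number of compute nodes")
    per_group = num_nodes // k
    return [list(range(l * per_group, (l + 1) * per_group)) for l in range(k)]

layers = split_vector([1, 2, 3, 4, 5, 6], k=3)
groups = group_nodes(4, k=2)
print(layers, groups)
```

With k=1 there is a single group containing every node, which corresponds to the plain two-dimensional case; with k equal to the node count each group holds one node, which then stores the whole graph structure for its slice of the vectors.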
In the graph data partitioning method of the embodiment of the present invention, an algorithm is modeled according to graph-defined data and the computation model, the modeled data is partitioned with a two-dimensional data partitioning method and the partitioned data with the least redundancy is obtained, the optimal number of third-dimension layers is then determined by evaluating an estimation formula, and finally each of the third-dimension layers is partitioned according to the least-redundant data to obtain the data distribution. By accepting a moderate amount of inter-layer traffic, the method reduces the number of data partitions within each layer, which lowers the traffic of distributed graph data processing and improves computational efficiency.
Fig. 7 is a flowchart of the graph data partitioning method according to another embodiment of the present invention.
As shown in Fig. 7, the graph data partitioning method may include:
S71: model an algorithm according to graph-defined data and the computation model.
Specifically, users are treated as the vertices of a graph, the relationships between users are treated as its edges, and the whole network can be regarded as a graph. Data is defined on the graph, and the algorithm is then modeled according to the defined data and the computation model.
S72: partition the modeled data with a two-dimensional data partitioning method, and obtain the partitioned data with the least redundancy.
In one embodiment of the present invention, the two-dimensional data partitioning method includes vertex partitioning, edge partitioning, and hybrid partitioning.
S73: determine an estimation formula for the traffic of each layer according to the four operation types of the computation model UPPS.
The model provides four types of operation. U stands for Update: in an UpdateVertex or UpdateEdge operation, each vertex or edge can read all of its own data and update it. For an edge (u, v), Push updates the destination vertex v with the data of the source vertex u and the edge; Pull does the opposite, updating the source vertex u with the data of the destination vertex v and the edge; and Sink updates the edge with the data of its two vertices. The estimation formula for the traffic of each layer is determined from these operations.
S74: determine the final traffic of each layer according to the estimation formula, and select the option with the least traffic as the number of third-dimension layers.
S75: divide the compute nodes corresponding to the least-redundant data into as many groups as there are third-dimension layers.
S76: each of these groups of compute nodes is responsible for maintaining one of the third-dimension layers.
An algorithm is modeled according to graph-defined data and the computation model, the modeled data is partitioned with a two-dimensional data partitioning method and the partitioned data with the least redundancy is obtained, the optimal number of third-dimension layers is then determined by evaluating an estimation formula, and finally each of the third-dimension layers is partitioned according to the least-redundant data to obtain the data distribution. By accepting a moderate amount of inter-layer traffic, the method reduces the number of data partitions within each layer, which lowers the traffic of distributed graph data processing and improves computational efficiency.
Corresponding to the graph data partitioning methods provided by the above embodiments, an embodiment of the present invention also provides a graph data partitioning device. Because the device corresponds to the methods provided by the above embodiments, the implementations of the foregoing graph data partitioning method also apply to the device provided in this embodiment and are not described in detail here. Fig. 8 is a structural schematic diagram of the graph data partitioning device according to an embodiment of the present invention. As shown in Fig. 8, the graph data partitioning device may include: a processing module 10, a first partitioning module 20, an obtaining module 30, a determining module 40, and a second partitioning module 50.
The processing module 10 is used to model an algorithm according to graph-defined data and the computation model. The first partitioning module 20 is used to partition the modeled data with a two-dimensional data partitioning method. The obtaining module 30 is used to obtain the data with the least redundancy after the first partitioning module has partitioned the data. The determining module 40 is used to determine the optimal number of third-dimension layers by evaluating an estimation formula. The second partitioning module 50 is used to partition each of the third-dimension layers according to the least-redundant data to obtain the data distribution.
In one embodiment of the present invention, the computation model specifically includes: taking a data graph as the basic unit of operation and distinguishing the data DShare that cannot be partitioned from the data DColle that can be partitioned, where DColle is a vector of length SC; and providing four operation types, one of which obtains each vertex or edge of the graph and updates all of its data, while the other three operate only on the data within each layer.
In one embodiment of the present invention, the two-dimensional data partitioning method includes vertex partitioning, edge partitioning, and hybrid partitioning.
In one embodiment of the present invention, the determining module 40 specifically includes: a first determining unit 401 for determining an estimation formula for the traffic of each layer according to the four operation types of the computation model; a second determining unit 402 for determining the final traffic of each layer according to the estimation formula; and a selecting unit 403 for selecting the option with the least traffic as the number of third-dimension layers.
In one embodiment of the present invention, the second partitioning module 50 specifically includes: a partitioning unit 501 for dividing the compute nodes corresponding to the least-redundant data into as many groups as there are third-dimension layers; and a maintenance unit 502 by which each of these groups of compute nodes is made responsible for maintaining one of the third-dimension layers.
In the graph data partitioning device of the embodiment of the present invention, the processing module models an algorithm according to graph-defined data and the computation model, the first partitioning module partitions the modeled data with a two-dimensional data partitioning method, the obtaining module obtains the partitioned data with the least redundancy, the determining module determines the optimal number of third-dimension layers by evaluating an estimation formula, and finally the second partitioning module partitions each of the third-dimension layers according to the least-redundant data to obtain the data distribution. By accepting a moderate amount of inter-layer traffic, the device reduces the number of data partitions within each layer, which lowers the traffic of distributed graph data processing and improves computational efficiency.
In the description of the present invention, it is to be understood that, term " first ", " second " are used for description purposes only, and cannot It is interpreted as indication or suggestion relative importance or implicitly indicates the quantity of indicated technical characteristic.Define as a result, " the One ", the feature of " second " can explicitly or implicitly include at least one of the features.In the description of the present invention, " multiple " It is meant that at least two, such as two, three etc., unless otherwise specifically defined.
In the description of this specification, reference term " one embodiment ", " some embodiments ", " example ", " specifically show The description of example " or " some examples " etc. means specific features, structure, material or spy described in conjunction with this embodiment or example Point is included at least one embodiment or example of the invention.In the present specification, schematic expression of the above terms are not It must be directed to identical embodiment or example.Moreover, particular features, structures, materials, or characteristics described can be in office It can be combined in any suitable manner in one or more embodiment or examples.In addition, without conflicting with each other, the skill of this field Art personnel can tie the feature of different embodiments or examples described in this specification and different embodiments or examples It closes and combines.
Any process described otherwise above or method description are construed as in flow chart or herein, and expression includes It is one or more for realizing specific logical function or process the step of executable instruction code module, segment or portion Point, and the range of the preferred embodiment of the present invention includes other realization, wherein can not press shown or discussed suitable Sequence, including according to related function by it is basic simultaneously in the way of or in the opposite order, Lai Zhihang function, this should be of the invention Embodiment person of ordinary skill in the field understood.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps carried by the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (6)

1. A graph data partitioning method, characterized by comprising the following steps:
modeling an algorithm according to the data defined on the graph and the computation model UPPS;
partitioning the data in the model by a two-dimensional data partitioning method, and obtaining the data with the least redundancy after partitioning;
determining the optimal number of third-dimension layers by estimation according to an estimation formula; and
partitioning each layer of the third-dimension layers according to the data with the least redundancy, to obtain the distribution of the data;
wherein the computation model UPPS specifically comprises:
with a data graph as the basic operating unit, distinguishing the data that cannot be partitioned, DShare, from the data that can be partitioned, DColle, where DColle is a vector of length SC;
providing four operation types, namely Update, Push, Pull, and Sink, where the Update operation obtains each vertex or edge of the graph and updates all of its data, the Push operation updates the destination vertex with the data of the source vertex and the edge, the Pull operation updates the source vertex with the data of the destination vertex and the edge, and the Sink operation updates the edge with the data of the source vertex and the destination vertex;
and wherein determining the optimal number of third-dimension layers by estimation according to the estimation formula specifically comprises:
determining an estimation formula for the communication volume of each layer according to the four operation types in the computation model UPPS;
determining the final communication volume of each layer according to the estimation formula, and selecting the one with the smallest communication volume as the number of third-dimension layers.
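As an illustration of the four operation types named in claim 1, the following is a minimal single-machine sketch in Python. It is not the patented implementation: the class name `UppsGraph`, the method names, and the callback signatures are hypothetical, and DShare is omitted for brevity. The sketch also assumes that Push, Pull, and Sink act index by index on DColle while Update sees the whole vector; the claims do not state this explicitly.

```python
from typing import Callable, Dict, List, Tuple

Vec = List[float]       # one DColle vector of length S_C
Edge = Tuple[int, int]  # (source vertex, destination vertex)


class UppsGraph:
    """Toy graph holding one DColle vector per vertex and per edge (hypothetical)."""

    def __init__(self, sc: int, edges: List[Edge]) -> None:
        self.sc = sc
        self.edges = list(edges)
        vertices = sorted({v for e in self.edges for v in e})
        self.vdata: Dict[int, Vec] = {v: [0.0] * sc for v in vertices}
        self.edata: Dict[Edge, Vec] = {e: [0.0] * sc for e in self.edges}

    def update_vertices(self, fn: Callable[[Vec], Vec]) -> None:
        # Update: fn receives and rewrites the whole DColle vector of a vertex.
        for v in self.vdata:
            self.vdata[v] = list(fn(self.vdata[v]))

    def update_edges(self, fn: Callable[[Vec], Vec]) -> None:
        # Update applied to edges instead of vertices.
        for e in self.edata:
            self.edata[e] = list(fn(self.edata[e]))

    def push(self, fn: Callable[[float, float, float], float]) -> None:
        # Push: destination <- fn(destination, source, edge), index by index.
        # Source values are snapshotted so the result ignores edge order.
        src = {v: list(d) for v, d in self.vdata.items()}
        for s, d in self.edges:
            for i in range(self.sc):
                self.vdata[d][i] = fn(self.vdata[d][i], src[s][i], self.edata[(s, d)][i])

    def pull(self, fn: Callable[[float, float, float], float]) -> None:
        # Pull: source <- fn(source, destination, edge), index by index.
        dst = {v: list(d) for v, d in self.vdata.items()}
        for s, d in self.edges:
            for i in range(self.sc):
                self.vdata[s][i] = fn(self.vdata[s][i], dst[d][i], self.edata[(s, d)][i])

    def sink(self, fn: Callable[[float, float], float]) -> None:
        # Sink: edge <- fn(source, destination), index by index.
        for s, d in self.edges:
            for i in range(self.sc):
                self.edata[(s, d)][i] = fn(self.vdata[s][i], self.vdata[d][i])
```

Under the index-by-index assumption, index i of one vector only ever meets index i of another in Push, Pull, and Sink, so a slice of DColle indices could be processed on its own; only Update needs the full vector.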
2. The graph data partitioning method according to claim 1, characterized in that the two-dimensional data partitioning method comprises: vertex partitioning, edge partitioning, and hybrid partitioning.
3. The graph data partitioning method according to claim 1, characterized in that partitioning each layer of the third-dimension layers according to the data with the least redundancy, to obtain the distribution of the data, specifically comprises:
dividing the compute nodes corresponding to the data with the least redundancy into as many groups as there are third-dimension layers;
each group of compute nodes among those groups being responsible for maintaining one of the third-dimension layers.
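The grouping in claim 3 can be pictured with a short Python sketch. The function name `group_nodes_by_layer` and its return layout are hypothetical, and the sketch assumes that the number of compute nodes is divisible by the number of layers and that each layer owns a contiguous slice of DColle indices; neither assumption is stated in the claims.

```python
from typing import Dict, List


def group_nodes_by_layer(nodes: List[int], num_layers: int, sc: int) -> List[Dict]:
    """Split compute nodes into num_layers groups, one DColle slice per group.

    Hypothetical helper: assumes len(nodes) is divisible by num_layers and
    that layer k owns the contiguous index range [k*sc/L, (k+1)*sc/L).
    """
    if num_layers <= 0 or len(nodes) % num_layers != 0:
        raise ValueError("node count must be a positive multiple of num_layers")
    if num_layers > sc:
        raise ValueError("cannot have more layers than DColle entries")
    per_group = len(nodes) // num_layers
    groups = []
    for layer in range(num_layers):
        members = nodes[layer * per_group:(layer + 1) * per_group]
        lo = layer * sc // num_layers        # first DColle index in this layer
        hi = (layer + 1) * sc // num_layers  # one past the last index
        groups.append({"layer": layer, "nodes": members, "colle_range": (lo, hi)})
    return groups
```

For example, eight nodes, two layers, and SC = 10 give two groups of four nodes, the first maintaining indices 0 to 4 of every DColle vector and the second indices 5 to 9. Within each group, the least-redundant two-dimensional partition would then be applied across that group's nodes.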
4. A graph data partitioning device, characterized by comprising:
a processing module, configured to model an algorithm according to the data defined on the graph and the computation model UPPS;
a first partitioning module, configured to partition the data in the model by a two-dimensional data partitioning method;
an obtaining module, configured to obtain the data with the least redundancy after the first partitioning module has partitioned the data;
a determining module, configured to determine the optimal number of third-dimension layers by estimation according to an estimation formula; and
a second partitioning module, configured to partition each layer of the third-dimension layers according to the data with the least redundancy, to obtain the distribution of the data;
wherein the computation model UPPS specifically comprises:
with a data graph as the basic operating unit, distinguishing the data that cannot be partitioned, DShare, from the data that can be partitioned, DColle, where DColle is a vector of length SC;
providing four operation types, namely Update, Push, Pull, and Sink, where the Update operation obtains each vertex or edge of the graph and updates all of its data, the Push operation updates the destination vertex with the data of the source vertex and the edge, the Pull operation updates the source vertex with the data of the destination vertex and the edge, and the Sink operation updates the edge with the data of the source vertex and the destination vertex;
and wherein the determining module specifically comprises:
a first determining unit, configured to determine an estimation formula for the communication volume of each layer according to the four operation types in the computation model UPPS;
a second determining unit, configured to determine the final communication volume of each layer according to the estimation formula;
a selecting unit, configured to select the one with the smallest communication volume as the number of third-dimension layers.
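The selection performed by the determining module can be sketched as a search over candidate layer counts. In the Python sketch below, `choose_layer_count`, `divisors`, and `toy_traffic` are hypothetical names. The estimation formula itself is not given in the claims, so `toy_traffic` is a made-up placeholder used only to make the example runnable; restricting candidates to divisors of the node count is likewise an assumption.

```python
from typing import Callable, List


def divisors(n: int) -> List[int]:
    """All positive divisors of n, used here as candidate layer counts."""
    return [k for k in range(1, n + 1) if n % k == 0]


def choose_layer_count(n_nodes: int, estimate: Callable[[int], float]) -> int:
    """Return the candidate layer count with the smallest estimated traffic.

    `estimate` stands in for the estimation formula derived from the four
    UPPS operation types; its real form is not reproduced here.
    """
    return min(divisors(n_nodes), key=estimate)


def toy_traffic(num_layers: int, n_nodes: int = 16) -> float:
    # PLACEHOLDER, not the patent's formula: one term shrinks as layers are
    # added (fewer nodes share each layer) and one term grows with them
    # (more groups must exchange data when a whole vector is updated).
    along_edges = n_nodes / num_layers
    across_layers = 1.0 * (num_layers - 1)
    return along_edges + across_layers


best = choose_layer_count(16, toy_traffic)
```

With this placeholder, the candidates 1, 2, 4, 8, and 16 score 16, 9, 7, 9, and 17 respectively, so `best` is 4. Any real estimate would replace `toy_traffic` without changing the selection step.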
5. The graph data partitioning device according to claim 4, characterized in that the two-dimensional data partitioning method comprises: vertex partitioning, edge partitioning, and hybrid partitioning.
6. The graph data partitioning device according to claim 4, characterized in that the second partitioning module specifically comprises:
a dividing unit, configured to divide the compute nodes corresponding to the data with the least redundancy into as many groups as there are third-dimension layers;
a maintenance unit, configured so that each group of compute nodes among those groups is responsible for maintaining one of the third-dimension layers.
CN201610101409.1A 2016-02-24 2016-02-24 Diagram data partitioning method and device Active CN105787020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610101409.1A CN105787020B (en) 2016-02-24 2016-02-24 Diagram data partitioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610101409.1A CN105787020B (en) 2016-02-24 2016-02-24 Diagram data partitioning method and device

Publications (2)

Publication Number Publication Date
CN105787020A CN105787020A (en) 2016-07-20
CN105787020B true CN105787020B (en) 2019-05-21

Family

ID=56402354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610101409.1A Active CN105787020B (en) 2016-02-24 2016-02-24 Diagram data partitioning method and device

Country Status (1)

Country Link
CN (1) CN105787020B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304586A (en) * 2018-03-07 2018-07-20 南京审计大学 A kind of availability of data improvement method of task orientation
CN113326125B (en) * 2021-05-20 2023-03-24 清华大学 Large-scale distributed graph calculation end-to-end acceleration method and device
CN113792170B (en) * 2021-11-15 2022-03-15 支付宝(杭州)信息技术有限公司 Graph data dividing method and device and computer equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968475A (en) * 2012-11-16 2013-03-13 上海交通大学 Secure nearest neighbor query method and system based on minimum redundant data partition
WO2013009503A3 (en) * 2011-07-08 2014-05-30 Yale University Query execution systems and methods
CN104820705A (en) * 2015-05-13 2015-08-05 华中科技大学 Extensible partition method for associated flow graph data
CN105117488A (en) * 2015-09-19 2015-12-02 大连理工大学 RDF data balance partitioning algorithm based on mixed hierarchical clustering

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013009503A3 (en) * 2011-07-08 2014-05-30 Yale University Query execution systems and methods
CN102968475A (en) * 2012-11-16 2013-03-13 上海交通大学 Secure nearest neighbor query method and system based on minimum redundant data partition
CN104820705A (en) * 2015-05-13 2015-08-05 华中科技大学 Extensible partition method for associated flow graph data
CN105117488A (en) * 2015-09-19 2015-12-02 大连理工大学 RDF data balance partitioning algorithm based on mixed hierarchical clustering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
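A distributed computing model for Wu's method; Wu Yongwei et al.; "Journal of Software" (软件学报); 20051231; Vol. 16, No. 3; full text
A distributed computing model for Wu's method; Wang Zhigang et al.; "Chinese Journal of Computers" (计算机学报); 20151231; Vol. 38, No. 9; full text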

Also Published As

Publication number Publication date
CN105787020A (en) 2016-07-20

Similar Documents

Publication Publication Date Title
US8605092B2 (en) Method and apparatus of animation planning for a dynamic graph
Gyulassy et al. A practical approach to Morse-Smale complex computation: Scalability and generality
CN103914865B (en) Form the group in the face of geometrical pattern
Batagelj et al. Visual analysis of large graphs using (x, y)-clustering and hybrid visualizations
US20120136641A1 (en) Machine, computer program product and method to carry out parallel reservoir simulation
CN105787020B (en) Diagram data partitioning method and device
Fišer et al. Growing neural gas efficiently
CN103678671A (en) Dynamic community detection method in social network
Lee et al. Simultaneous and incremental feature-based multiresolution modeling with feature operations in part design
US20220156430A1 (en) Topological message passing for three dimensional models in boundary representation format
Yuan et al. Feature preserving multiresolution subdivision and simplification of point clouds: A conformal geometric algebra approach
EP4374325A1 (en) Pointcloud processing, especially for use with building intelligence modelling (bim)
CN103366401B (en) Quick display method for multi-level virtual clothes fitting
CN108960335A (en) One kind carrying out efficient clustering method based on large scale network
CN109508389B (en) Visual accelerating method for personnel social relationship map
Yao et al. Shape estimation for elongated deformable object using B-spline chained multiple random matrices model
CN107679127A (en) Point cloud information parallel extraction method and its system based on geographical position
Rekleitis et al. Efficient topological exploration
CN107908696A (en) A kind of parallel efficiently multidimensional space data clustering algorithm GRIDEN based on grid and density
Wang et al. Synchronization between the spatial Julia sets of complex Lorenz system and complex Henon map
JP2019526111A (en) Direct boolean operation using geometric facets
CN110325984B (en) System and method for hierarchical community detection in graphics
CN107730586B (en) Method and system for modeling stratum
Camata et al. Parallel linear octree meshing with immersed surfaces
Saye An algorithm to mesh interconnected surfaces via the Voronoi interface

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20191227

Address after: 9F, No. 705 Asia Pacific Road, Nanhu District, Jiaxing, Zhejiang Province, 314006

Patentee after: Qinghua Changsanjiao Research Inst., Zhejiang

Address before: No. 108, Building 14, No. 818 Qiming Road, Yinzhou District, Ningbo, Zhejiang Province, 315105

Patentee before: Innovation center of Yin Zhou Qinghua Changsanjiao Research Inst., Zhejiang