CN107943918A - Computing system based on hierarchical large-scale graph data - Google Patents
- Publication number: CN107943918A (application CN201711160660.6A)
- Authority: CN (China)
- Legal status: Granted (as listed by Google, which notes that the status is an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses a computing system based on hierarchical large-scale graph data, comprising a graph data collection unit, a graph data analysis unit and a graph data management unit. The graph data analysis unit comprises a graph data partitioning module, a statistics module and a graph data merging module; the graph data management unit comprises a computing module, a comparison module and an alarm module. The graph data collection unit collects large-scale graph data, removes noise from it by median filtering, and transmits the processed graph data to the graph data analysis unit and the graph data management unit. After preprocessing, the invention partitions the graph data according to its adjacent nodes and then integrates the partitioned graph data; at the same time it collects the boundary points of the preprocessed graph data to obtain the original boundary, and compares the original boundary with the integrated data to judge the accuracy of the partitioned data, thereby ensuring the accuracy of the graph data.
Description
Technical field
The invention belongs to the field of large-scale graph data processing and relates to a computing system based on hierarchical large-scale graph data.
Background art
In the era of big data mining, graphs can directly describe many practical applications in fields such as computer science, chemistry and bioinformatics, for example social networks, web graphs, chemical substances and biological structures, and they can also describe various data mining algorithms, such as matrix decomposition or shortest paths. A graph comprises multiple nodes and the edges connecting them. Graph data comprise the node data of each node and the edge data of each edge; the edge data of an edge include the source node and destination node that form the edge and the weight of the edge. On a single-machine graph computing platform (i.e., a platform that performs graph computation on one computer), the capacity of the computer's local memory is limited, so when the volume of graph data to be computed exceeds that capacity, the edge data must be processed into multiple edge data blocks, each containing the data of one or more edges.
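The edge-data layout and block splitting described above can be sketched as follows. This is a minimal illustration, not the patent's method: the `Edge` record mirrors the source, destination and weight fields named in the text, and `max_edges_per_block` is a hypothetical stand-in for the memory capacity, counted in edges rather than bytes.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Edge:
    """Edge data as described in the text: source node, destination node, weight."""
    src: int
    dst: int
    weight: float


def split_into_edge_blocks(edges: Iterable[Edge], max_edges_per_block: int) -> List[List[Edge]]:
    """Cut an edge stream into blocks of at most max_edges_per_block edges.

    max_edges_per_block is a hypothetical proxy for "what fits in memory";
    a real platform would size blocks in bytes, not edge counts.
    """
    if max_edges_per_block < 1:
        raise ValueError("max_edges_per_block must be >= 1")
    blocks: List[List[Edge]] = []
    current: List[Edge] = []
    for edge in edges:
        current.append(edge)
        if len(current) == max_edges_per_block:  # block is full: start a new one
            blocks.append(current)
            current = []
    if current:  # keep the final, partially filled block
        blocks.append(current)
    return blocks
```

With five edges and a budget of two edges per block, the sketch yields blocks of sizes 2, 2 and 1, in input order.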
At present, edge data are processed with a fixed method, so when the computer calculates the node data of a node in an edge data block and cannot directly obtain the edge data related to that node, it must first rearrange the edge data in the block before it can obtain the edges it needs. For example, GraphChi (a single-machine graph computing platform) uses a destination-node-centric computation model, so the computer divides the edge data into multiple edge data blocks (called shards in GraphChi) in ascending order of destination node ID, and all edges with the same destination node are placed in the same block. However, edge data blocks produced by different partitioning rules differ, which lowers the accuracy of the data obtained when the blocks are finally merged.
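A simplified, in-memory sketch of destination-centric sharding in the spirit of GraphChi follows. It is not GraphChi's code: `build_shards` and its parameters are hypothetical, and real GraphChi shards are stored on disk. Vertex IDs are cut into contiguous intervals in ascending order, each edge goes to the shard owning its destination, and edges inside a shard are sorted by source ID, as GraphChi does within its shards.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source ID, destination ID, weight)


def build_shards(edges: List[Edge], num_vertices: int, num_shards: int) -> List[List[Edge]]:
    """Destination-interval sharding, simplified from GraphChi's scheme."""
    if num_shards < 1 or num_vertices < 1:
        raise ValueError("need at least one vertex and one shard")
    interval = -(-num_vertices // num_shards)  # ceiling division: vertex IDs per shard
    shards: List[List[Edge]] = [[] for _ in range(num_shards)]
    for src, dst, weight in edges:
        if not 0 <= dst < num_vertices:
            raise ValueError(f"destination {dst} outside 0..{num_vertices - 1}")
        # All in-edges of a vertex share a destination, so they land in one shard.
        shards[dst // interval].append((src, dst, weight))
    for shard in shards:
        shard.sort(key=lambda e: (e[0], e[1]))  # order edges by source ID inside a shard
    return shards
```

With six vertices and three shards, the intervals are vertices 0-1, 2-3 and 4-5, so every in-edge of vertex 1 ends up in the first shard regardless of its source.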
Summary of the invention
The object of the invention is to provide a computing system based on hierarchical large-scale graph data. After the graph data are preprocessed, the system partitions them according to their adjacent nodes and then integrates the partitioned graph data; it also collects the boundary points of the preprocessed graph data to obtain the original boundary, and compares the original boundary with the integrated data to judge the accuracy of the partitioned data, thereby ensuring the accuracy of the graph data.
The purpose of the present invention can be achieved through the following technical solutions:
A computing system based on hierarchical large-scale graph data comprises a graph data collection unit, a graph data analysis unit and a graph data management unit;
the graph data collection unit collects large-scale graph data, removes noise from the graph data by median filtering, and then transmits the processed graph data to the graph data analysis unit and the graph data management unit;
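The text does not say which signal the median filter operates on or how its window is defined. The sketch below assumes each node carries one numeric attribute and uses the node together with its direct neighbours as the filter window; `graph_median_filter` is a hypothetical name, not part of the patent.

```python
import statistics
from typing import Dict, List, Set, Tuple


def graph_median_filter(values: Dict[int, float], edges: List[Tuple[int, int]]) -> Dict[int, float]:
    """Replace each node's value by the median of itself and its direct neighbours.

    Edges are treated as undirected; every edge endpoint must have an entry in values.
    """
    neighbours: Dict[int, Set[int]] = {node: set() for node in values}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    filtered: Dict[int, float] = {}
    for node, value in values.items():
        window = [value] + [values[n] for n in neighbours[node]]
        # In windows of three or more values, an isolated extreme value is ignored.
        filtered[node] = statistics.median(window)
    return filtered
```

On the path 0-1-2-3-4 with values 1, 2, 100, 3, 4, the spike at node 2 is replaced by 3.0, the median of 100, 2 and 3. Endpoint nodes have only two values in their window, so their median is the mean of the two.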
the graph data analysis unit partitions the preprocessed graph data into different subsets according to a rule and distributes the subsets to the corresponding compute nodes; it then aggregates the results calculated by each compute node, merges the aggregated results, and transmits the data calculated by each compute node and the merged data to the graph data management unit;
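The split, distribute, aggregate and merge flow of the analysis unit can be illustrated with a toy job. Everything here is an assumption made for illustration: the round-robin partition rule, degree counting as the per-node computation, and threads standing in for separate compute nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Edge = Tuple[int, int]


def partition_edges(edges: List[Edge], num_parts: int) -> List[List[Edge]]:
    """Assumed rule: round-robin, edge i goes to part i % num_parts."""
    parts: List[List[Edge]] = [[] for _ in range(num_parts)]
    for i, edge in enumerate(edges):
        parts[i % num_parts].append(edge)
    return parts


def local_degree_count(part: List[Edge]) -> Counter:
    """Work done by one compute node: count how often each node is an endpoint."""
    counts: Counter = Counter()
    for u, v in part:
        counts[u] += 1
        counts[v] += 1
    return counts


def distributed_degrees(edges: List[Edge], num_parts: int = 3) -> Counter:
    """Split the edges, compute per part in parallel, then merge the partial counts."""
    parts = partition_edges(edges, num_parts)
    with ThreadPoolExecutor(max_workers=num_parts) as pool:  # threads stand in for compute nodes
        partials = list(pool.map(local_degree_count, parts))
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts rather than replacing them
    return merged
```

Because degree counting is additive, the merged result equals a direct count over the whole edge list, which is the kind of agreement the management unit checks.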
the graph data management unit performs its own calculation on the preprocessed graph data and compares that result with the result the graph data analysis unit obtained by partitioning and merging, to determine their similarity. When the similarity is greater than 80%, the merged data are transmitted directly to the user; when the similarity is less than 80%, a warning is sent to the graph data analysis unit, which partitions the graph data again, until the similarity between the result calculated after partitioning and merging and the result calculated directly on the preprocessed graph data is greater than 80%.
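The management unit's accept-or-repartition rule can be sketched as a loop. The similarity measure (share of keys whose values agree), the function names and the `max_attempts` cap are assumptions: the text defines neither a measure nor a retry limit, and because it says "greater than 80%" without stating what happens at exactly 80%, the sketch uses a strict comparison.

```python
import math
from typing import Callable, Dict, Hashable

Result = Dict[Hashable, float]


def result_similarity(a: Result, b: Result, rel_tol: float = 1e-9) -> float:
    """Share of keys (over both results) present in both with matching values."""
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    agreeing = sum(
        1 for k in keys
        if k in a and k in b and math.isclose(a[k], b[k], rel_tol=rel_tol)
    )
    return agreeing / len(keys)


def validated_run(direct: Callable[[], Result],
                  partitioned: Callable[[int], Result],
                  threshold: float = 0.8,
                  max_attempts: int = 5) -> Result:
    """Accept a partition-and-merge result only if it is similar enough to a direct run."""
    reference = direct()  # computed directly on the preprocessed graph
    for attempt in range(max_attempts):
        merged = partitioned(attempt)  # attempt index lets the caller choose a new split
        if result_similarity(reference, merged) > threshold:
            return merged  # similar enough: hand the merged data to the user
        print(f"warning: attempt {attempt} below {threshold:.0%} similarity, re-partitioning")
    raise RuntimeError("no partitioning reached the similarity threshold")
```

If the first split agrees on only three of five nodes (60%), the loop prints a warning and tries the next split; a split that matches the direct result exactly is accepted.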
Further, the graph data analysis unit comprises a graph data partitioning module, a statistics module and a graph data merging module. The graph data partitioning module divides the pairwise-connected graph nodes in the preprocessed graph data: based on the total number of connected node pairs, it starts from one border of the graph data and separates, one after another, a certain number of adjacent node pairs as one subgraph. The node pairs within each subgraph in turn form boundary nodes, and these boundary points are connected to form a super edge. The statistics module aggregates the super edges obtained by partitioning and integrates them randomly. The graph data merging module merges the super-edge nodes of the randomly integrated subgraphs to form a total super edge, from which the calculation data are obtained.
Further, the number of connected node pairs in each subgraph is 10%-20% of the total number of connected node pairs.
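One possible reading of the partitioning and merging modules is sketched below. The text leaves several points open, so the following are assumptions: "starting from one border" is modelled as sorting node pairs by node ID, a subgraph's boundary nodes are the nodes it shares with another subgraph, and its "super edge" is the set of those boundary nodes. The statistics module's random integration is omitted because merging by set union does not depend on order. All function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = Tuple[int, int]  # two directly connected nodes


def split_pairs(pairs: List[Pair], fraction: float = 0.15) -> List[List[Pair]]:
    """Cut connected node pairs into consecutive subgraphs of about fraction * total pairs."""
    if not 0.10 <= fraction <= 0.20:
        raise ValueError("fraction should lie in the 10%-20% range given in the text")
    # Hypothetical 'border' ordering: sort pairs by smaller node ID, then larger node ID.
    ordered = sorted(pairs, key=lambda p: (min(p), max(p)))
    size = max(1, math.ceil(fraction * len(ordered)))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def super_edges(subgraphs: List[List[Pair]]) -> List[FrozenSet[int]]:
    """Per subgraph, the boundary nodes it shares with at least one other subgraph."""
    owners: Dict[int, Set[int]] = defaultdict(set)  # node -> indices of subgraphs using it
    for idx, sub in enumerate(subgraphs):
        for u, v in sub:
            owners[u].add(idx)
            owners[v].add(idx)
    result: List[FrozenSet[int]] = []
    for sub in subgraphs:
        nodes = {n for pair in sub for n in pair}
        result.append(frozenset(n for n in nodes if len(owners[n]) > 1))
    return result


def total_super_edge(edges: List[FrozenSet[int]]) -> FrozenSet[int]:
    """Merging step: union of all per-subgraph super edges."""
    return frozenset().union(*edges)
```

On a path of 10 node pairs with `fraction=0.2`, each subgraph holds 2 pairs, and the merged total super edge is {2, 4, 6, 8}, the nodes where consecutive subgraphs meet.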
Further, the graph data management unit comprises a computing module, a comparison module and an alarm module. The computing module extracts the boundary graph nodes of the preprocessed graph data to obtain the original boundary. The comparison module compares the total super edges merged in the graph data merging module: when the coincidence rate between the boundary nodes of the total super edges and the original boundary nodes reaches 80% or more, the combined calculation result is transmitted to the user; if it is below 80%, the alarm module sends a warning to the graph data partitioning module, which reselects the boundary points and partitions the graph data again, until the final comparison result exceeds 80%.
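The comparison and alarm steps can be sketched as below, assuming the coincidence rate is the share of original boundary nodes that also appear among the boundary nodes of the merged total super edge. How the computing module derives the original boundary is not specified, so it is taken as an input here; `partition_attempts` is a hypothetical iterable of merged boundary sets produced by successive re-partitions.

```python
from typing import FrozenSet, Iterable, Optional, Tuple


def coincidence_rate(merged_boundary: FrozenSet[int], original_boundary: FrozenSet[int]) -> float:
    """Share of original boundary nodes that also appear in the merged boundary."""
    if not original_boundary:
        return 1.0  # nothing to match; treated as full agreement (an assumption)
    return len(merged_boundary & original_boundary) / len(original_boundary)


def compare_and_alarm(original_boundary: FrozenSet[int],
                      partition_attempts: Iterable[FrozenSet[int]],
                      threshold: float = 0.8) -> Optional[Tuple[int, float]]:
    """Return (attempt index, rate) for the first attempt reaching the threshold."""
    for attempt, merged_boundary in enumerate(partition_attempts):
        rate = coincidence_rate(merged_boundary, original_boundary)
        if rate >= threshold:  # "reaches 80% or more": pass the result to the user
            return attempt, rate
        # Below the threshold: the alarm module warns the partitioning module.
        print(f"alarm: attempt {attempt} coincidence {rate:.0%} is below {threshold:.0%}; "
              "reselect boundary points and re-partition")
    return None  # attempts exhausted without reaching the threshold
```

Here the acceptance test is `>=`, matching "reaches 80% or more" in this paragraph, whereas the similarity check described earlier is phrased as "greater than 80%".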
Beneficial effects of the present invention:
The system partitions the preprocessed graph data according to their adjacent nodes, integrates the partitioned graph data, collects the boundary points of the preprocessed graph data to obtain the original boundary, and compares the original boundary with the integrated data. This judges the accuracy of the partitioned data and thereby ensures the accuracy of the graph data.
Brief description of the drawings
In order to facilitate the understanding of those skilled in the art, the present invention is further illustrated below in conjunction with the accompanying drawings.
Fig. 1 is a schematic diagram of the graph data computing system of the present invention.
Embodiment
A computing system based on hierarchical large-scale graph data, as shown in Fig. 1, comprises a graph data collection unit, a graph data analysis unit and a graph data management unit;
the graph data collection unit collects large-scale graph data, removes noise from the graph data by median filtering, and then transmits the processed graph data to the graph data analysis unit and the graph data management unit;
the graph data analysis unit partitions the preprocessed graph data into different subsets according to a rule and distributes the subsets to the corresponding compute nodes; it then aggregates the results calculated by each compute node, merges the aggregated results, and transmits the data calculated by each compute node and the merged data to the graph data management unit. The graph data analysis unit comprises a graph data partitioning module, a statistics module and a graph data merging module. The graph data partitioning module divides the pairwise-connected graph nodes in the preprocessed graph data: based on the total number of connected node pairs, it starts from one border of the graph data and separates, one after another, a certain number of adjacent node pairs as one subgraph, the number of node pairs in each subgraph being 10%-20% of the total number of connected node pairs. The node pairs within each subgraph in turn form boundary nodes, and these boundary points are connected to form a super edge. The statistics module aggregates the super edges obtained by partitioning and integrates them randomly. The graph data merging module merges the super-edge nodes of the randomly integrated subgraphs to form a total super edge, from which the calculation data are obtained;
the graph data management unit performs its own calculation on the preprocessed graph data and compares that result with the result the graph data analysis unit obtained by partitioning and merging, to determine their similarity. When the similarity is greater than 80%, the merged data are transmitted directly to the user; when the similarity is less than 80%, a warning is sent to the graph data analysis unit, which partitions the graph data again, until the similarity between the result calculated after partitioning and merging and the result calculated directly on the preprocessed graph data is greater than 80%. The graph data management unit comprises a computing module, a comparison module and an alarm module. The computing module extracts the boundary graph nodes of the preprocessed graph data to obtain the original boundary. The comparison module compares the total super edges merged in the graph data merging module: when the coincidence rate between the boundary nodes of the total super edges and the original boundary nodes reaches 80% or more, the combined calculation result is transmitted to the user; if it is below 80%, the alarm module sends a warning to the graph data partitioning module, which reselects the boundary points and partitions the graph data again, until the final comparison result exceeds 80%.
The preferred embodiment disclosed above is intended only to help illustrate the invention. It does not describe every detail exhaustively and does not limit the invention to the specific embodiment described. Obviously, many modifications and variations can be made in light of this specification. This embodiment was chosen and described in detail to better explain the principles and practical applications of the invention, so that those skilled in the art can understand and use it well. The invention is limited only by the claims and their full scope and equivalents.
Claims (4)
1. A computing system based on hierarchical large-scale graph data, characterised in that it comprises a graph data collection unit, a graph data analysis unit and a graph data management unit;
the graph data collection unit collects large-scale graph data, removes noise from the graph data by median filtering, and then transmits the processed graph data to the graph data analysis unit and the graph data management unit;
the graph data analysis unit partitions the preprocessed graph data into different subsets according to a rule and distributes the subsets to the corresponding compute nodes; it then aggregates the results calculated by each compute node, merges the aggregated results, and transmits the data calculated by each compute node and the merged data to the graph data management unit;
the graph data management unit performs its own calculation on the preprocessed graph data and compares that result with the result the graph data analysis unit obtained by partitioning and merging, to determine their similarity; when the similarity is greater than 80%, the merged data are transmitted directly to the user; when the similarity is less than 80%, a warning is sent to the graph data analysis unit, which partitions the graph data again, until the similarity between the result calculated after partitioning and merging and the result calculated directly on the preprocessed graph data is greater than 80%.
2. The computing system based on hierarchical large-scale graph data according to claim 1, characterised in that the graph data analysis unit comprises a graph data partitioning module, a statistics module and a graph data merging module; the graph data partitioning module divides the pairwise-connected graph nodes in the preprocessed graph data: based on the total number of connected node pairs, it starts from one border of the graph data and separates, one after another, a certain number of adjacent node pairs as one subgraph, the node pairs within each subgraph in turn form boundary nodes, and these boundary points are connected to form a super edge; the statistics module aggregates the super edges obtained by partitioning and integrates them randomly; the graph data merging module merges the super-edge nodes of the randomly integrated subgraphs to form a total super edge, from which the calculation data are obtained.
3. The computing system based on hierarchical large-scale graph data according to claim 2, characterised in that the number of connected node pairs in each subgraph is 10%-20% of the total number of connected node pairs.
4. The computing system based on hierarchical large-scale graph data according to claim 1, characterised in that the graph data management unit comprises a computing module, a comparison module and an alarm module; the computing module extracts the boundary graph nodes of the preprocessed graph data to obtain the original boundary; the comparison module compares the total super edges merged in the graph data merging module: when the coincidence rate between the boundary nodes of the total super edges and the original boundary nodes reaches 80% or more, the combined calculation result is transmitted to the user; if it is below 80%, the alarm module sends a warning to the graph data partitioning module, which reselects the boundary points and partitions the graph data again, until the final comparison result exceeds 80%.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711160660.6A CN107943918B (en) | 2017-11-20 | 2017-11-20 | Operation system based on hierarchical large-scale graph data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107943918A | 2018-04-20 |
CN107943918B (en) | 2021-09-07 |
Family ID: 61929244
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711160660.6A Active CN107943918B (en) | 2017-11-20 | 2017-11-20 | Operation system based on hierarchical large-scale graph data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107943918B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1315122A2 (en) * | 2001-11-21 | 2003-05-28 | Oki Data Corporation | Graphical data processing |
CN102254012A (en) * | 2011-07-19 | 2011-11-23 | 北京大学 | Graph data storing method and subgraph enquiring method based on external memory |
CN103336808A (en) * | 2013-06-25 | 2013-10-02 | 中国科学院信息工程研究所 | System and method for real-time graph data processing based on BSP (Board Support Package) model |
CN104618153A (en) * | 2015-01-20 | 2015-05-13 | 北京大学 | Dynamic fault-tolerant method and dynamic fault-tolerant system based on P2P in distributed parallel graph processing |
CN105426375A (en) * | 2014-09-22 | 2016-03-23 | 阿里巴巴集团控股有限公司 | Relationship network calculation method and apparatus |
CN105590321A (en) * | 2015-12-24 | 2016-05-18 | 华中科技大学 | Block-based subgraph construction and distributed graph processing method |
CN105677755A (en) * | 2015-12-30 | 2016-06-15 | 杭州华为数字技术有限公司 | Method and device for processing graph data |
Non-Patent Citations (1)
Title |
---|
Sungwoong Kim et al.: "Image Segmentation Using Higher-Order Correlation Clustering", IEEE * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046237A (en) * | 2018-10-10 | 2020-04-21 | 北京京东金融科技控股有限公司 | User behavior data processing method and device, electronic equipment and readable medium |
CN111046237B (en) * | 2018-10-10 | 2024-04-05 | 京东科技控股股份有限公司 | User behavior data processing method and device, electronic equipment and readable medium |
Also Published As
Publication number | Publication date |
---|---|
CN107943918B (en) | 2021-09-07 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |