CN107943918B - Operation system based on hierarchical large-scale graph data - Google Patents
- Publication number
- CN107943918B
- Authority
- CN
- China
- Prior art keywords
- graph data
- module
- data
- graph
- nodes
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses an operation system based on hierarchical large-scale graph data, comprising a graph data acquisition unit, a graph data analysis unit and a graph data management unit. The graph data analysis unit comprises a graph data segmentation module, a statistic module and a graph data merging module; the graph data management unit comprises an operation module, a comparison module and a warning module. The graph data acquisition unit acquires large-scale graph data, removes noise from the graph data by median filtering, and transmits the processed graph data to the graph data analysis unit and the graph data management unit. In this system, the preprocessed graph data are segmented according to their adjacent nodes and the segmented graph data are then integrated. In parallel, boundary points of the preprocessed graph data are collected to obtain an original boundary, which is compared with the integrated data to judge the accuracy of the segmented data and thereby ensure the accuracy of the graph data.
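The abstract says only that noise is removed "through median filtering" and does not state which quantity is filtered. The sketch below assumes, purely for illustration, that every node carries a numeric value and replaces that value with the median over the node and its immediate neighbors; `graph_median_filter` and the example values are hypothetical.

```python
from statistics import median
from typing import Dict, List


def graph_median_filter(values: Dict[int, float],
                        adjacency: Dict[int, List[int]]) -> Dict[int, float]:
    """Replace each node's value with the median of itself and its neighbors.

    Hypothetical interpretation of "median filtering" on graph data: the
    filter window of a node is the node plus its adjacent nodes.
    """
    filtered: Dict[int, float] = {}
    for node, value in values.items():
        window = [value] + [values[n] for n in adjacency.get(node, []) if n in values]
        filtered[node] = median(window)
    return filtered


values = {0: 1.0, 1: 2.0, 2: 50.0, 3: 3.0}  # node 2 carries a noisy spike
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(graph_median_filter(values, adjacency))  # {0: 2.5, 1: 2.0, 2: 2.5, 3: 3.0}
```

In this toy graph the outlier 50.0 on node 2 is replaced by 2.5, while nodes with ordinary values change only slightly. Whether the patented system filters node values, edge weights or some other attribute is not specified.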
Description
Technical Field
The invention belongs to the field of large-scale graph data processing, and relates to an operation system based on hierarchical large-scale graph data.
Background
In the era of big data mining, graphs can not only directly describe many real-world applications in computer science, chemistry and bioinformatics, such as social networks, web graphs, chemical compounds and biological structures, but can also express various data mining algorithms, such as matrix factorization and shortest-path computation. A graph consists of nodes and edges connecting those nodes; graph data comprise the node data of the nodes and the edge data of the edges, where the edge data of one edge comprise a source node, a destination node and the weight of the edge. On a stand-alone graph computation platform (i.e., a platform that performs graph computation on a single computer), the capacity of local memory is limited, so when the amount of graph data to be computed exceeds that capacity, the edge data must be divided into a plurality of edge data blocks, each containing one or more edge data items.
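When edge data exceed the memory of one machine, they must be cut into edge data blocks. A minimal sketch, assuming edges are `(source, destination, weight)` tuples as the paragraph describes and that a block's capacity can be expressed as a maximum edge count; `split_into_edge_blocks` and `max_edges_per_block` are illustrative names, not part of any platform's API.

```python
from typing import List, Tuple

# An edge is (source node, destination node, weight), as described in the text.
Edge = Tuple[int, int, float]


def split_into_edge_blocks(edges: List[Edge], max_edges_per_block: int) -> List[List[Edge]]:
    """Split edge data into consecutive blocks of at most max_edges_per_block edges.

    max_edges_per_block is a hypothetical stand-in for the memory capacity
    of a single machine; the text gives no concrete value.
    """
    if max_edges_per_block <= 0:
        raise ValueError("max_edges_per_block must be positive")
    return [edges[i:i + max_edges_per_block]
            for i in range(0, len(edges), max_edges_per_block)]


edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 0, 2.0), (2, 3, 1.5), (3, 1, 0.7)]
blocks = split_into_edge_blocks(edges, 2)
print([len(b) for b in blocks])  # [2, 2, 1]
```

This fixed, order-based split is the kind of rule the next paragraph criticizes: the edges related to one node can land in different blocks.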
At present, edge data in graph data are divided by a fixed method. As a result, when a computer calculates the node data of a node in an edge data block and the edge data related to that node cannot be acquired directly, the required edge data can be obtained only by rearranging the order of the edge data in the block. For example, GraphChi (a stand-alone graph computation platform) uses a destination-centric computation model: the computer divides the edge data into a plurality of edge data blocks (called shards in GraphChi) in ascending order of destination node ID (identification), placing all edge data that share a destination node in the same block. However, different division rules yield different edge data blocks, so the data obtained in the final merge have low accuracy.
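The destination-centric division described for GraphChi can be illustrated as follows. This is a simplified sketch, not GraphChi's actual sharding code (real GraphChi shards cover intervals of destination vertices and additionally sort each shard's edges by source); it assumes a hypothetical per-shard edge budget `capacity` and never splits the edges of one destination across shards.

```python
from itertools import groupby
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source, destination, weight)


def shard_by_destination(edges: List[Edge], capacity: int) -> List[List[Edge]]:
    """Group edges into shards in ascending destination-ID order.

    All edges that share a destination stay in one shard. A shard is closed
    when adding the next destination's edges would exceed `capacity`; a single
    destination with more edges than `capacity` still gets one oversized shard.
    """
    shards: List[List[Edge]] = []
    current: List[Edge] = []
    ordered = sorted(edges, key=lambda e: (e[1], e[0]))  # by destination, then source
    for _, same_dst in groupby(ordered, key=lambda e: e[1]):
        bucket = list(same_dst)
        if current and len(current) + len(bucket) > capacity:
            shards.append(current)
            current = []
        current.extend(bucket)
    if current:
        shards.append(current)
    return shards


edges = [(0, 1, 1.0), (2, 1, 1.0), (3, 2, 1.0), (1, 3, 1.0), (0, 3, 1.0), (2, 0, 1.0)]
shards = shard_by_destination(edges, capacity=3)
print([[dst for _, dst, _ in shard] for shard in shards])  # [[0, 1, 1], [2, 3, 3]]
```

Changing `capacity` moves the shard boundaries, which is one concrete way in which different division rules produce different edge data blocks.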
Disclosure of Invention
The invention aims to provide an operation system based on hierarchical large-scale graph data in which the graph data are preprocessed and then divided according to their adjacent nodes, the divided graph data are integrated, boundary points of the preprocessed graph data are collected to obtain an original boundary, and the original boundary is compared with the integrated data to judge the accuracy of the divided data, thereby ensuring the accuracy of the graph data.
The purpose of the invention can be realized by the following technical scheme:
an operation system based on hierarchical large-scale graph data comprises a graph data acquisition unit, a graph data analysis unit and a graph data management unit;
the graph data acquisition unit is used for acquiring large-scale graph data, removing noise from the graph data by median filtering, and transmitting the processed graph data to the graph data analysis unit and the graph data management unit;
the graph data analysis unit is used for regularly dividing the preprocessed graph data into different sub-data, distributing the sub-data to the corresponding computing nodes, collecting statistics on the results computed by each computing node, merging these results, and transmitting both the per-node results and the merged data to the graph data management unit;
the graph data management unit performs the calculation directly on the preprocessed graph data and compares its result with the result obtained by splitting and merging in the graph data analysis unit to determine their similarity. If the similarity is greater than 80%, the merged data are transmitted to a user; if the similarity is less than 80%, a warning is sent directly to the graph data analysis unit, which segments the graph data again until the similarity between the split-and-merged result and the directly calculated result exceeds 80%.
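The accept-or-resegment rule above is a feedback loop. The sketch below assumes the caller supplies the directly computed result, a hypothetical `split_and_merge(attempt)` callback that performs one new segmentation and merge, and a `similarity` function returning a value in [0, 1]. The source defines neither the similarity measure nor a retry limit, so `max_attempts` is an added safeguard, and a similarity of exactly 80% is treated here as a failure.

```python
from typing import Callable, Optional, Set, Tuple, TypeVar

R = TypeVar("R")


def verify_with_resegmentation(
    direct_result: R,
    split_and_merge: Callable[[int], R],
    similarity: Callable[[R, R], float],
    threshold: float = 0.8,
    max_attempts: int = 10,
) -> Tuple[Optional[R], int]:
    """Repeat segmentation until the merged result is similar enough.

    Returns (merged_result, attempts_used), or (None, max_attempts) if no
    attempt exceeded the threshold.
    """
    for attempt in range(1, max_attempts + 1):
        merged = split_and_merge(attempt)  # attempt number lets the caller vary the split
        if similarity(merged, direct_result) > threshold:
            return merged, attempt  # accepted: forwarded to the user
        # below threshold: the "warning" triggers another segmentation pass
    return None, max_attempts


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Example similarity measure (an assumption, not taken from the source)."""
    return len(a & b) / len(a | b) if a | b else 1.0


direct = {1, 2, 3, 4, 5}
results_by_attempt = {1: {1, 2}, 2: {1, 2, 3, 4, 5}}  # hypothetical merged results
print(verify_with_resegmentation(direct, lambda k: results_by_attempt.get(k, set()), jaccard))
# ({1, 2, 3, 4, 5}, 2)
```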
Furthermore, the graph data analysis unit comprises a graph data segmentation module, a statistic module and a graph data merging module. The graph data segmentation module partitions the pairwise-connected nodes of the preprocessed graph data: starting from a boundary of the graph data, it takes adjacent node pairs one sub-graph at a time, each sub-graph containing a set number of node pairs determined from the total number of node pairs. Boundary nodes are formed between the node pairs of each sub-graph, and connecting these boundary points forms a super edge. The statistic module performs statistical random integration on the plurality of super edges obtained by segmentation, and the graph data merging module merges the nodes of the super edges of the randomly integrated sub-graph data into a total super edge to obtain the calculated data.
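The segmentation step is described only loosely. One possible reading is sketched below under explicit assumptions: node pairs (edges) are listed in breadth-first order from a chosen boundary node `start`, cut into consecutive sub-graphs of a fixed number of pairs, and the "super edge" of a sub-graph is taken to be the set of its nodes that it shares with other sub-graphs. All function names are hypothetical, and the traversal covers only the connected component that contains `start`.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = Tuple[int, int]


def bfs_pair_order(adjacency: Dict[int, List[int]], start: int) -> List[Pair]:
    """List each connected node pair once, in breadth-first order from `start`."""
    seen_nodes = {start}
    seen_pairs: Set[FrozenSet[int]] = set()
    order: List[Pair] = []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, []):
            key = frozenset((u, v))
            if key not in seen_pairs:
                seen_pairs.add(key)
                order.append((u, v))
            if v not in seen_nodes:
                seen_nodes.add(v)
                queue.append(v)
    return order


def split_pairs(order: List[Pair], pairs_per_subgraph: int) -> List[List[Pair]]:
    """Cut the ordered pairs into consecutive sub-graphs (the last may be smaller)."""
    return [order[i:i + pairs_per_subgraph] for i in range(0, len(order), pairs_per_subgraph)]


def super_edges(subgraphs: List[List[Pair]]) -> List[Set[int]]:
    """For each sub-graph, the boundary nodes it shares with any other sub-graph."""
    node_sets = [{n for pair in sg for n in pair} for sg in subgraphs]
    result: List[Set[int]] = []
    for i, nodes in enumerate(node_sets):
        others = set().union(*(s for j, s in enumerate(node_sets) if j != i))
        result.append(nodes & others)
    return result


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}  # path graph 0-1-2-3-4
subgraphs = split_pairs(bfs_pair_order(path, start=0), pairs_per_subgraph=2)
print(subgraphs)               # [[(0, 1), (1, 2)], [(2, 3), (3, 4)]]
print(super_edges(subgraphs))  # [{2}, {2}]
```

On the path graph, node 2 is the only node shared by the two sub-graphs, so under this reading it is the boundary node that both super edges contain.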
Further, the number of node pairs in each sub-graph is 10%-20% of the total number of node pairs.
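As a worked example of the 10%-20% rule: with 1000 node pairs in total, each sub-graph holds 100 to 200 pairs, which yields 5 to 10 sub-graphs. The helper below (a hypothetical name) performs this conversion with integer arithmetic so that rounding is explicit: the lower bound is rounded up and the upper bound down. That rounding convention is an assumption; the source does not specify one.

```python
def pairs_per_subgraph_range(total_pairs: int, low_pct: int = 10, high_pct: int = 20) -> range:
    """Admissible numbers of node pairs per sub-graph under the 10%-20% rule."""
    lower = -(-total_pairs * low_pct // 100)  # ceiling division without floats
    upper = total_pairs * high_pct // 100     # floor division
    return range(max(lower, 1), upper + 1)    # empty if the graph is too small


total = 1000
sizes = pairs_per_subgraph_range(total)
print(sizes[0], sizes[-1])                            # 100 200
print(-(-total // sizes[-1]), -(-total // sizes[0]))  # 5 10
```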
Furthermore, the graph data management unit comprises an operation module, a comparison module and a warning module. The operation module extracts the boundary nodes of the preprocessed graph data to obtain an original boundary. The comparison module compares the total super edges produced by the graph data merging module with the original boundary: when the coincidence rate between the boundary nodes of the total super edges and the original boundary nodes exceeds 80%, the merged calculation result is transmitted to a user; if the coincidence rate is less than 80%, the warning module sends a warning to the graph data segmentation module, which reselects the boundary points and segments the graph data again until the final comparison result exceeds 80%.
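The comparison module's acceptance test can be written as a set-overlap check. The source does not define the coincidence rate; this sketch assumes it is the fraction of original boundary nodes that also appear among the boundary nodes of the total super edge, and it treats a rate of exactly 80% as a failure because the text only says "more than 80%". The function names are hypothetical.

```python
from typing import Set


def coincidence_rate(merged_boundary: Set[int], original_boundary: Set[int]) -> float:
    """Fraction of original boundary nodes recovered by the merged super edge."""
    if not original_boundary:
        raise ValueError("original boundary must not be empty")
    return len(merged_boundary & original_boundary) / len(original_boundary)


def passes_check(merged_boundary: Set[int], original_boundary: Set[int],
                 threshold: float = 0.8) -> bool:
    """True: send the merged result to the user. False: warn the segmentation module."""
    return coincidence_rate(merged_boundary, original_boundary) > threshold


original = set(range(1, 11))                             # 10 original boundary nodes
print(passes_check(set(range(1, 10)) | {42}, original))  # True  (9/10 = 0.9)
print(passes_check(set(range(1, 9)), original))          # False (8/10 = 0.8)
```

This measure ignores spurious boundary nodes such as 42; a symmetric measure like Jaccard similarity would penalize them. Which behavior the system intends is not stated.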
The invention has the beneficial effects that:
the system preprocesses the graph data, segments them according to their adjacent nodes, and integrates the segmented graph data; it also collects boundary points of the preprocessed graph data to obtain an original boundary and compares that boundary with the integrated data to judge the accuracy of the segmented data, thereby ensuring the accuracy of the graph data.
Drawings
To facilitate understanding by those skilled in the art, the invention is further described below with reference to the accompanying drawing.
FIG. 1 is a diagram of a data calculation system according to the present invention.
Detailed Description
As shown in FIG. 1, the hierarchical large-scale graph data operation system comprises a graph data acquisition unit, a graph data analysis unit and a graph data management unit;
the graph data acquisition unit is used for acquiring large-scale graph data, removing noise from the graph data by median filtering, and transmitting the processed graph data to the graph data analysis unit and the graph data management unit;
the graph data analysis unit is used for regularly dividing the preprocessed graph data into different sub-data, distributing the sub-data to the corresponding computing nodes, collecting statistics on the results computed by each computing node, merging these results, and transmitting both the per-node results and the merged data to the graph data management unit. The graph data analysis unit comprises a graph data segmentation module, a statistic module and a graph data merging module. The graph data segmentation module partitions the pairwise-connected nodes of the preprocessed graph data: starting from a boundary of the graph data, it takes adjacent node pairs one sub-graph at a time according to the total number of node pairs, with the number of node pairs in each sub-graph set to 10%-20% of the total number of node pairs. Boundary nodes are formed between the node pairs of each sub-graph, and connecting these boundary points forms a super edge. The statistic module performs statistical random integration on the plurality of super edges obtained by segmentation, and the graph data merging module merges the nodes of the randomly integrated super edges of the sub-graph data into a total super edge to obtain the calculated data;
the graph data management unit performs the calculation directly on the preprocessed graph data and compares its result with the result obtained by splitting and merging in the graph data analysis unit to determine their similarity; if the similarity is greater than 80%, the merged data are transmitted to a user, and if the similarity is less than 80%, a warning is sent directly to the graph data analysis unit, which segments the graph data again until the similarity between the split-and-merged result and the directly calculated result exceeds 80%. The graph data management unit comprises an operation module, a comparison module and a warning module. The operation module extracts the boundary nodes of the preprocessed graph data to obtain an original boundary. The comparison module compares the total super edges produced by the graph data merging module with the original boundary: when the coincidence rate between the boundary nodes of the total super edges and the original boundary nodes exceeds 80%, the merged calculation result is transmitted to a user; if the coincidence rate is less than 80%, the warning module sends a warning to the graph data segmentation module, which reselects the boundary points and segments the graph data again until the final comparison result exceeds 80%.
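The statistic and merging steps described in this embodiment can be sketched as follows, again under assumptions: "statistical random integration" is modelled as shuffling the super edges with a seeded random generator while counting how often each boundary node occurs, and merging is modelled as the union of the super edges' nodes into one total super edge. `integrate_and_merge` is a hypothetical name.

```python
import random
from collections import Counter
from typing import List, Set, Tuple


def integrate_and_merge(super_edges: List[Set[int]], seed: int = 0) -> Tuple[Set[int], Counter]:
    """Shuffle super edges, tally node occurrences, and merge them into a total super edge."""
    shuffled = list(super_edges)
    random.Random(seed).shuffle(shuffled)  # stand-in for the unspecified random integration
    counts: Counter = Counter()
    total: Set[int] = set()
    for edge in shuffled:
        counts.update(edge)  # statistic: how many super edges contain each node
        total |= edge        # merge: union of all super-edge nodes
    return total, counts


total, counts = integrate_and_merge([{2}, {2, 5}, {5, 7}], seed=1)
print(sorted(total), counts[2], counts[7])  # [2, 5, 7] 2 1
```

Because set union is order-independent, the shuffle does not change the total super edge in this sketch; the source does not say what the randomization is meant to influence.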
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims (2)
1. An operation system based on hierarchical large-scale graph data is characterized by comprising a graph data acquisition unit, a graph data analysis unit and a graph data management unit;
the graph data acquisition unit is used for acquiring large-scale graph data, removing noise from the graph data by median filtering, and transmitting the processed graph data to the graph data analysis unit and the graph data management unit;
the graph data analysis unit is used for regularly dividing the preprocessed graph data into different sub-data, distributing the sub-data to the corresponding computing nodes, collecting statistics on the results computed by each computing node, merging these results, and transmitting both the per-node results and the merged data to the graph data management unit;
the graph data management unit calculates the preprocessed graph data, compares the calculation result with the calculation result obtained by dividing and combining in the graph data analysis unit to determine the similarity, if the similarity is greater than 80%, the combined data is transmitted to a user, if the similarity is less than 80%, a warning is directly sent to the graph data analysis unit, and the graph data analysis unit divides the graph data again until the similarity between the calculation result obtained by dividing and combining and the calculation result obtained by directly calculating the preprocessed graph data is greater than 80%;
the graph data analysis unit comprises a graph data segmentation module, a statistic module and a graph data merging module; the graph data segmentation module partitions the pairwise-connected nodes of the preprocessed graph data and, starting from a boundary of the graph data, takes adjacent node pairs as sub-graph data one sub-graph at a time according to the total number of node pairs, wherein boundary nodes are formed between the node pairs of each sub-graph and the boundary points are connected to form a super edge; the statistic module performs statistical random integration on the plurality of super edges obtained by segmentation; the graph data merging module merges the nodes of the randomly integrated super edges of the sub-graph data into a total super edge to obtain the calculated data;
the number of node pairs in each sub-graph is 10%-20% of the total number of node pairs.
2. The operation system based on hierarchical large-scale graph data according to claim 1, wherein the graph data management unit comprises an operation module, a comparison module and a warning module; the operation module extracts the boundary nodes of the preprocessed graph data to obtain an original boundary; the comparison module compares the boundary nodes of the total super edge with the original boundary nodes; when their coincidence rate exceeds 80%, the merged calculation result is transmitted to a user; if the coincidence rate is less than 80%, the warning module sends a warning to the graph data segmentation module, which reselects the boundary nodes and segments the graph data again until the final comparison result exceeds 80%.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711160660.6A CN107943918B (en) | 2017-11-20 | 2017-11-20 | Operation system based on hierarchical large-scale graph data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107943918A CN107943918A (en) | 2018-04-20 |
CN107943918B true CN107943918B (en) | 2021-09-07 |
Family
ID=61929244
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711160660.6A Active CN107943918B (en) | 2017-11-20 | 2017-11-20 | Operation system based on hierarchical large-scale graph data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107943918B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046237B (en) * | 2018-10-10 | 2024-04-05 | 京东科技控股股份有限公司 | User behavior data processing method and device, electronic equipment and readable medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1315122A2 (en) * | 2001-11-21 | 2003-05-28 | Oki Data Corporation | Graphical data processing |
CN102254012A (en) * | 2011-07-19 | 2011-11-23 | 北京大学 | Graph data storing method and subgraph enquiring method based on external memory |
CN103336808A (en) * | 2013-06-25 | 2013-10-02 | 中国科学院信息工程研究所 | System and method for real-time graph data processing based on BSP (Bulk Synchronous Parallel) model |
CN104618153A (en) * | 2015-01-20 | 2015-05-13 | 北京大学 | Dynamic fault-tolerant method and dynamic fault-tolerant system based on P2P in distributed parallel graph processing |
CN105426375A (en) * | 2014-09-22 | 2016-03-23 | 阿里巴巴集团控股有限公司 | Relationship network calculation method and apparatus |
CN105590321A (en) * | 2015-12-24 | 2016-05-18 | 华中科技大学 | Block-based subgraph construction and distributed graph processing method |
CN105677755A (en) * | 2015-12-30 | 2016-06-15 | 杭州华为数字技术有限公司 | Method and device for processing graph data |
Non-Patent Citations (1)
Title |
---|
Image Segmentation Using Higher-Order Correlation Clustering; Sungwoong Kim et al.; IEEE; 2014-01-28; pp. 1761-1774 * |
Also Published As
Publication number | Publication date |
---|---|
CN107943918A (en) | 2018-04-20 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |