CN107943918B - Operation system based on hierarchical large-scale graph data - Google Patents

Publication number
CN107943918B
Authority
CN
China
Prior art keywords
graph data
module
data
graph
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711160660.6A
Other languages
Chinese (zh)
Other versions
CN107943918A (en)
Inventor
姚伟强
周基初
张宇
郑凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Yamooc Information Technology Co ltd
Original Assignee
Hefei Yamooc Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Yamooc Information Technology Co ltd
Priority to CN201711160660.6A
Publication of CN107943918A
Application granted
Publication of CN107943918B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention discloses an operation system based on hierarchical large-scale graph data, comprising a graph data acquisition unit, a graph data analysis unit and a graph data management unit. The graph data analysis unit comprises a graph data segmentation module, a statistic module and a graph data merging module; the graph data management unit comprises an operation module, a comparison module and a warning module. The graph data acquisition unit acquires large-scale graph data, filters noise from the graph data by median filtering, and transmits the processed graph data to the graph data analysis unit and the graph data management unit. The system preprocesses the graph data, segments it according to adjacent nodes, and then integrates the segmented graph data; it also collects the boundary points of the preprocessed graph data to obtain an original boundary and compares that boundary with the integrated data to judge the accuracy of the segmentation, thereby ensuring the accuracy of the graph data.

Description

Operation system based on hierarchical large-scale graph data
Technical Field
The invention belongs to the field of large-scale graph data processing, and relates to an operation system based on hierarchical large-scale graph data.
Background
In the era of big data mining, graphs can directly describe many real-world applications in computer science, chemistry and bioinformatics, such as social networks, web graphs, chemical compounds and biological structures, and can also express various data mining algorithms, such as matrix decomposition or shortest path. A graph comprises a plurality of nodes and the edges connecting them; graph data comprises the node data of the nodes and the edge data of the edges, and the edge data of one edge comprises a source node, a destination node and the weight of the edge. On a stand-alone graph computation platform (i.e., a platform that performs graph computation on a single computer), the capacity of the computer's local memory is limited, so when the amount of graph data to be computed exceeds that capacity, the edge data must be divided into a plurality of edge data blocks, each containing one or more pieces of edge data.
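As an illustration of the edge data and edge data blocks described above, the following minimal sketch represents one piece of edge data as a source node, a destination node and a weight, and cuts a list of edges into blocks of bounded size. The names `Edge`, `split_into_blocks` and `max_edges_per_block` are hypothetical and do not appear in the patent; the fixed block size merely stands in for the memory limit of a single computer.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Edge:
    source: int       # ID of the source node
    destination: int  # ID of the destination node
    weight: float     # weight of the edge


def split_into_blocks(edges: List[Edge], max_edges_per_block: int) -> Iterator[List[Edge]]:
    """Yield consecutive edge data blocks holding at most max_edges_per_block edges."""
    if max_edges_per_block < 1:
        raise ValueError("max_edges_per_block must be at least 1")
    for start in range(0, len(edges), max_edges_per_block):
        yield edges[start:start + max_edges_per_block]


# Five edges, with an assumed limit of two edges per block.
edges = [Edge(0, 1, 1.0), Edge(1, 2, 0.5), Edge(2, 0, 2.0), Edge(2, 3, 1.5), Edge(3, 0, 0.7)]
blocks = list(split_into_blocks(edges, max_edges_per_block=2))
```

Here the five edges end up in three blocks, the last of which holds a single edge.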
At present, edge data in graph data is processed by a fixed method, so when a computer calculates the node data of a node in an edge data block and the edge data related to that node cannot be acquired directly, the required edge data can be obtained only by rearranging the edge data within the block. For example, GraphChi (a stand-alone graph computation platform) uses a destination-centric computation mode: the computer divides the edge data into a plurality of edge data blocks (called shards in GraphChi) in ascending order of destination node ID, and places all edge data with the same destination node in one block. However, different division rules yield different edge data blocks, so the accuracy of the data obtained in the final merge is low.
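The destination-ordered division rule described above can be sketched as follows. This is a simplified illustration of the rule as stated in this background section, not GraphChi's actual implementation; the function name `shard_by_destination` and the parameter `target_size` are assumptions made for the example.

```python
from itertools import groupby
from typing import List, Tuple

EdgeT = Tuple[int, int, float]  # (source, destination, weight)


def shard_by_destination(edges: List[EdgeT], target_size: int) -> List[List[EdgeT]]:
    """Order edges by ascending destination ID and pack whole destination groups into shards."""
    ordered = sorted(edges, key=lambda e: e[1])
    shards: List[List[EdgeT]] = []
    current: List[EdgeT] = []
    for _, group in groupby(ordered, key=lambda e: e[1]):
        group_edges = list(group)
        # Close the current shard instead of splitting one destination across two shards.
        if current and len(current) + len(group_edges) > target_size:
            shards.append(current)
            current = []
        current.extend(group_edges)
    if current:
        shards.append(current)
    return shards


edges = [(0, 2, 1.0), (1, 0, 0.5), (3, 2, 2.0), (2, 1, 1.5), (0, 1, 0.7), (1, 2, 0.3)]
shards = shard_by_destination(edges, target_size=3)
```

With this input, edges whose destination is 0 or 1 land in the first shard and all edges whose destination is 2 land in the second. A different `target_size` or ordering rule would produce different blocks from the same graph, which is the variability the paragraph above points to.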
Disclosure of Invention
The invention aims to provide an operation system based on hierarchical large-scale graph data, which is characterized in that graph data are preprocessed and then are divided according to adjacent nodes of the graph data, then the divided graph data are integrated, boundary points of the preprocessed graph data are collected to obtain an original boundary, and meanwhile, the original boundary is compared with the integrated data to judge the accuracy of the divided data so as to ensure the accuracy of the graph data.
The purpose of the invention can be realized by the following technical scheme:
an operation system based on hierarchical large-scale graph data comprises a graph data acquisition unit, a graph data analysis unit and a graph data management unit;
the graph data acquisition unit is used for acquiring large-scale graph data, performing noise filtering processing on the graph data through median filtering, and transmitting the processed graph data to the graph data analysis unit and the graph data management unit;
the graph data analysis unit is used for regularly dividing the preprocessed graph data into different subdata, simultaneously distributing the subdata to corresponding computing nodes, then counting the results obtained by computing each computing node, combining the counted results, and transmitting the data computed by each computing node and the combined data to the graph data management unit;
the graph data management unit calculates the preprocessed graph data, compares the calculation result with the calculation result obtained after separation and combination in the graph data analysis unit to determine the similarity, when the similarity is greater than 80%, the combined data is transmitted to a user, if the similarity is less than 80%, a warning is directly sent to the graph data analysis unit, and the graph data analysis unit performs graph data segmentation again until the similarity between the calculation result obtained after separation and combination and the result obtained by directly calculating the preprocessed graph data is greater than 80%.
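The check-and-retry behaviour of the graph data management unit can be sketched as a loop. This is a minimal sketch under stated assumptions: the patent does not define the similarity measure, does not bound the number of retries, and does not say what happens at exactly 80%, so the ratio-based `ratio_similarity`, the `max_attempts` limit and the strict `>` comparison below are all illustrative choices.

```python
from typing import Callable, Optional

SIMILARITY_THRESHOLD = 0.80  # the 80% threshold stated in the text


def ratio_similarity(a: float, b: float) -> float:
    """Hypothetical similarity of two non-negative results: smaller value over larger value."""
    larger = max(abs(a), abs(b))
    return 1.0 if larger == 0 else min(abs(a), abs(b)) / larger


def compute_with_check(
    direct_result: float,
    segment_and_merge: Callable[[int], float],
    max_attempts: int = 10,  # assumption: the text only says "until"
) -> Optional[float]:
    """Repeat segmentation until the merged result is similar enough to the direct result."""
    for attempt in range(max_attempts):
        merged = segment_and_merge(attempt)  # each attempt stands for a new segmentation
        if ratio_similarity(direct_result, merged) > SIMILARITY_THRESHOLD:
            return merged  # this result would be transmitted to the user
        # Otherwise a warning goes back to the analysis unit and the loop retries.
    return None


# Hypothetical merged results for attempt 0 and attempt 1.
attempt_results = [60.0, 95.0]
accepted = compute_with_check(100.0, lambda i: attempt_results[min(i, 1)])
```

In this run the first segmentation gives a similarity of 0.6 and is rejected, and the second gives 0.95 and is accepted.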
Furthermore, the graph data analysis unit comprises a graph data segmentation module, a statistic module and a graph data merging module; the graph data segmentation module is used for segmenting the pairwise-connected graph nodes in the preprocessed graph data: starting from one boundary of the graph data, it splits off a certain number of adjacent pairwise nodes at a time as sub-graph data, the number being chosen according to the total number of pairwise nodes, wherein a boundary node can be formed between every two nodes of each sub-graph data, and the boundary points are connected to form a super edge; the statistic module is used for statistically and randomly integrating the plurality of super edges obtained by segmentation; and the graph data merging module merges the nodes of the super edges of the randomly integrated sub-graph data to form a total super edge so as to obtain the calculated data.
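One possible reading of the segmentation step is sketched below. It is an interpretation, not the patented method: nodes are visited breadth-first from a chosen start node so that adjacent nodes stay together, the visit order is cut into pieces of a fixed size, and the "boundary nodes" of a piece are taken to be the nodes with a neighbour outside the piece. Treating the set of boundary nodes of a piece as its "super edge" is likewise an assumption, and all function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected adjacency: node -> neighbouring nodes


def bfs_order(graph: Graph, start: int) -> List[int]:
    """Breadth-first visit order from start (covers only the component containing start)."""
    seen = {start}
    order: List[int] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


def segment(graph: Graph, start: int, part_size: int) -> List[Set[int]]:
    """Cut the visit order into consecutive pieces of at most part_size nodes."""
    order = bfs_order(graph, start)
    return [set(order[i:i + part_size]) for i in range(0, len(order), part_size)]


def boundary_nodes(graph: Graph, part: Set[int]) -> Set[int]:
    """Nodes of part that have at least one neighbour outside part."""
    return {node for node in part if graph[node] - part}


# A path graph 0-1-2-3-4-5, segmented from node 0 into pieces of two nodes.
graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
parts = segment(graph, start=0, part_size=2)
super_edges = [boundary_nodes(graph, part) for part in parts]
```

For the path graph, the pieces are {0, 1}, {2, 3} and {4, 5}, with boundary node sets {1}, {2, 3} and {4}.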
Further, the number of pairwise nodes in each sub-graph data is 10%-20% of the total number of pairwise nodes.
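The arithmetic implied by the 10%-20% range can be worked through in a short sketch. The helper names are hypothetical, and integer percentages are used to avoid floating-point rounding; the patent itself gives only the range.

```python
from typing import Tuple


def part_size_bounds(total_nodes: int, low_pct: int = 10, high_pct: int = 20) -> Tuple[int, int]:
    """Smallest and largest node count allowed for one piece of sub-graph data."""
    smallest = (total_nodes * low_pct + 99) // 100  # ceiling of total * low_pct / 100
    largest = total_nodes * high_pct // 100          # floor of total * high_pct / 100
    if largest < 1 or smallest > largest:
        raise ValueError("graph too small for the requested percentage range")
    return max(smallest, 1), largest


def part_count_bounds(total_nodes: int) -> Tuple[int, int]:
    """Fewest and most pieces possible when every piece respects the size bounds."""
    smallest, largest = part_size_bounds(total_nodes)
    fewest = -(-total_nodes // largest)  # ceiling division
    most = total_nodes // smallest
    return fewest, most


sizes = part_size_bounds(1000)
counts = part_count_bounds(1000)
```

For 1000 pairwise nodes, each piece holds between 100 and 200 nodes, so a segmentation that respects the range produces between 5 and 10 pieces.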
Furthermore, the graph data management unit comprises an operation module, a comparison module and a warning module, wherein the operation module is used for extracting the boundary graph nodes of the preprocessed graph data to obtain an original boundary; the comparison module compares the total super edges obtained by merging in the graph data merging module with the original boundary: when the coincidence rate between the boundary nodes of the total super edges and the original boundary nodes exceeds 80%, the merged calculation result is transmitted to the user; if the coincidence rate is less than 80%, the warning module sends a warning to the graph data segmentation module, and the graph data segmentation module reselects the boundary points and segments the graph data again until the final comparison result exceeds 80%.
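The patent does not give a formula for the coincidence rate. A minimal sketch, assuming it is the fraction of original boundary nodes that also appear among the boundary nodes of the total super edge, might look like this; the function name and the sample node IDs are hypothetical.

```python
from typing import Set


def coincidence_rate(original_boundary: Set[int], merged_boundary: Set[int]) -> float:
    """Assumed definition: share of original boundary nodes also present after merging."""
    if not original_boundary:
        raise ValueError("original boundary must contain at least one node")
    return len(original_boundary & merged_boundary) / len(original_boundary)


original = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}   # boundary extracted by the operation module
merged = {1, 2, 3, 4, 5, 6, 7, 8, 9, 42}     # boundary nodes of the total super edge
rate = coincidence_rate(original, merged)
accepted = rate > 0.80  # otherwise the warning module would trigger a new segmentation
```

Nine of the ten original boundary nodes are recovered here, so the merged result passes the 80% check.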
The invention has the beneficial effects that:
the system is used for preprocessing the graph data, then segmenting the graph data according to adjacent nodes of the graph data, then integrating the segmented graph data, performing boundary point acquisition on the preprocessed graph data to obtain an original boundary, and meanwhile, comparing the original boundary with the integrated data to judge the accuracy of the segmented data so as to ensure the accuracy of the graph data.
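The preprocessing step is described elsewhere in this document as median filtering, without stating which values are filtered. As a minimal sketch, assuming the filter is applied to a sequence of numeric values such as edge weights, a sliding-window median could be written as follows; the window size of 3 and the function name are illustrative choices.

```python
from statistics import median
from typing import List, Sequence


def median_filter(values: Sequence[float], window: int = 3) -> List[float]:
    """Replace each value by the median of the window centred on it."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    filtered: List[float] = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        filtered.append(median(values[lo:hi]))  # the window shrinks at both ends
    return filtered


weights = [1.0, 1.2, 9.9, 1.1, 0.9, 1.0]  # 9.9 plays the role of a noisy outlier
smoothed = median_filter(weights)
```

The outlier at index 2 is replaced by 1.2, the median of its three-value window, while the output keeps the same length as the input.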
Drawings
In order to facilitate understanding for those skilled in the art, the present invention will be further described with reference to the accompanying drawings.
FIG. 1 is a diagram of a data calculation system according to the present invention.
Detailed Description
A large-scale graph data operation system based on hierarchy is shown in figure 1 and comprises a graph data acquisition unit, a graph data analysis unit and a graph data management unit;
the graph data acquisition unit is used for acquiring large-scale graph data, performing noise filtering processing on the graph data through median filtering, and transmitting the processed graph data to the graph data analysis unit and the graph data management unit;
the graph data analysis unit is used for regularly dividing the preprocessed graph data into different subdata, simultaneously distributing the subdata to corresponding computing nodes, then counting the results obtained by computing each computing node, combining the counted results, and transmitting the data computed by each computing node and the combined data to the graph data management unit; the graph data analysis unit comprises a graph data segmentation module, a statistic module and a graph data merging module; the graph data segmentation module is used for segmenting the pairwise-connected graph nodes in the preprocessed graph data: starting from one boundary of the graph data, it splits off a certain number of adjacent pairwise nodes at a time as sub-graph data, the number being chosen according to the total number of pairwise nodes, wherein the number of pairwise nodes in each sub-graph data is 10%-20% of the total number of pairwise nodes, a boundary node can be formed between every two nodes in each sub-graph data, and the boundary points are connected to form a super edge; the statistic module is used for statistically and randomly integrating the plurality of super edges obtained by segmentation; the graph data merging module merges the nodes of the super edges of the randomly integrated sub-graph data to form a total super edge so as to obtain the calculated data;
the graph data management unit calculates the preprocessed graph data and compares the calculation result with the calculation result obtained after segmentation and merging in the graph data analysis unit to determine the similarity; if the similarity is greater than 80%, the merged data is transmitted to the user; if the similarity is less than 80%, a warning is sent directly to the graph data analysis unit, and the graph data analysis unit segments the graph data again until the similarity between the result obtained after segmentation and merging and the result obtained by directly calculating the preprocessed graph data is greater than 80%; the graph data management unit comprises an operation module, a comparison module and a warning module, wherein the operation module is used for extracting the boundary graph nodes of the preprocessed graph data to obtain an original boundary; the comparison module compares the total super edges obtained by merging in the graph data merging module with the original boundary: when the coincidence rate between the boundary nodes of the total super edges and the original boundary nodes exceeds 80%, the merged calculation result is transmitted to the user; if the coincidence rate is less than 80%, the warning module sends a warning to the graph data segmentation module, and the graph data segmentation module reselects the boundary points and segments the graph data again until the final comparison result exceeds 80%.
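The analysis unit's flow of distributing subdata to computing nodes, collecting the per-node results and merging them can be sketched as follows. This is an illustration under stated assumptions: worker threads stand in for the computing nodes, and the per-node computation (summing edge weights) is a placeholder, since the embodiment does not specify what is computed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

EdgeT = Tuple[int, int, float]  # (source, destination, weight)


def partial_weight(block: List[EdgeT]) -> float:
    """Placeholder work for one computing node: sum the weights of its edges."""
    return sum(weight for _, _, weight in block)


def distributed_total(blocks: List[List[EdgeT]], workers: int = 4) -> Tuple[List[float], float]:
    """Compute one partial result per block in parallel, then merge by summing."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_weight, blocks))  # results keep the block order
    return partials, sum(partials)


blocks = [
    [(0, 1, 1.0), (1, 2, 2.0)],
    [(2, 3, 3.0)],
    [(3, 0, 4.0), (0, 2, 5.0)],
]
partials, merged = distributed_total(blocks)

# The management unit's direct calculation over the unsegmented data, for comparison.
direct = partial_weight([edge for block in blocks for edge in block])
```

Both the per-node results and the merged value are returned, mirroring the statement that the analysis unit passes both to the management unit. For an additive quantity like this one the merged and direct results agree exactly; the 80% check matters for computations where segmentation can lose or distort information.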
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (2)

1. An operation system based on hierarchical large-scale graph data is characterized by comprising a graph data acquisition unit, a graph data analysis unit and a graph data management unit;
the graph data acquisition unit is used for acquiring large-scale graph data, performing noise filtering processing on the graph data through median filtering, and transmitting the processed graph data to the graph data analysis unit and the graph data management unit;
the graph data analysis unit is used for regularly dividing the preprocessed graph data into different subdata, simultaneously distributing the subdata to corresponding computing nodes, then counting the results obtained by computing each computing node, combining the counted results, and transmitting the data computed by each computing node and the combined data to the graph data management unit;
the graph data management unit calculates the preprocessed graph data, compares the calculation result with the calculation result obtained by dividing and combining in the graph data analysis unit to determine the similarity, if the similarity is greater than 80%, the combined data is transmitted to a user, if the similarity is less than 80%, a warning is directly sent to the graph data analysis unit, and the graph data analysis unit divides the graph data again until the similarity between the calculation result obtained by dividing and combining and the calculation result obtained by directly calculating the preprocessed graph data is greater than 80%;
the graph data analysis unit comprises a graph data segmentation module, a statistic module and a graph data merging module; the graph data segmentation module segments the pairwise-connected graph nodes in the preprocessed graph data, and splits off adjacent pairwise nodes as sub-graph data one by one from a boundary of the graph data according to the total number of pairwise nodes, wherein a boundary node can be formed between every two nodes of each sub-graph data, and the boundary points are connected to form a super edge; the statistic module is used for statistically and randomly integrating the plurality of super edges obtained by segmentation; the graph data merging module merges the nodes of the super edges of the randomly integrated sub-graph data to form a total super edge so as to obtain the calculated data;
the number of pairwise nodes in each sub-graph data is 10%-20% of the total number of pairwise nodes.
2. The operation system based on the hierarchical large-scale graph data according to claim 1, wherein the graph data management unit comprises an operation module, a comparison module and a warning module, wherein the operation module extracts the boundary graph nodes of the preprocessed graph data to obtain an original boundary; the comparison module compares the boundary nodes of the total super edge with the original boundary nodes; when the coincidence rate between the boundary nodes of the total super edge and the original boundary nodes exceeds 80%, the merged calculation result is transmitted to the user; if the coincidence rate is less than 80%, the warning module sends a warning to the graph data segmentation module, and the graph data segmentation module reselects the boundary nodes and segments the graph data again until the final comparison result exceeds 80%.
CN201711160660.6A 2017-11-20 2017-11-20 Operation system based on hierarchical large-scale graph data Active CN107943918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711160660.6A CN107943918B (en) 2017-11-20 2017-11-20 Operation system based on hierarchical large-scale graph data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711160660.6A CN107943918B (en) 2017-11-20 2017-11-20 Operation system based on hierarchical large-scale graph data

Publications (2)

Publication Number Publication Date
CN107943918A 2018-04-20
CN107943918B 2021-09-07

Family

ID=61929244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711160660.6A Active CN107943918B (en) 2017-11-20 2017-11-20 Operation system based on hierarchical large-scale graph data

Country Status (1)

Country Link
CN (1) CN107943918B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046237B (en) * 2018-10-10 2024-04-05 京东科技控股股份有限公司 User behavior data processing method and device, electronic equipment and readable medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1315122A2 (en) * 2001-11-21 2003-05-28 Oki Data Corporation Graphical data processing
CN102254012A (en) * 2011-07-19 2011-11-23 北京大学 Graph data storing method and subgraph enquiring method based on external memory
CN103336808A (en) * 2013-06-25 2013-10-02 中国科学院信息工程研究所 System and method for real-time graph data processing based on BSP (Board Support Package) model
CN105426375A (en) * 2014-09-22 2016-03-23 阿里巴巴集团控股有限公司 Relationship network calculation method and apparatus
CN104618153A (en) * 2015-01-20 2015-05-13 北京大学 Dynamic fault-tolerant method and dynamic fault-tolerant system based on P2P in distributed parallel graph processing
CN105590321A (en) * 2015-12-24 2016-05-18 华中科技大学 Block-based subgraph construction and distributed graph processing method
CN105677755A (en) * 2015-12-30 2016-06-15 杭州华为数字技术有限公司 Method and device for processing graph data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Image Segmentation Using Higher-Order Correlation Clustering; Sungwoong Kim et al.; IEEE; 2014-01-28; pp. 1761-1774 *

Also Published As

Publication number Publication date
CN107943918A (en) 2018-04-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant