CN107943918A - Operation system based on hierarchical large-scale graph data - Google Patents

Operation system based on hierarchical large-scale graph data

Info

Publication number: CN107943918A
Authority: CN (China)
Prior art keywords: graph data, data, module, node, graph
Legal status: Granted (the legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201711160660.6A
Other languages: Chinese (zh)
Other versions: CN107943918B (en)
Inventors: 姚伟强, 周基初, 张宇, 郑凯
Current assignee: Hefei Asia Pacific Mdt Infotech Ltd (the listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original assignee: Hefei Asia Pacific Mdt Infotech Ltd
Priority date: 2017-11-20 (the priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2017-11-20
Publication date: 2018-04-20
2017-11-20: Application filed by Hefei Asia Pacific Mdt Infotech Ltd
2017-11-20: Priority to CN201711160660.6A
2018-04-20: Publication of CN107943918A
Application granted
2021-09-07: Publication of CN107943918B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an operation system based on hierarchical large-scale graph data, comprising a graph data acquisition unit, a graph data analysis unit and a graph data management unit. The graph data analysis unit comprises a graph data segmentation module, a statistics module and a graph data merging module; the graph data management unit comprises a computing module, a comparison module and an alarm module. The graph data acquisition unit collects large-scale graph data, removes noise from it by median filtering, and then transmits the processed graph data to the graph data analysis unit and the graph data management unit. After preprocessing, the invention partitions the graph data according to its adjacent nodes and then integrates the partitioned graph data. It also collects the boundary points of the preprocessed graph data to obtain the original boundary, and compares the original boundary with the integrated data to judge the accuracy of the partitioned data, thereby ensuring the accuracy of the graph data.

Description

Operation system based on hierarchical large-scale graph data
Technical field
The invention belongs to the field of large-scale graph data processing and relates to an operation system based on hierarchical large-scale graph data.
Background art
In the era of big data mining, graphs can directly describe many practical applications in fields such as computer science, chemistry and bioinformatics, for example social networks, web graphs, chemical substances and biological structures; they can also describe various data mining algorithms, such as matrix factorization or shortest paths. A graph comprises multiple nodes and the edges connecting them. Graph data comprises the node data of each node and the edge data of each edge, and the edge data of an edge comprises its source node, its destination node and its weight. On a single-machine graph computing platform (i.e., a platform that performs graph computation on a single computer), the local memory of the computer is limited, so when the volume of graph data to be computed exceeds the memory capacity, the edge data must be split into multiple edge data blocks, each containing the data of one or more edges.
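The memory constraint described above can be illustrated with a minimal Python sketch. It models edge data exactly as the paragraph defines it (source node, destination node, weight) and cuts the edge list into consecutive edge data blocks. The `Edge` class, the function `split_into_blocks` and the parameter `max_edges_per_block` (a stand-in for the local memory capacity) are hypothetical names, not part of the patent or of any platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Edge:
    """Edge data as described: source node, destination node and weight."""
    src: int
    dst: int
    weight: float


def split_into_blocks(edges: List[Edge], max_edges_per_block: int) -> List[List[Edge]]:
    """Cut the edge list into consecutive blocks that each fit the memory budget.

    max_edges_per_block is a hypothetical stand-in for the local memory capacity.
    """
    if max_edges_per_block <= 0:
        raise ValueError("max_edges_per_block must be positive")
    return [edges[i:i + max_edges_per_block]
            for i in range(0, len(edges), max_edges_per_block)]


edges = [Edge(0, 1, 1.0), Edge(1, 2, 0.5), Edge(2, 0, 2.0), Edge(2, 3, 1.5), Edge(3, 1, 0.7)]
blocks = split_into_blocks(edges, 2)
print([len(block) for block in blocks])  # [2, 2, 1]
```

Cutting in input order is the simplest possible rule: it ignores which node each edge belongs to, so the edges of one node can end up scattered across several blocks.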
At present, edge data are processed with a fixed method, so when the computer computes the node data of a node in an edge data block and cannot directly obtain the edge data related to that node, the edge data in the block must first be reordered before the required edge data can be obtained. For example, GraphChi (a single-machine graph computing platform) uses a destination-centric computation model, so the computer divides the edge data into multiple edge data blocks (called shards in GraphChi) in ascending order of destination node ID, placing all edges that share a destination node in the same block. However, different partitioning rules yield different edge data blocks, which lowers the accuracy of the data obtained when the blocks are finally merged.
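A destination-centric partitioning of the kind attributed to GraphChi can be sketched as follows. Each shard covers an interval of destination IDs, so all edges sharing a destination land in the same shard; following the published GraphChi design, edges inside a shard are additionally sorted by source ID. The interval boundaries and the function name `build_shards` are illustrative assumptions, and this is not GraphChi's actual code.

```python
from typing import List, Tuple

EdgeT = Tuple[int, int, float]  # (source ID, destination ID, weight)


def build_shards(edges: List[EdgeT], intervals: List[Tuple[int, int]]) -> List[List[EdgeT]]:
    """Assign each edge to the shard whose inclusive destination interval contains it,
    then sort every shard by source ID."""
    shards: List[List[EdgeT]] = [[] for _ in intervals]
    for edge in edges:
        dst = edge[1]
        for i, (lo, hi) in enumerate(intervals):
            if lo <= dst <= hi:
                shards[i].append(edge)
                break
        else:  # no interval matched
            raise ValueError(f"destination {dst} is not covered by any interval")
    for shard in shards:
        shard.sort(key=lambda e: (e[0], e[1]))
    return shards


edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 0, 2.0), (2, 3, 1.5), (3, 1, 0.7)]
shards = build_shards(edges, [(0, 1), (2, 3)])
print(shards[0])  # [(0, 1, 1.0), (2, 0, 2.0), (3, 1, 0.7)]
print(shards[1])  # [(1, 2, 0.5), (2, 3, 1.5)]
```

Choosing different interval boundaries produces different shards from the same edges, which is the variability the paragraph above points to.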
Summary of the invention
The object of the invention is to provide an operation system based on hierarchical large-scale graph data. After preprocessing, the system partitions the graph data according to its adjacent nodes and then integrates the partitioned graph data. It also collects the boundary points of the preprocessed graph data to obtain the original boundary, and compares the original boundary with the integrated data to judge the accuracy of the partitioned data, thereby ensuring the accuracy of the graph data.
The purpose of the present invention can be achieved through the following technical solutions:
An operation system based on hierarchical large-scale graph data comprises a graph data acquisition unit, a graph data analysis unit and a graph data management unit.
The graph data acquisition unit collects large-scale graph data, removes noise from the graph data by median filtering, and then transmits the processed graph data to the graph data analysis unit and the graph data management unit.
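The patent does not say what quantity the median filter operates on. One plausible reading, shown here purely as an assumption, is a graph median filter: each node carries a scalar value, and that value is replaced by the median of itself and its neighbors, which suppresses isolated spikes. The function name `graph_median_filter` and the sample data are hypothetical.

```python
from statistics import median
from typing import Dict, List


def graph_median_filter(values: Dict[int, float],
                        adjacency: Dict[int, List[int]]) -> Dict[int, float]:
    """Replace each node's value by the median over the node and its neighbors.

    Reads only the unfiltered input values, so the result does not depend on node order.
    """
    filtered: Dict[int, float] = {}
    for node, value in values.items():
        window = [value] + [values[n] for n in adjacency.get(node, []) if n in values]
        filtered[node] = median(window)
    return filtered


values = {0: 1.0, 1: 1.2, 2: 50.0, 3: 0.9}  # node 2 carries a noise spike
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
filtered = graph_median_filter(values, adjacency)
print(round(filtered[2], 6))  # 1.1
```

The spike of 50.0 on node 2 is replaced by a value close to its neighbors' values, while nodes without outliers change only slightly.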
The graph data analysis unit partitions the preprocessed graph data into different sub-datasets according to rules and distributes each sub-dataset to a corresponding compute node. It then aggregates the results computed by the compute nodes, merges the aggregated results, and transmits both the per-node results and the merged data to the graph data management unit.
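The distribute, compute, aggregate and merge flow can be sketched with threads standing in for compute nodes. The patent specifies neither the partitioning rule nor the per-partition computation, so this sketch assumes a round-robin split of the edge list and uses out-degree counting as a placeholder job; `partition_round_robin`, `compute_out_degree` and `run` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

EdgeT = Tuple[int, int]  # (source node, destination node)


def partition_round_robin(edges: List[EdgeT], k: int) -> List[List[EdgeT]]:
    """Hypothetical partitioning rule: deal edges out to k sub-datasets in turn."""
    return [edges[i::k] for i in range(k)]


def compute_out_degree(part: List[EdgeT]) -> Counter:
    """Placeholder per-node job: count the outgoing edges of each source node."""
    return Counter(src for src, _ in part)


def run(edges: List[EdgeT], k: int) -> Tuple[List[Counter], Counter]:
    """Distribute sub-datasets to k workers, collect their results, then merge them."""
    parts = partition_round_robin(edges, k)
    with ThreadPoolExecutor(max_workers=k) as pool:
        partials = list(pool.map(compute_out_degree, parts))  # one result per "compute node"
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts instead of replacing them
    return partials, merged


edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 1), (2, 3)]
partials, merged = run(edges, 3)
print(merged)  # Counter({2: 3, 0: 2, 1: 1})
```

Out-degree counts are additive, so merging the partial counters reproduces the direct result exactly. Computations that do not decompose this cleanly across partitions need a different merge step, and that is where a comparison against a direct computation becomes useful.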
The graph data management unit performs the computation directly on the preprocessed graph data and compares its result with the result that the graph data analysis unit obtained by splitting and merging, to determine their similarity. When the similarity is greater than 80%, the merged data is transmitted to the user; when the similarity is below 80%, a warning is sent directly to the graph data analysis unit, which partitions the graph data again, until the similarity between the split-and-merged result and the result computed directly on the preprocessed graph data exceeds 80%.
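The acceptance loop of the management unit can be sketched as below. The patent does not define its similarity measure, so `similarity` here is an assumption: the share of directly computed entries that the split-and-merge result reproduces. `verify_with_retries`, the `split_and_merge` callback and the attempt limit are likewise hypothetical; the 80% threshold is the only value taken from the text.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple

Result = Dict[Hashable, float]


def similarity(direct: Result, merged: Result, tol: float = 1e-9) -> float:
    """Hypothetical metric: share of directly computed entries reproduced within tol."""
    if not direct:
        return 1.0
    matches = sum(
        1 for key, value in direct.items()
        if key in merged and abs(merged[key] - value) <= tol
    )
    return matches / len(direct)


def verify_with_retries(
    direct: Result,
    split_and_merge: Callable[[int], Result],
    threshold: float = 0.8,
    max_attempts: int = 5,
) -> Tuple[Optional[Result], float, int]:
    """Accept the merged result once its similarity exceeds threshold;
    otherwise warn and request a new partitioning (the next attempt number)."""
    score = 0.0
    for attempt in range(max_attempts):
        merged = split_and_merge(attempt)
        score = similarity(direct, merged)
        if score > threshold:
            return merged, score, attempt
        print(f"warning: similarity {score:.0%} is not above {threshold:.0%}; re-partitioning")
    return None, score, max_attempts


direct = {node: float(node) for node in range(10)}


def fake_split_and_merge(attempt: int) -> Result:
    # Hypothetical stand-in: the first partitioning corrupts 3 of the 10 values.
    merged = dict(direct)
    if attempt == 0:
        for node in (0, 1, 2):
            merged[node] = -1.0
    return merged


merged, score, attempt = verify_with_retries(direct, fake_split_and_merge)
print(attempt, score)  # 1 1.0
```

The text repeats the partitioning "until" the threshold is met with no upper bound; the `max_attempts` cap is a design choice added here so that a partitioning that never converges cannot loop forever.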
Further, the graph data analysis unit comprises a graph data segmentation module, a statistics module and a graph data merging module. The graph data segmentation module partitions the pairwise-connected graph nodes (node pairs) in the preprocessed graph data: based on the total number of node pairs, and starting from one border of the graph data, it separates the graph one subgraph at a time, each subgraph consisting of a certain number of adjacent node pairs. Within each subgraph, the node pairs in turn form boundary nodes, and several boundary points are connected to form a super edge. The statistics module collects the multiple super edges obtained by segmentation and integrates them randomly. The graph data merging module merges the super-edge nodes of the randomly integrated subgraphs into a total super edge, thereby obtaining the computed data.
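The segmentation, statistics and merging modules are described only loosely, so the sketch below fixes one interpretation and labels it as an assumption throughout: node pairs are edges supplied in an order that already starts from one border of the graph, a boundary node is a node shared by more than one subgraph, a subgraph's super edge is the set of its boundary nodes, and the "total super edge" is the union of all super edges. The function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = Tuple[int, int]  # two connected graph nodes


def split_pairs(pairs: List[Pair], pairs_per_subgraph: int) -> List[List[Pair]]:
    """Cut an already ordered list of node pairs into consecutive subgraphs."""
    return [pairs[i:i + pairs_per_subgraph]
            for i in range(0, len(pairs), pairs_per_subgraph)]


def super_edges(subgraphs: List[List[Pair]]) -> List[FrozenSet[int]]:
    """Assumed definition: a boundary node is shared with another subgraph, and
    each subgraph's boundary nodes are joined into one super edge (a hyperedge)."""
    owners: Dict[int, Set[int]] = defaultdict(set)
    for idx, subgraph in enumerate(subgraphs):
        for u, v in subgraph:
            owners[u].add(idx)
            owners[v].add(idx)
    result: List[FrozenSet[int]] = []
    for subgraph in subgraphs:
        nodes = {n for pair in subgraph for n in pair}
        result.append(frozenset(n for n in nodes if len(owners[n]) > 1))
    return result


def merge_super_edges(edges: List[FrozenSet[int]], seed: int = 0) -> FrozenSet[int]:
    """Shuffle the super edges ("random integration"), then union their nodes
    into one total super edge."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    total: Set[int] = set()
    for edge in shuffled:
        total |= edge
    return frozenset(total)


pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]  # a path, ordered from node 0
subgraphs = split_pairs(pairs, 2)
edges = super_edges(subgraphs)
total = merge_super_edges(edges)
print(edges)  # [frozenset({2}), frozenset({2, 4}), frozenset({4})]
print(total)  # frozenset({2, 4})
```

Because a set union does not depend on order, the random integration step only changes the processing order in this sketch; the seed is exposed so that runs are reproducible.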
Further, the number of node pairs in each subgraph is 10%-20% of the total number of node pairs.
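The 10%-20% rule translates into a small range of subgraph sizes. This worked example uses integer arithmetic to compute the smallest and largest whole number of node pairs per subgraph for a given total, and how many subgraphs each choice yields. The function names are hypothetical.

```python
from typing import Tuple


def subgraph_size_range(total_pairs: int, low_pct: int = 10, high_pct: int = 20) -> Tuple[int, int]:
    """Smallest and largest whole number of node pairs per subgraph that stays within
    low_pct%..high_pct% of the total (integer arithmetic avoids float rounding)."""
    smallest = -(-low_pct * total_pairs // 100)  # ceiling division
    largest = high_pct * total_pairs // 100      # floor division
    if smallest < 1 or smallest > largest:
        raise ValueError("too few node pairs for the percentage band")
    return smallest, largest


def subgraph_count(total_pairs: int, pairs_per_subgraph: int) -> int:
    """Number of subgraphs when the last one may be partial (ceiling division)."""
    return -(-total_pairs // pairs_per_subgraph)


lo, hi = subgraph_size_range(1000)
print(lo, hi)  # 100 200
print(subgraph_count(1000, lo), subgraph_count(1000, hi))  # 10 5
```

For 1,000 node pairs the rule allows 100 to 200 pairs per subgraph, i.e. 5 to 10 subgraphs. When the total is not a multiple of the chosen size, the last subgraph is smaller: 25 pairs in subgraphs of 3 leave a final subgraph of 1 pair (4% of the total), so an implementation would have to decide whether to fold that remainder into its neighbor.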
Further, the graph data management unit comprises a computing module, a comparison module and an alarm module. The computing module extracts the boundary graph nodes of the preprocessed graph data to obtain the original boundary. The comparison module compares the multiple total super edges produced by the graph data merging module: when the coincidence rate between the boundary nodes of the total super edges and the original boundary nodes reaches 80% or more, the combined computation result is transmitted to the user; if it is below 80%, the alarm module sends a warning to the graph data segmentation module, which reselects the boundary points and partitions the graph data again, until the final comparison result reaches 80% or more.
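The comparison module's check can be sketched as follows. The patent does not define "coincidence rate", so this sketch assumes it is the share of original boundary nodes that also appear in the merged total super edges. How the computing module extracts the original boundary is not specified either, so the boundary is passed in as a plain set. `coincidence_rate` and `check_partition` are hypothetical names; only the 80% threshold comes from the text.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def coincidence_rate(total_super_edges: Iterable[FrozenSet[int]],
                     original_boundary: Set[int]) -> float:
    """Assumed definition: share of original boundary nodes that also appear
    in the merged total super edges."""
    merged_nodes: Set[int] = set().union(*total_super_edges)
    if not original_boundary:
        return 1.0
    return len(merged_nodes & original_boundary) / len(original_boundary)


def check_partition(total_super_edges: Iterable[FrozenSet[int]],
                    original_boundary: Set[int],
                    threshold: float = 0.8) -> Tuple[bool, float]:
    """True: hand the merged result to the user. False: raise the alarm so the
    segmentation module re-selects boundary points."""
    rate = coincidence_rate(total_super_edges, original_boundary)
    return rate >= threshold, rate


original = {1, 2, 3, 4, 5}
print(check_partition([frozenset({1, 2}), frozenset({3, 4, 9})], original))  # (True, 0.8)
print(check_partition([frozenset({1, 2})], original))                        # (False, 0.4)
```

Under this definition, extra nodes in the merged super edges (node 9 in the example) do not lower the rate; a Jaccard-style ratio would penalize them instead.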
Beneficial effects of the invention:
The system partitions the preprocessed graph data according to its adjacent nodes and then integrates the partitioned graph data. It also collects the boundary points of the preprocessed graph data to obtain the original boundary, and compares the original boundary with the integrated data to judge the accuracy of the partitioned data, thereby ensuring the accuracy of the graph data.
Brief description of the drawings
To help those skilled in the art understand the invention, it is further described below with reference to the accompanying drawing.
Fig. 1 is a schematic diagram of the graph data operation system of the invention.
Embodiment
An operation system based on hierarchical large-scale graph data, as shown in Fig. 1, comprises a graph data acquisition unit, a graph data analysis unit and a graph data management unit;
the graph data acquisition unit collects large-scale graph data, removes noise from the graph data by median filtering, and then transmits the processed graph data to the graph data analysis unit and the graph data management unit;
the graph data analysis unit partitions the preprocessed graph data into different sub-datasets according to rules and distributes the sub-datasets to the corresponding compute nodes; it then aggregates the results computed by the compute nodes, merges the aggregated results, and transmits the data computed by each compute node and the merged data to the graph data management unit. The graph data analysis unit comprises a graph data segmentation module, a statistics module and a graph data merging module. The graph data segmentation module partitions the pairwise-connected graph nodes (node pairs) in the preprocessed graph data: based on the total number of node pairs, and starting from one border of the graph data, it separates the graph one subgraph at a time, each subgraph consisting of a certain number of adjacent node pairs. The number of node pairs in each subgraph is 10%-20% of the total number of node pairs. Within each subgraph, the node pairs in turn form boundary nodes, and several boundary points are connected to form a super edge. The statistics module collects the multiple super edges obtained by segmentation and integrates them randomly. The graph data merging module merges the super-edge nodes of the randomly integrated subgraphs into a total super edge, thereby obtaining the computed data;
the graph data management unit performs the computation directly on the preprocessed graph data and compares its result with the result that the graph data analysis unit obtained by splitting and merging, to determine their similarity. When the similarity is greater than 80%, the merged data is transmitted to the user; when the similarity is below 80%, a warning is sent directly to the graph data analysis unit, which partitions the graph data again, until the similarity between the split-and-merged result and the result computed directly on the preprocessed graph data exceeds 80%. The graph data management unit comprises a computing module, a comparison module and an alarm module. The computing module extracts the boundary graph nodes of the preprocessed graph data to obtain the original boundary. The comparison module compares the multiple total super edges produced by the graph data merging module: when the coincidence rate between the boundary nodes of the total super edges and the original boundary nodes reaches 80% or more, the combined computation result is transmitted to the user; if it is below 80%, the alarm module sends a warning to the graph data segmentation module, which reselects the boundary points and partitions the graph data again, until the final comparison result reaches 80% or more.
The preferred embodiments of the invention disclosed above are intended only to help illustrate the invention. They do not describe every detail exhaustively, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations can be made in light of this specification. These embodiments were chosen and described in detail to better explain the principles and practical applications of the invention, so that those skilled in the art can understand and use it. The invention is limited only by the claims and their full scope and equivalents.

Claims (4)

1. An operation system based on hierarchical large-scale graph data, characterized in that it comprises a graph data acquisition unit, a graph data analysis unit and a graph data management unit;
the graph data acquisition unit is configured to collect large-scale graph data, remove noise from the graph data by median filtering, and then transmit the processed graph data to the graph data analysis unit and the graph data management unit;
the graph data analysis unit partitions the preprocessed graph data into different sub-datasets according to rules, distributes the sub-datasets to corresponding compute nodes, then aggregates the results computed by the compute nodes, merges the aggregated results, and transmits the data computed by each compute node and the merged data to the graph data management unit;
the graph data management unit performs computation on the preprocessed graph data and compares the result with the result obtained in the graph data analysis unit by splitting and merging, to determine their similarity; when the similarity is greater than 80%, the merged data is transmitted to the user; if the similarity is below 80%, a warning is sent directly to the graph data analysis unit, which partitions the graph data again, until the similarity between the split-and-merged result and the result computed directly on the preprocessed graph data is greater than 80%.
2. The operation system based on hierarchical large-scale graph data according to claim 1, characterized in that the graph data analysis unit comprises a graph data segmentation module, a statistics module and a graph data merging module; the graph data segmentation module partitions the pairwise-connected graph nodes (node pairs) in the preprocessed graph data: based on the total number of node pairs, and starting from one border of the graph data, it separates the graph one subgraph at a time, each subgraph consisting of a certain number of adjacent node pairs, wherein within each subgraph the node pairs in turn form boundary nodes, and several boundary points are connected to form a super edge; the statistics module collects the multiple super edges obtained by segmentation and integrates them randomly; the graph data merging module merges the super-edge nodes of the randomly integrated subgraphs into a total super edge, thereby obtaining the computed data.
3. The operation system based on hierarchical large-scale graph data according to claim 2, characterized in that the number of node pairs in each subgraph is 10%-20% of the total number of node pairs.
4. The operation system based on hierarchical large-scale graph data according to claim 1, characterized in that the graph data management unit comprises a computing module, a comparison module and an alarm module; the computing module extracts the boundary graph nodes of the preprocessed graph data to obtain the original boundary; the comparison module compares the multiple total super edges produced by the graph data merging module: when the coincidence rate between the boundary nodes of the total super edges and the original boundary nodes reaches 80% or more, the combined computation result is transmitted to the user; if it is below 80%, the alarm module sends a warning to the graph data segmentation module, which reselects the boundary points and partitions the graph data again, until the final comparison result reaches 80% or more.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711160660.6A CN107943918B (en) 2017-11-20 2017-11-20 Operation system based on hierarchical large-scale graph data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711160660.6A CN107943918B (en) 2017-11-20 2017-11-20 Operation system based on hierarchical large-scale graph data

Publications (2)

Publication Number Publication Date
CN107943918A (en) 2018-04-20
CN107943918B CN107943918B (en) 2021-09-07

Family

ID=61929244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711160660.6A Active CN107943918B (en) 2017-11-20 2017-11-20 Operation system based on hierarchical large-scale graph data

Country Status (1)

Country Link
CN (1) CN107943918B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046237A (en) * 2018-10-10 2020-04-21 北京京东金融科技控股有限公司 User behavior data processing method and device, electronic equipment and readable medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1315122A2 (en) * 2001-11-21 2003-05-28 Oki Data Corporation Graphical data processing
CN102254012A (en) * 2011-07-19 2011-11-23 北京大学 Graph data storing method and subgraph enquiring method based on external memory
CN103336808A (en) * 2013-06-25 2013-10-02 中国科学院信息工程研究所 System and method for real-time graph data processing based on BSP (Bulk Synchronous Parallel) model
CN104618153A (en) * 2015-01-20 2015-05-13 北京大学 Dynamic fault-tolerant method and dynamic fault-tolerant system based on P2P in distributed parallel graph processing
CN105426375A (en) * 2014-09-22 2016-03-23 阿里巴巴集团控股有限公司 Relationship network calculation method and apparatus
CN105590321A (en) * 2015-12-24 2016-05-18 华中科技大学 Block-based subgraph construction and distributed graph processing method
CN105677755A (en) * 2015-12-30 2016-06-15 杭州华为数字技术有限公司 Method and device for processing graph data


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sungwoong Kim et al.: "Image Segmentation Using Higher-Order Correlation Clustering", IEEE *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046237A (en) * 2018-10-10 2020-04-21 北京京东金融科技控股有限公司 User behavior data processing method and device, electronic equipment and readable medium
CN111046237B (en) * 2018-10-10 2024-04-05 京东科技控股股份有限公司 User behavior data processing method and device, electronic equipment and readable medium

Also Published As

Publication number Publication date
CN107943918B (en) 2021-09-07

Similar Documents

Publication Publication Date Title
Saracco et al. Inferring monopartite projections of bipartite networks: an entropy-based approach
CN106548343B (en) Illegal transaction detection method and device
CN110019876B (en) Data query method, electronic device and storage medium
CN105550583A (en) Random forest classification method based detection method for malicious application in Android platform
CN106874857A Living body determination method and system based on video analysis
CN110572362A (en) network attack detection method and device for multiple types of unbalanced abnormal traffic
TW200828053A (en) A method for grid-based data clustering
JP2009104591A (en) Web document clustering method and system
CN106202430A (en) Live platform user interest-degree digging system based on correlation rule and method for digging
Amato et al. Towards automatic generation of hardware classifiers
CN110378301A (en) Pedestrian recognition methods and system again
CN105677755B Method and device for processing graph data
CN102651030B (en) Social network association searching method based on graphics processing unit (GPU) multiple sequence alignment algorithm
CN106909454B (en) Rule processing method and equipment
CN104346443A (en) Web text processing method and device
CN104778159B (en) Word segmenting method and device based on word weights
CN107943918A (en) A kind of arithmetic system based on stratification large-scale graph data
CN106649344A (en) Network log compression method and apparatus
US20160292151A1 (en) Distributed storytelling framework for intelligence analysis
CN106844338B (en) method for detecting entity column of network table based on dependency relationship between attributes
CN108694192A (en) The judgment method and device of type of webpage
CN106383738A (en) Task processing method and distributed computing framework
CN106649315A (en) Method and device for processing path navigation
WO2016202209A1 (en) Method and device for estimating user influence in social network using graph simplification technique
CN107526794A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant