CN113139098A - Abstract extraction method and system for big homogeneous relation graph - Google Patents

Info

Publication number
CN113139098A
Authority
CN
China
Prior art keywords
graph
data
node
graph data
reconstructed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110308958.7A
Other languages
Chinese (zh)
Other versions
CN113139098B (en
Inventor
刘盛华
程学旗
周厚铨
刘财政
沈华伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN202110308958.7A
Publication of CN113139098A
Application granted
Publication of CN113139098B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for extracting a summary of a large homogeneous relation graph, comprising the following steps: acquiring the relation graph data to be summarized as current graph data, wherein the relation graph data is a large homogeneous relation graph and each node in the current graph data is regarded as a supernode; grouping the nodes in the current graph data by locality-sensitive hashing according to the adjacency matrix of the current graph data; randomly selecting a plurality of supernode pairs from a group, calculating, for each pair, the difference between the graph obtained after merging that pair and the relation graph data, and merging the supernode pair with the smallest difference to obtain reconstructed graph data; and outputting the reconstructed graph data as the summary extraction result.

Description

Abstract extraction method and system for big homogeneous relation graph
Technical Field
The invention relates to the field of data mining, and in particular to a fast summarization and reconstruction technique and device for large homogeneous relation graphs.
Background
Social media has now surpassed search engines as the largest source of internet traffic, with the two accounting for 46 percent and 40 percent respectively. Relation graph data has become a common form of data in many fields of science and engineering. A graph can be represented as a structure G = (V, E) consisting of a pair of sets: a set of nodes V representing entities and a set of edges E representing relationships or connections between entities. In computer science, a network contains nodes and edges; in social science, the corresponding terms are actors and relationships, and the terms are used with equivalent meanings in this document. By the first quarter of 2020, the combined monthly active accounts of WeChat and Weixin reached 1.2025 billion, making WeChat the first application in China with more than 1 billion monthly active users; the number of WeChat messages sent increased by 64.2%, and 823 million people sent and received WeChat red packets. According to the quarterly report for the period ending March 31, 2020, the most widely circulated figure was "1 trillion dollars": in the 12 months ending March 31, 2020, the GMV of the Alibaba platform reached 7.053 trillion RMB. Over the same 12 months, 780 million people in China bought products or services on the Alibaba platform; Alibaba's China retail marketplaces had 846 million mobile monthly active users and 726 million annual active buyers. The messaging or shopping relationships between the users of these platforms form a graph, as shown in FIG. 1 and FIG. 2: the users are the nodes of the graph, and the shopping or messaging relationships between users are the edges. In most cases, graph data is created by one or more generation processes that not only represent activities in the system but also collect observations of entities.
However, such large-scale graph data is voluminous and difficult to process, analyze, and understand, which poses a significant challenge for graph data mining applications. One effective technique for addressing these challenges is graph summarization. Given a graph G, the goal is to find a compact representation of G, i.e. a summary graph with supernodes and superedges (see the example in FIG. 3). A summarization model usually needs to reconstruct a graph from the summary graph, so the reconstruction scheme is the core of most summarization models. Current graph summarization methods fall mainly into the following categories:
(1) Adjacency matrix error: such methods attempt to minimize some measure of error between the adjacency matrix of the original graph and the adjacency matrix of the reconstructed graph in order to obtain the best summary.
(2) Total number of edges: in these methods, the objective function is defined as the sum of the number of edges in the summary graph and the amount of edge-correction information, and the summary is optimized with respect to both.
(3) Coding length: such methods typically use the Minimum Description Length (MDL) principle, with the total code length as the objective function. The minimum description length is usually optimized under different coding schemes.
The above methods focus mainly on static simple graphs, are each suited to a particular type of graph data, and are not generally applicable. Moreover, these methods need to compute the relationship between every pair of nodes in order to summarize the graph data. Although some of them optimize and accelerate this computation, the computational complexity remains high; on large graph data in particular, they generally suffer from low efficiency, long running times, and high memory usage.
Disclosure of Invention
The invention relates to the field of data mining, and in particular to a fast summarization and reconstruction technique and device for large homogeneous relation graphs. Homogeneous means that all nodes in the graph are of the same type; for example, the nodes in a social network are all users. In an e-commerce shopping network, by contrast, some nodes represent customers and others represent commodities, so the node types differ and the graph is heterogeneous. The key idea is the same as that of the common configuration model: the weights of superedges are distributed in proportion to the degrees of the nodes. This configuration-based reconstruction scheme (CR scheme) can usually be embedded into existing summarization methods and improves their performance and results. The cost of the summary graph and the reconstruction error are minimized according to the Minimum Description Length (MDL) principle from information theory. The method also provides a fast algorithm, called the DPGS algorithm, which groups the candidate nodes of the large graph by Locality-Sensitive Hashing (LSH) and performs greedy merging within each group in order to summarize the graph. Theoretically, the method shows that minimizing the reconstruction error bounds the perturbation of the Laplacian eigenvalues.
To address the shortcomings of the prior art, the invention provides a method for extracting a summary of a large homogeneous relation graph, comprising the following steps:
step 1, acquiring the relation graph data to be summarized as current graph data, wherein the relation graph data is a large homogeneous relation graph and each node in the current graph data is regarded as a supernode;
step 2, grouping the nodes in the current graph data by locality-sensitive hashing according to the adjacency matrix of the current graph data;
step 3, randomly selecting a plurality of supernode pairs from a group, calculating, for each pair, the difference between the graph obtained after merging that pair and the relation graph data, and merging the supernode pair with the smallest difference to obtain reconstructed graph data;
step 4, outputting the reconstructed graph data as the summary extraction result.
In the method, after the reconstructed graph data is obtained in step 3, the iteration count is incremented by 1; whether the current iteration count has reached a preset value is then determined; if so, step 4 is executed; otherwise, the reconstructed graph data is taken as the current graph data and step 2 is executed again.
In the method, step 3 comprises: obtaining the difference L(M, D) between the graph obtained after merging a supernode pair and the relation graph data by the following formulas:
Figure BDA0002988957450000031
Figure BDA0002988957450000032
Figure BDA0002988957450000033
L(M,D)=L(M)+L(D|M)
wherein $d_i$ and $d_j$ denote the degrees of nodes i and j; $D_k$ and $D_l$ denote the degrees of supernodes $S_k$ and $S_l$; $A_S$ is the adjacency matrix of the summary graph obtained after the supernode pair is merged; $A'$ is the adjacency matrix of the graph reconstructed from the summary graph; $A$ is the adjacency matrix of the relation graph data; $A'(i, j)$ is the weight of the edge from node i to node j in the adjacency matrix of the reconstructed graph; $A_S(i, j)$ is the weight of the edge from node i to node j in the adjacency matrix of the summary graph; $A(i, j)$ is the weight of the edge from node i to node j in the adjacency matrix of the relation graph data; LN is the code-length function for a positive integer; LNU is the code-length function of the Bernoulli encoding; n and m are the numbers of nodes and edges, respectively; $w_i$ is the edge weight; $d_i$ is the degree of node i; L(M) is the description length of the summary graph; and L(D|M) is the reconstruction error.
In the method, step 2 comprises: computing a hash value for each node in the current graph data from its neighbor nodes, and placing nodes with the same hash value in the same group.
In the method, the relation graph data is an unweighted undirected graph.
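The grouping of step 2 can be sketched as follows. The text only states that each node obtains a hash value from its neighbors and that equal hash values form a group; it does not name a particular LSH family. The sketch below assumes min-hash over the closed neighborhood (the node plus its neighbors), which is one common LSH choice for set similarity. The function name `lsh_groups`, the `seed` parameter, and the use of the closed neighborhood are illustrative assumptions, not details taken from the patent.

```python
import random
from collections import defaultdict


def lsh_groups(neighbors, seed=0):
    """Group nodes whose neighborhoods share the same min-hash value.

    neighbors: dict mapping each node to the set of its neighbor nodes
               (every neighbor must itself be a key of the dict).
    Returns a list of groups, each group being a list of nodes.
    Assumption: min-hash is used as the LSH function; the source text
    does not specify which LSH method is applied.
    """
    rng = random.Random(seed)
    nodes = sorted(neighbors)
    ranks = list(range(len(nodes)))
    rng.shuffle(ranks)
    rank = dict(zip(nodes, ranks))  # a random permutation of the node ids

    buckets = defaultdict(list)
    for v in nodes:
        # Min-hash signature: smallest permuted rank in the closed
        # neighborhood, so isolated nodes still receive a hash value.
        signature = min(rank[u] for u in neighbors[v] | {v})
        buckets[signature].append(v)
    return list(buckets.values())


# Two disconnected cliques: nodes with identical closed neighborhoods
# always receive the same signature and therefore share a group.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}}
groups = lsh_groups(example)
```

Changing `seed` between iterations draws a new permutation, which is one way to read the later instruction to update the LSH function before regrouping.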
The invention also provides a system for extracting a summary of a large homogeneous relation graph, comprising:
module 1, configured to acquire the relation graph data to be summarized as current graph data, wherein the relation graph data is a large homogeneous relation graph and each node in the current graph data is regarded as a supernode;
module 2, configured to group the nodes in the current graph data by locality-sensitive hashing according to the adjacency matrix of the current graph data;
module 3, configured to randomly select a plurality of supernode pairs from a group, calculate, for each pair, the difference between the graph obtained after merging that pair and the relation graph data, and merge the supernode pair with the smallest difference to obtain reconstructed graph data;
module 4, configured to output the reconstructed graph data as the summary extraction result.
In the system, after module 3 obtains the reconstructed graph data, the iteration count is incremented by 1; whether the current iteration count has reached a preset value is then determined; if so, module 4 is invoked; otherwise, the reconstructed graph data is taken as the current graph data and module 2 is invoked again.
In the system, module 3 is configured to obtain the difference L(M, D) between the graph obtained after merging a supernode pair and the relation graph data by the following formulas:
Figure BDA0002988957450000041
Figure BDA0002988957450000042
Figure BDA0002988957450000043
L(M,D)=L(M)+L(D|M)
wherein $d_i$ and $d_j$ denote the degrees of nodes i and j; $D_k$ and $D_l$ denote the degrees of supernodes $S_k$ and $S_l$; $A_S$ is the adjacency matrix of the summary graph obtained after the supernode pair is merged; $A'$ is the adjacency matrix of the graph reconstructed from the summary graph; $A$ is the adjacency matrix of the relation graph data; $A'(i, j)$ is the weight of the edge from node i to node j in the adjacency matrix of the reconstructed graph; $A_S(i, j)$ is the weight of the edge from node i to node j in the adjacency matrix of the summary graph; $A(i, j)$ is the weight of the edge from node i to node j in the adjacency matrix of the relation graph data; LN is the code-length function for a positive integer; LNU is the code-length function of the Bernoulli encoding; n and m are the numbers of nodes and edges, respectively; $w_i$ is the edge weight; $d_i$ is the degree of node i; L(M) is the description length of the summary graph; and L(D|M) is the reconstruction error.
In the system, module 2 is configured to compute a hash value for each node in the current graph data from its neighbor nodes, and to place nodes with the same hash value in the same group.
In the system, the relation graph data is an unweighted undirected graph.
According to the scheme, the invention has the advantages that:
(1) Novel reconstruction scheme: we design a graph summarization model called Degree-Preserving Graph Summarization (DPGS) and propose a novel reconstruction scheme based on the configuration model. We show theoretically that DPGS bounds the perturbation of the graph by the reconstruction error.
(2) Compatibility: the proposed scheme can be applied generally to different graph summarization scenarios and improves the quality of the resulting summaries.
(3) Effectiveness: comparisons on synthetic and real-world graphs confirm the advantage of the proposed reconstruction method and show that the DPGS algorithm produces better summaries than several recent methods. In addition, the summary graph can help train graph neural networks efficiently and effectively.
(4) Scalability: the DPGS model runs fast; theoretical analysis shows that its complexity is linear in the number of edges, so it can be applied to the summarization of large graph data.
Drawings
FIG. 1 is a schematic diagram of graph data;
FIG. 2 is an adjacency matrix diagram of an unweighted graph;
FIG. 3 is an adjacency matrix diagram of a weighted graph;
FIG. 4 is a schematic diagram of the operation of the method of the present invention;
FIG. 5 is a flow chart of an implementation of the method of the present invention.
Detailed Description
The invention relates to the field of data mining, and in particular to a fast summarization and reconstruction technique and device for large homogeneous relation graphs, described as follows:
The invention provides a technique for converting a large graph into a summary graph. The quality of a summary graph is measured by restoring a reconstructed graph from the summary graph and then computing the difference between the reconstructed graph and the original graph: the smaller the difference, the better the summary graph, and the better the technique that produced it.
The invention therefore uses a new reconstruction method based on the configuration model to better measure the difference between the reconstructed graph and the original graph. (1) Reconstruction scheme: given a summary graph, the original graph can be reconstructed based on the graph summarization model. The method defines a new way of reconstructing a graph from a summary graph. Let $A_S$ and $A'$ denote the adjacency matrices of the summary graph and the reconstructed graph, respectively. The configuration-based reconstruction method (CR method) is computed as follows:
$$A'(i,j) = \frac{d_i\, d_j}{D_k\, D_l}\, A_S(k,l) \quad \text{(formula one)}$$
where $S_k$ and $S_l$ are the supernodes to which nodes i and j belong, respectively; $d_i$ and $d_j$ denote the degrees of nodes i and j; and $D_k$ and $D_l$ denote the degrees of supernodes $S_k$ and $S_l$. The reconstructed edge weight $A'(i, j)$ is therefore proportional to the product of the degrees of its endpoints. In the summary graph, each node of the original graph belongs to exactly one supernode, and k and l are the indices of the supernodes that contain nodes i and j, respectively.
In formula one, $S_k$ and $S_l$ enter through $D_k$ and $D_l$. That is, the edge weights of the reconstructed graph take into account both the degrees $d_i$, $d_j$ of the nodes themselves and the degrees $D_k$, $D_l$ of the supernodes $S_k$, $S_l$ to which they belong.
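As an illustration of the reconstruction scheme, the following sketch builds a summary adjacency matrix from a node-to-supernode assignment and then reconstructs edge weights in proportion to the endpoint degrees. It assumes that formula one has the form A'(i, j) = d_i d_j / (D_k D_l) · A_S(k, l), that the degree of a supernode is the sum of its members' degrees, and that A_S(k, l) sums the original edge weights between the members of S_k and S_l. These readings follow from the description above but are not spelled out in the text, and the function name `reconstruct` is hypothetical.

```python
from collections import defaultdict


def reconstruct(adj, assignment):
    """Configuration-based reconstruction (sketch under stated assumptions).

    adj:        symmetric adjacency matrix of the original graph
                (list of lists of non-negative weights).
    assignment: assignment[i] is the index of the supernode containing node i.
    Returns (super_adj, rec): superedge weights A_S as a dict keyed by
    (k, l), and the reconstructed dense matrix A'.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]          # d_i

    super_deg = defaultdict(float)           # D_k, assumed = sum of member degrees
    for i in range(n):
        super_deg[assignment[i]] += deg[i]

    super_adj = defaultdict(float)           # A_S(k, l), assumed = summed edge weights
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                super_adj[(assignment[i], assignment[j])] += adj[i][j]

    rec = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            k, l = assignment[i], assignment[j]
            denom = super_deg[k] * super_deg[l]
            if denom > 0:
                # Weight proportional to the product of endpoint degrees.
                rec[i][j] = deg[i] * deg[j] / denom * super_adj.get((k, l), 0.0)
    return dict(super_adj), rec


# Path graph 0-1-2-3 summarized into two supernodes {0, 1} and {2, 3}.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
super_adj, rec = reconstruct(path, [0, 0, 1, 1])
```

Under these assumptions every row of `rec` sums to the degree of the corresponding node in `path`, which is the degree-preservation behavior the description attributes to the scheme.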
(2) Degree preservation: the reconstruction method preserves node degrees and has the following property:
$$\sum_j A'(i,j) = d_i$$
where $A'$ is the adjacency matrix of the reconstructed graph, $A$ is the adjacency matrix of the original graph, $A(i, j)$ is the weight of the edge from node i to node j in the adjacency matrix, and $d_i = \sum_j A(i,j)$ is the degree of node i.
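The degree-preservation property can be checked with a short derivation. It assumes that the reconstruction has the form $A'(i,j) = \frac{d_i d_j}{D_k D_l} A_S(k,l)$, that a supernode's degree is the sum of its members' degrees ($D_l = \sum_{j \in S_l} d_j$), and that the superedge weights leaving a supernode sum to its degree ($\sum_l A_S(k,l) = D_k$). These assumptions are consistent with the description but are stated here explicitly rather than quoted from it. For a node i belonging to supernode $S_k$:

```latex
\begin{aligned}
\sum_j A'(i,j)
  &= \sum_l \sum_{j \in S_l} \frac{d_i\, d_j}{D_k\, D_l}\, A_S(k,l) \\
  &= \frac{d_i}{D_k} \sum_l \frac{A_S(k,l)}{D_l} \sum_{j \in S_l} d_j \\
  &= \frac{d_i}{D_k} \sum_l A_S(k,l) \\
  &= \frac{d_i}{D_k}\, D_k \;=\; d_i .
\end{aligned}
```

The second step groups the nodes j by the supernode that contains them, and the third step cancels $D_l$ against the sum of member degrees.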
(3) Defining a new evaluation function: the method uses the MDL principle to find the summary graph. The total description length is minimized, on the assumption that both the summary graph and the reconstruction error are needed to reconstruct the original graph exactly. A new graph summary and reconstruction error function is defined:
L(M, D) = L(M) + L(D|M) (formula three)
where L(M) is the description length of the summary graph and L(D|M) is the reconstruction error. The method uses the KL divergence to express the coding error between the adjacency matrix of the original graph and the adjacency matrix reconstructed from the summary graph, defined as follows:
Figure BDA0002988957450000063
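$$L(M) = \mathrm{LN}(n) + \mathrm{LN}(m) + n\,\mathrm{LN}(n) + \sum_i \mathrm{LN}(d_i) + \sum_i \mathrm{LN}(w_i) + \mathrm{LNU}\big(n(n+1)/2,\ m\big) \quad \text{(formula five)}$$
where LN is the code-length function for a positive integer and LNU is the code-length function of the Bernoulli encoding; n and m are the numbers of nodes and edges, respectively; $w_i$ is the edge weight; and $d_i$ is the degree of node i.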
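The cost function can be made concrete with a small sketch. The text does not define LN, LNU, or the exact KL expression, so the code below makes three labeled assumptions: LN is taken to be Rissanen's universal code length for positive integers, LNU(n, k) is taken to be the number of bits needed to choose k of n slots (log2 of the binomial coefficient), and the reconstruction error is taken to be the sum of A(i, j)·log2(A(i, j)/A'(i, j)) over the non-zero entries of A. All function names are hypothetical.

```python
import math


def ln_code(x):
    """Bits for a positive integer x (assumed: Rissanen's universal code)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    bits = math.log2(2.865064)       # normalizing constant of the code
    term = math.log2(x)
    while term > 0:                  # log2 x + log2 log2 x + ... (positive terms)
        bits += term
        term = math.log2(term)
    return bits


def lnu_code(n, k):
    """Bits to choose k of n slots (assumed reading of the Bernoulli code)."""
    return math.log2(math.comb(n, k))


def model_length(degrees, weights):
    """L(M) as in formula five, for positive degrees d_i and edge weights w_i."""
    n, m = len(degrees), len(weights)
    return (ln_code(n) + ln_code(m) + n * ln_code(n)
            + sum(ln_code(d) for d in degrees)
            + sum(ln_code(w) for w in weights)
            + lnu_code(n * (n + 1) // 2, m))


def reconstruction_error(adj, rec):
    """L(D|M) as a KL-style divergence between A and A' (assumed form)."""
    err = 0.0
    for row_a, row_r in zip(adj, rec):
        for a, r in zip(row_a, row_r):
            if a > 0:                # zero entries of A contribute nothing
                err += a * math.log2(a / r)
    return err


def total_cost(degrees, weights, adj, rec):
    """L(M, D) = L(M) + L(D|M), formula three."""
    return model_length(degrees, weights) + reconstruction_error(adj, rec)
```

Because both terms are expressed in bits, they can be added directly. A merge that shrinks the summary lowers L(M) but usually raises L(D|M), and the total captures that trade-off.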
(4) Defining the eigenvalue perturbation:
The normalized Laplacian matrices of the original graph and the reconstructed graph are denoted L and L'. The total squared error of their eigenvalues (denoted λ(i) and λ'(i)) is as follows:
Figure BDA0002988957450000064
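The quantity described here, the total squared error between the eigenvalues of the two normalized Laplacians, can be computed numerically as in the sketch below. It assumes the standard symmetric normalization I − D^(−1/2) A D^(−1/2) and compares eigenvalues in sorted order; neither detail is stated in the text, and the function names are hypothetical. The sketch uses NumPy.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} (assumed form)."""
    a = np.asarray(adj, dtype=float)
    deg = a.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])   # leave isolated nodes at 0
    return np.eye(len(a)) - inv_sqrt[:, None] * a * inv_sqrt[None, :]


def eigenvalue_perturbation(adj, adj_rec):
    """Sum over i of (lambda(i) - lambda'(i))^2 with eigenvalues sorted ascending."""
    lam = np.linalg.eigvalsh(normalized_laplacian(adj))
    lam_rec = np.linalg.eigvalsh(normalized_laplacian(adj_rec))
    return float(np.sum((lam - lam_rec) ** 2))


# Triangle versus three-node path: eigenvalues (0, 1.5, 1.5) versus (0, 1, 2).
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
perturbation = eigenvalue_perturbation(triangle, path3)
```

`np.linalg.eigvalsh` is used because both Laplacians are symmetric, and it returns eigenvalues in ascending order, which makes the element-wise comparison well defined.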
(5) Designing a fast algorithm, called the DPGS algorithm: the input of the algorithm is a graph $G = (V, E)$ and the number of iterations T, and the output is the summary graph $G_S = (V_S, E_S, A_S)$. The core steps of the algorithm are as follows:
Figure BDA0002988957450000065
Figure BDA0002988957450000071
To make the above features and effects of the present invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
In this embodiment, the implementation flow is shown in FIG. 5, and the implementation process is described in detail using an undirected graph as an example. The specific embodiment is as follows:
Step 1: an undirected, unweighted graph is given, as shown in FIG. 4(1). Its adjacency matrix A, number of nodes N, and number of edges E are obtained, and the number of iterations T of the method is set. A, N, and E are determined by the actual graph; T is a parameter that can be set by experts or by experiment. Each node is initially assumed to be a supernode. A supernode may contain one or more nodes.
Step 2: the basic idea of the LSH method is as follows: after two adjacent data points in the original data space undergo the same mapping or projection, the probability that they remain adjacent in the new data space is high, whereas the probability that non-adjacent data points are mapped to the same group is low. All supernodes are divided into groups using the LSH method, with the adjacency matrix A used in the computation. Specifically, each node computes a hash value from its neighbors, and nodes with the same hash value are placed in the same group. In FIG. 4(2), for example, nodes 4, 5, 6, 7, and 8 are assumed to fall into one group. This illustration serves only to explain the algorithm flow; which nodes are actually grouped together depends on the specific LSH method.
Step 3: for each group, different node pairs are randomly sampled, and formula three and formula four are evaluated in order to minimize the value of formula three, i.e. the new graph summary and reconstruction error function. Merging a node pair changes the value of formula three; therefore, several node pairs are sampled in step 3, the value of formula three after each candidate merge is computed, and the pair that minimizes formula three is merged. For example, if the two pairs (5, 8) and (6, 7) are sampled and merging (5, 8) is found to give the smaller value of formula three, then (5, 8) is merged into a new supernode.
Sampling is used here so that not every pair of nodes needs to be evaluated, which reduces the computational complexity. As shown in FIG. 4, formula three and formula four are evaluated for the sampled pair (5, 8) and for the sampled pair (6, 7), and nodes 5 and 8 are merged according to the result. This illustration serves only to explain the algorithm flow; which nodes are actually merged depends on the computed values of formula three and formula four.
Step 4: the LSH function is updated, and the supernodes are regrouped according to the updated LSH function.
Step 5: steps 2, 3, and 4 are repeated until the set number of iterations T is reached.
Step 6: the summary graph of the original graph is returned.
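The sampled greedy merge of step 3 can be sketched as follows. The sketch only shows how a small number of candidate pairs is drawn from one group and how the pair with the lowest cost is chosen. The cost itself (the value of formula three after the merge) is passed in as a callable, because its exact evaluation depends on formulas that are not reproduced in this text. The names `best_merge`, `cost_after_merge`, and `num_samples` are hypothetical.

```python
import random


def best_merge(group, cost_after_merge, num_samples=2, seed=0):
    """Sample supernode pairs from one group and pick the cheapest merge.

    group:            list of supernode ids placed in the same LSH group.
    cost_after_merge: callable taking a pair (u, v) and returning the total
                      description length L(M, D) if u and v were merged
                      (assumed to be supplied by the caller).
    Returns the chosen pair, or None if the group has fewer than two members.
    """
    if len(group) < 2:
        return None
    rng = random.Random(seed)
    max_pairs = len(group) * (len(group) - 1) // 2
    candidates = set()
    # Draw distinct unordered pairs until enough candidates are collected.
    while len(candidates) < min(num_samples, max_pairs):
        u, v = rng.sample(group, 2)
        candidates.add((min(u, v), max(u, v)))
    # Greedy choice: the candidate whose merge gives the smallest cost.
    return min(sorted(candidates), key=cost_after_merge)


# Toy cost in which merging (5, 8) is cheapest, mirroring the example above.
def toy_cost(pair):
    return 0.0 if pair == (5, 8) else 1.0


chosen = best_merge([4, 5, 6, 7, 8], toy_cost, num_samples=10)
```

With `num_samples` well below the number of possible pairs, only a few cost evaluations are needed per group, which reflects the stated purpose of sampling: avoiding a computation over every pair of nodes.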
Although specific embodiments of the invention and the accompanying drawings have been disclosed for illustrative purposes, to aid understanding of the invention and its implementation, those skilled in the art will appreciate that the corresponding methods and tools may be implemented on other platforms without departing from the spirit and scope of the present invention and the appended claims. Therefore, the present invention should not be limited to what is disclosed in the embodiments and the drawings.
The following is a system embodiment corresponding to the above method embodiment, and the two can be implemented in cooperation with each other. The technical details mentioned in the above embodiment remain valid in this embodiment and, to avoid repetition, are not described again here. Correspondingly, the technical details mentioned in this embodiment also apply to the above embodiment.
The invention also provides a system for extracting a summary of a large homogeneous relation graph, comprising:
module 1, configured to acquire the relation graph data to be summarized as current graph data, wherein the relation graph data is a large homogeneous relation graph and each node in the current graph data is regarded as a supernode;
module 2, configured to group the nodes in the current graph data by locality-sensitive hashing according to the adjacency matrix of the current graph data;
module 3, configured to randomly select a plurality of supernode pairs from a group, calculate, for each pair, the difference between the graph obtained after merging that pair and the relation graph data, and merge the supernode pair with the smallest difference to obtain reconstructed graph data;
module 4, configured to output the reconstructed graph data as the summary extraction result.
In the system, after module 3 obtains the reconstructed graph data, the iteration count is incremented by 1; whether the current iteration count has reached a preset value is then determined; if so, module 4 is invoked; otherwise, the reconstructed graph data is taken as the current graph data and module 2 is invoked again.
In the system, module 3 is configured to obtain the difference L(M, D) between the graph obtained after merging a supernode pair and the relation graph data by the following formulas:
Figure BDA0002988957450000081
Figure BDA0002988957450000082
Figure BDA0002988957450000083
L(M,D)=L(M)+L(D|M)
wherein $d_i$ and $d_j$ denote the degrees of nodes i and j; $D_k$ and $D_l$ denote the degrees of supernodes $S_k$ and $S_l$; $A_S$ is the adjacency matrix of the summary graph obtained after the supernode pair is merged; $A'$ is the adjacency matrix of the graph reconstructed from the summary graph; $A$ is the adjacency matrix of the relation graph data; $A'(i, j)$ is the weight of the edge from node i to node j in the adjacency matrix of the reconstructed graph; $A_S(i, j)$ is the weight of the edge from node i to node j in the adjacency matrix of the summary graph; $A(i, j)$ is the weight of the edge from node i to node j in the adjacency matrix of the relation graph data; LN is the code-length function for a positive integer; LNU is the code-length function of the Bernoulli encoding; n and m are the numbers of nodes and edges, respectively; $w_i$ is the edge weight; $d_i$ is the degree of node i; L(M) is the description length of the summary graph; and L(D|M) is the reconstruction error.
In the system, module 2 is configured to compute a hash value for each node in the current graph data from its neighbor nodes, and to place nodes with the same hash value in the same group.
In the system, the relation graph data is an unweighted undirected graph.

Claims (10)

1. A method for extracting a summary of a large homogeneous relation graph, characterized by comprising the following steps:
step 1, acquiring the relation graph data to be summarized as current graph data, wherein the relation graph data is a large homogeneous relation graph and each node in the current graph data is regarded as a supernode;
step 2, grouping the nodes in the current graph data by locality-sensitive hashing according to the adjacency matrix of the current graph data;
step 3, randomly selecting a plurality of supernode pairs from a group, calculating, for each pair, the difference between the graph obtained after merging that pair and the relation graph data, and merging the supernode pair with the smallest difference to obtain reconstructed graph data;
step 4, outputting the reconstructed graph data as the summary extraction result.
2. The method for extracting a summary of a large homogeneous relation graph according to claim 1, characterized in that after the reconstructed graph data is obtained in step 3, the iteration count is incremented by 1; whether the current iteration count has reached a preset value is then determined; if so, step 4 is executed; otherwise, the reconstructed graph data is taken as the current graph data and step 2 is executed again.
3. The method for extracting a summary of a large homogeneous relation graph according to claim 1, characterized in that step 3 comprises: obtaining the difference L(M, D) between the graph obtained after merging a supernode pair and the relation graph data by the following formulas:
Figure FDA0002988957440000011
Figure FDA0002988957440000012
Figure FDA0002988957440000013
L(M,D)=L(M)+L(D|M)
wherein d_i and d_j denote the degrees of nodes i and j; D_k and D_l denote the degrees of supernodes S_k and S_l; A_S is the adjacency matrix of the summary graph obtained after the super point pair is merged; A' is the adjacency matrix of the graph reconstructed from the summary graph; A is the adjacency matrix of the relation graph data; A'(i, j) is the weight of the edge from node i to node j in the adjacency matrix of the graph reconstructed from the summary graph; A_S(i, j) is the weight of the edge from node i to node j in the adjacency matrix of the summary graph; A(i, j) is the weight of the edge from node i to node j in the adjacency matrix of the relation graph data; LN is the code-length function for a positive integer; LNU is the code-length function of the Bernoulli code; n and m are the numbers of nodes and edges, respectively; w_i is the weight of an edge; L(M) is the description length of the summary graph; and L(D|M) is the reconstruction error.
4. The summary extraction method for a large homogeneous relation graph as claimed in claim 1, wherein step 2 comprises: each node in the current graph data obtains a hash value from its neighbor nodes, and nodes with the same hash value are placed in the same group.
5. The summary extraction method for a large homogeneous relation graph as claimed in claim 1, wherein the relation graph data is an unweighted undirected graph.
6. A summary extraction system for a large homogeneous relation graph, comprising:
a module 1, configured to obtain relation graph data to be summarized as current graph data, wherein the relation graph data is a large homogeneous relation graph, and each node in the current graph data is regarded as a super point;
a module 2, configured to group the nodes in the current graph data by locality-sensitive hashing according to the adjacency matrix of the current graph data;
a module 3, configured to randomly select a plurality of super point pairs within a group, calculate, for each pair, the difference between the graph obtained after merging that pair and the relation graph data, and merge the super point pair with the smallest difference to obtain reconstructed graph data;
and a module 4, configured to output the reconstructed graph data as the summary extraction result.
7. The summary extraction system for a large homogeneous relation graph as claimed in claim 6, wherein after the reconstructed graph data is obtained in the module 3, the iteration count is increased by 1, and it is judged whether the current iteration count has reached a preset value; if so, the module 4 is called; otherwise, the reconstructed graph data is taken as the current graph data and the module 2 is called again.
8. The summary extraction system for a large homogeneous relation graph as claimed in claim 6, wherein the module 3 comprises: obtaining the difference L(M, D) between the merged super point pair and the relation graph data by the following formulas:
Figure FDA0002988957440000021
Figure FDA0002988957440000022
Figure FDA0002988957440000023
L(M,D)=L(M)+L(D|M)
wherein d_i and d_j denote the degrees of nodes i and j; D_k and D_l denote the degrees of supernodes S_k and S_l; A_S is the adjacency matrix of the summary graph obtained after the super point pair is merged; A' is the adjacency matrix of the graph reconstructed from the summary graph; A is the adjacency matrix of the relation graph data; A'(i, j) is the weight of the edge from node i to node j in the adjacency matrix of the graph reconstructed from the summary graph; A_S(i, j) is the weight of the edge from node i to node j in the adjacency matrix of the summary graph; A(i, j) is the weight of the edge from node i to node j in the adjacency matrix of the relation graph data; LN is the code-length function for a positive integer; LNU is the code-length function of the Bernoulli code; n and m are the numbers of nodes and edges, respectively; w_i is the weight of an edge; L(M) is the description length of the summary graph; and L(D|M) is the reconstruction error.
9. The summary extraction system for a large homogeneous relation graph as claimed in claim 1, wherein the module 2 comprises: each node in the current graph data obtains a hash value from its neighbor nodes, and nodes with the same hash value are placed in the same group.
10. The summary extraction system for a large homogeneous relation graph as claimed in claim 6, wherein the relation graph data is an unweighted undirected graph.
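Steps 1 to 4 of claim 1, together with the iteration of claim 2, can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the cost function reconstruction_error is a simple stand-in (the L1 distance between A and a uniform block-density reconstruction A') used in place of L(M, D), whose formulas are given only in the figures; the min-hash grouping is likewise an assumption; and the names summarize, merged, iterations and samples are hypothetical.

```python
import itertools
import random
from collections import defaultdict


def reconstruction_error(adj, supernodes):
    """Stand-in cost (NOT the patent's L(M, D)): sum of |A - A'| over node
    pairs, where A' spreads each block's edges uniformly over the block."""
    label = {v: k for k, s in enumerate(supernodes) for v in s}
    edges = defaultdict(int)  # number of original edges per supernode block
    for u in adj:
        for v in adj[u]:
            if u < v:  # count each undirected edge once
                a, b = sorted((label[u], label[v]))
                edges[(a, b)] += 1
    err = 0.0
    for (a, b), e in edges.items():
        na, nb = len(supernodes[a]), len(supernodes[b])
        pairs = na * (na - 1) // 2 if a == b else na * nb
        err += 2.0 * e * (pairs - e) / pairs  # e*(1-e/p) + (p-e)*(e/p)
    return err


def merged(supernodes, pair):
    """Return a new supernode list in which the two given supernodes are merged."""
    a, b = pair
    return [s for s in supernodes if s not in (a, b)] + [a | b]


def summarize(adj, iterations=3, samples=10, seed=0):
    rng = random.Random(seed)
    # Step 1: every node starts as its own super point.
    supernodes = [frozenset([v]) for v in sorted(adj)]
    for _ in range(iterations):  # claim 2: fixed number of iterations
        # Step 2: min-hash grouping over a fresh random node ordering.
        order = sorted(adj)
        rank = dict(zip(order, rng.sample(range(len(order)), len(order))))
        groups = defaultdict(list)
        for s in supernodes:
            nbrs = set().union(*(adj[v] for v in s)) - s
            groups[min((rank[u] for u in nbrs), default=None)].append(s)
        # Step 3: in each group, sample pairs and merge the cheapest one.
        for group in groups.values():
            if len(group) < 2:
                continue
            all_pairs = list(itertools.combinations(group, 2))
            candidates = rng.sample(all_pairs, min(samples, len(all_pairs)))
            best = min(
                candidates,
                key=lambda p: reconstruction_error(adj, merged(supernodes, p)),
            )
            supernodes = merged(supernodes, best)
    # Step 4: the supernode partition is the summary.
    return supernodes


if __name__ == "__main__":
    graph = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1}}
    print(summarize(graph, iterations=1))
```

Because the groups produced in one iteration are disjoint, a merge inside one group never invalidates the candidate pairs of another group, so the merges of a single iteration can be applied one after another as shown.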
CN202110308958.7A 2021-03-23 2021-03-23 Abstract extraction method and system for homogeneity relation large graph Active CN113139098B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110308958.7A CN113139098B (en) 2021-03-23 2021-03-23 Abstract extraction method and system for homogeneity relation large graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110308958.7A CN113139098B (en) 2021-03-23 2021-03-23 Abstract extraction method and system for homogeneity relation large graph

Publications (2)

Publication Number Publication Date
CN113139098A (en) 2021-07-20
CN113139098B (en) 2023-12-12

Family

ID=76811605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110308958.7A Active CN113139098B (en) 2021-03-23 2021-03-23 Abstract extraction method and system for homogeneity relation large graph

Country Status (1)

Country Link
CN (1) CN113139098B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130110835A1 (en) * 2011-10-28 2013-05-02 Zhijiang He Method for calculating proximities between nodes in multiple social graphs
CN110598055A (en) * 2019-08-23 2019-12-20 华北电力大学 Parallel graph summarization method based on attribute graph
CN111159483A (en) * 2019-12-26 2020-05-15 华中科技大学 Social network diagram abstract generation method based on incremental calculation
CN111553215A (en) * 2020-04-20 2020-08-18 深圳云天励飞技术有限公司 Personnel association method and device, and graph convolution network training method and device

Also Published As

Publication number Publication date
CN113139098B (en) 2023-12-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant