CN111159483B - Tensor calculation-based social network diagram abstract generation method - Google Patents


Info

Publication number
CN111159483B
CN111159483B (application CN201911373671.1A)
Authority
CN
China
Prior art keywords
tensor
boolean
matrix
graph
decomposition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911373671.1A
Other languages
Chinese (zh)
Other versions
CN111159483A (en
Inventor
谢夏
王健
金海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201911373671.1A priority Critical patent/CN111159483B/en
Publication of CN111159483A publication Critical patent/CN111159483A/en
Application granted granted Critical
Publication of CN111159483B publication Critical patent/CN111159483B/en

Classifications

    • G06F16/901 Indexing; data structures therefor; storage structures
    • G06F16/9024 Graphs; linked lists
    • G06F16/906 Clustering; classification
    • G06Q50/01 Social networking
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a method for generating a social network graph abstract based on tensor calculation, and belongs to the field of social networks. The method comprises: representing the social network graph in a target time period as a Boolean tensor T_G; performing tensor decomposition on T_G to obtain the decomposed node matrices N_1, N_2, attribute matrices A_1, …, A_{h-3} and a time matrix T; clustering the node matrix N_1 or N_2 to obtain cluster centers and the cluster of each node; and taking the cluster centers as the superpoints of the graph abstract and calculating the superedge weights between superpoints to obtain the graph abstract. The method fuses the nodes, node attributes and timestamps of the social network into multidimensional data and, based on the binary nature of the social network graph and the high-dimensional expressiveness of tensors, achieves a unified expression of high-dimensional graph data, i.e. a Boolean tensorized representation of the complex social network. Tensor CP decomposition is introduced, and prior information such as the decomposition results of old graph tensors is fully utilized, reducing the size of the decomposed tensors and improving the decomposition efficiency of the graph abstract.

Description

Tensor calculation-based social network diagram abstract generation method
Technical Field
The invention belongs to the field of social networks, and particularly relates to a method for generating a social network diagram abstract based on tensor calculation.
Background
Social network analysis has been a popular topic in the data mining community in recent years; querying and reasoning about the interrelations between entities in a social network can inspire interesting and deep insights into various phenomena. However, because social networks have a complex, dynamic and changeable structure and a huge volume of data, the expression and mining of social network graph data are limited by computing resources and cost. Thus, the starting point for analyzing such complex large graph data is typically a concise representation, i.e., a graph abstract, which helps to understand these datasets and to represent queries in a meaningful way. Graph summarization plays a very important role in the processing of graph data, from reducing the number of bits required to encode the original graph to more complex database operations.
In recent years, tensor methods have been applied to graph summarization methods, which can produce more accurate weighted graph summaries. Tensors are a form of data storage that is multidimensional, with the dimensions of the data being referred to as the order of the tensors. Since real tensor data often has high-dimensional sparse characteristics, we generally use a tensor decomposition method to preserve original information, reduce computational complexity, and reduce data loss.
Current graph summarization methods focus only on either the temporal dynamics or the node attributes of graph data. User nodes in a social network carry various attributes, and the connection relationships between users change from moment to moment, so social network graph data exhibits dynamics and node attributes at the same time. In addition, for time-series dynamic graphs, current methods repeatedly compute over historical data, resulting in low computational efficiency.
Disclosure of Invention
Aiming at the defects and improvement demands of the prior art, the invention provides a method for generating a social network graph abstract based on tensor calculation, which aims to uniformly express the dynamics and node attributes of social network graph data by adopting a tensor calculation framework, and to introduce a Boolean tensor decomposition method to realize scalable and efficient graph abstract computation.
To achieve the above object, according to a first aspect of the present invention, there is provided a method for generating a social network graph abstract based on tensor calculation, the method comprising the steps of:
S1, performing tensor representation on a social network graph in a target time period to obtain a target Boolean tensor T_G;
S2, performing tensor decomposition on the target Boolean tensor T_G to obtain decomposed node matrices N_1, N_2;
S3, clustering the node matrix N_1 or N_2 to obtain cluster centers and the cluster of each node;
S4, regarding the cluster centers as the superpoints of the graph abstract, and calculating the superedge weights between superpoints to obtain the graph abstract of the social network graph.
Preferably, the social network graphs are dynamic undirected unweighted graphs that correspond one-to-one to the timestamps.
Preferably, step S2 comprises the steps of:
S21, merging the old Boolean tensor T_old and the target Boolean tensor T_G into a Boolean tensor T_all whose last order is the time dimension, the old Boolean tensor T_old being the tensor representation of the social network graph for a previous time period;
S22, performing biased sampling on the Boolean tensor T_all to generate k sub-tensors sT_i;
S23, performing parallel distributed Boolean CP decomposition on each sub-tensor sT_i, and calculating the decomposition factor matrices A_i^(1), …, A_i^(h) of each sub-tensor;
S24, merging the Boolean decomposition matrices A_i^(j) of the sub-tensors sT_i with the old decomposition matrices A_old^(j) of the old Boolean tensor T_old to obtain the Boolean CP decomposition result A_all^(j) of the new Boolean tensor T_all, wherein 1 ≤ i ≤ k and 1 ≤ j ≤ h.
Preferably, the step S22 comprises the steps of:
S221, summing each order of the h-order old Boolean tensor T_old to obtain the per-index count vectors s^(1), …, s^(h);
S222, dividing s^(j) by the number of non-zero elements in T_old to obtain the sampling probabilities p^(j) of the indices of each order;
S223, calculating the size L_j of the sampling index of each order of T_old according to the set sampling factor;
S224, sampling L_j indices of the j-th order of T_old according to the sampling probabilities p^(j), obtaining the sample index set V_j;
S225, merging the sample index sets with the time dimension index of the target Boolean tensor T_G to obtain {V_1, V_2, …, V_h ∪ {V_new}}, wherein V_new represents the time dimension index of T_G;
S226, taking the sample sub-tensor according to the index set {V_1, V_2, …, V_h ∪ {V_new}};
S227, repeating steps S221–S226 until k sub-tensors are generated.
Preferably, the step S23 comprises the steps of:
S231, initializing the factor matrices A_i^(1), …, A_i^(h) of the sub-tensor sT_i Y times, each time as a Boolean matrix with non-zero-entry probability p, and taking the factor matrices with the minimum reconstruction error as the final initialization;
S232, carrying out h iterations, fixing (h−1) factor matrices in each iteration and optimizing the remaining factor matrix so as to minimize the overall reconstruction error, thereby completing one round of iteration;
S233, repeating step S232 until the number of iteration rounds reaches k or the iteration error is smaller than e, and returning the Boolean factor matrices A_i^(1), …, A_i^(h).
Preferably, step S24 comprises the steps of:
S241, merging the Boolean decomposition matrices A_1^(j) of the sub-tensor sT_1 with the decomposition matrices A_old^(j) of the old tensor T_old to obtain the Boolean decomposition matrix set A_all^(j);
S242, merging the Boolean decomposition matrices A_2^(j) of the sub-tensor sT_2 with the corresponding matrices of the Boolean decomposition matrix set A_all^(j), and so on, until the Boolean decomposition matrices A_k^(j) of the sub-tensor sT_k are merged with the corresponding matrices of the set, obtaining the Boolean CP decomposition matrices A_all^(j) of the new tensor T_all.
Preferably, the merging of the Boolean decomposition matrices comprises the following steps:
(1) Calculating tensors V and U, where v_x is row x of the sub-tensor's factor matrix, u_x is the corresponding sampled index row of the old factor matrix, V is the tensor recovered from v_x and the other factor matrices, and U is the tensor recovered from u_x and the other factor matrices;
(2) Calculating the reconstruction errors ε_1 and ε_2 of tensors V and U against the old tensor factor matrix:
ε_1 = ||V − T_x||
ε_2 = ||U − T_x||
wherein T_x is the slice tensor of the corresponding index row;
(3) Judging whether ε_1 < ε_2 holds; if yes, updating row u_x of the original tensor factor matrix with v_x, otherwise not updating.
Preferably, in step S3, the Hamming distance is selected as the distance metric, the number r of cluster centers is set, and K-Means clustering is adopted to obtain the cluster centers S_i (i = 1, …, r) and the cluster to which each node belongs.
Preferably, step S4 comprises the steps of:
S41, calculating the superedge weight between the superpoints in the graph abstract, wherein the calculation formula is:
w(S_i, S_j) = ( Σ_{t=1}^{L} Σ_{l∈S_i} Σ_{m∈S_j} T_all(l, m, t) ) / ( L · σ(S_i) · σ(S_j) )
wherein S_i, S_j are the cluster centers calculated by the clustering algorithm, l and m are points in S_i and S_j respectively, L is the time dimension length of the Boolean tensor T_all, N is the number of nodes of T_all, and σ(S_i) is the number of points included in S_i;
S42, calculating the reconstruction error of the graph abstract, wherein the calculation formula is:
RE = sqrt( Σ_{t=1}^{L} Σ_{l=1}^{N} Σ_{m=1}^{N} | T_all(l, m, t) − w(S(l), S(m)) |² )
wherein S(l) denotes the superpoint to which node l belongs;
S43, judging whether the reconstruction error meets a set threshold; if yes, taking the clusters as the nodes of the graph abstract and the superedge weights as the weights of the edges of the graph abstract; otherwise, changing the number of cluster centers and returning to step S3.
To achieve the above object, according to a second aspect of the present invention, there is provided a computer-readable storage medium, wherein the computer-readable storage medium has stored thereon a computer program, which when executed by a processor, implements the method for generating a social network graph summary based on tensor computation according to the first aspect.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
(1) Aiming at the problem that existing graph summarization methods focus only on either the dynamics or the node attributes of graph data, the method fuses the multidimensional data of the nodes, node attributes and timestamps of the social network and, based on the binary nature of the social network graph and the high-dimensional expressiveness of tensors, realizes a unified expression of high-dimensional graph data, i.e. a Boolean tensorized representation of the complex social network.
(2) Aiming at the problem of the low computational efficiency of existing graph abstract methods, tensor Boolean CP decomposition is introduced, and prior information such as the decomposition results of old graph tensors is fully utilized, reducing the size of the decomposed tensors and improving the decomposition efficiency of the graph abstract.
Drawings
Fig. 1 is a flowchart of a method for generating a social network diagram abstract based on tensor calculation according to an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
First, some terms related to the present invention will be explained.
Graph abstract: the graph abstract is a concise representation of the original graph in which a large number of points and edges are aggregated into superpoints and superedges, which benefits the visualization of large graphs and the mining of graph data. A superpoint is a point set aggregated from a plurality of nodes in the graph, a superedge is an edge set aggregated from a plurality of edges in the graph, and the superedge weight is calculated from the edge adjacency characteristics and weights within the sets.
Boolean tensor: a tensor all of whose elements are 0 or 1. Owing to the binary nature of the adjacency matrix of an unweighted graph, a dynamic unweighted graph can be represented as a Boolean tensor, where the order is the number of dimensions of the tensor.
Undirected unweighted graph: edges in the graph have neither direction nor weight; a dynamic undirected graph is an undirected graph at each timestamp.
Tensor decomposition: the scheme of representing a tensor as a basic sequence of operations on other, simpler tensors, generally applicable to tensor completion, dimension-reduced representation, feature extraction, and the like.
CP decomposition: Canonical Polyadic decomposition, a common form of tensor decomposition that decomposes a tensor into a sum of a plurality of rank-1 tensors, a rank-1 tensor being a special tensor that can be decomposed into the outer product of a plurality of vectors.
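As a concrete illustration of the Boolean tensor and CP-decomposition terms defined above, the following sketch builds a Boolean tensor as the element-wise OR of rank-1 outer products. The function names are illustrative, not part of the patent:

```python
import numpy as np

def boolean_outer(vectors):
    """Rank-1 Boolean tensor: the outer product of 0/1 vectors."""
    t = np.array(True)
    for v in vectors:
        t = np.multiply.outer(t, np.asarray(v, dtype=bool))
    return t

def boolean_cp_reconstruct(factors):
    """Boolean CP reconstruction: OR together R rank-1 tensors, where
    factors[j] is an (n_j x R) Boolean factor matrix for order j."""
    R = factors[0].shape[1]
    T = np.zeros(tuple(f.shape[0] for f in factors), dtype=bool)
    for r in range(R):
        T |= boolean_outer([f[:, r] for f in factors])
    return T
```

In the Boolean setting the "sum" of rank-1 tensors is a logical OR, which is why the element-wise `|=` stands in for addition.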
As shown in FIG. 1, the invention provides a method for generating a social network diagram abstract based on tensor calculation, which comprises the following steps:
S1, performing tensor representation on the social network graph in the target time period to obtain the Boolean tensor T_G.
Preferably, the social network graphs are dynamic undirected unweighted graphs that correspond one-to-one to the timestamps.
And abstracting the users in the social network as nodes, abstracting the relationship among the users in the social network as edges, and obtaining a social network diagram. For example, in a microblog social network, a microblog user is a node, each node has a plurality of node attributes, such as gender, academic, work and occupation, and a concern relationship between users is an edge. The relationship of interest between users is dynamically changing, and thus, the social network diagram data is dynamic.
In this embodiment, the target period of time is 1 day, that is, the social network diagram abstract within 1 day needs to be generated. In the microblog social network, in the generated graph abstract, user nodes with similar user attributes and user interests are represented by one superpoint, and the connection relationship among different user superpoints is represented by a superside.
The graph data is constructed as a high-order tensor, with the node attributes and the timestamp of the graph data serving as different dimensions of the tensor. The tensor is binary, and a non-zero element in the tensor represents the two endpoint nodes of one edge in the dynamic attribute graph, together with the node attributes and the timestamp of that edge.
For a high-order sparse tensor, storing all elements would consume a large amount of storage space; therefore, for a graph tensor, the invention stores only tuples of the index values of non-zero elements in the different dimensions. For example, (node1, node2, node1.attribute, node2.attribute, t) indicates that at time t there is an edge with node1 and node2 as vertices, and that the attributes of node1 and node2 are node1.attribute and node2.attribute respectively. To support the computation of large-scale graph data, the graph tensor tuples are uploaded into the distributed file system HDFS.
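The tuple-based storage described above can be sketched as follows. The attribute names and the `to_dense` helper are hypothetical, for illustration only; in practice the tensor would remain in sparse tuple form (e.g., on HDFS):

```python
import numpy as np

# Hypothetical tuple store for the sparse graph tensor: one tuple
# (node1, node2, node1.attribute, node2.attribute, t) per non-zero entry.
edges = [
    (0, 1, "student", "engineer", 0),
    (1, 2, "engineer", "teacher", 0),
    (0, 2, "student", "teacher", 1),
]

def to_dense(edges, n_nodes, attrs, n_steps):
    """Expand the tuples into a dense Boolean tensor (illustration only;
    at scale the tensor stays in sparse tuple form)."""
    idx = {name: i for i, name in enumerate(attrs)}
    T = np.zeros((n_nodes, n_nodes, len(attrs), len(attrs), n_steps),
                 dtype=bool)
    for u, v, au, av, t in edges:
        T[u, v, idx[au], idx[av], t] = True
    return T
```

Three tuples here stand for three non-zero entries of a 5-order (h = 5) Boolean tensor: two node orders, two attribute orders and one time order.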
S2, performing tensor decomposition on the Boolean tensor T_G to obtain the decomposed node matrices N_1, N_2, attribute matrices A_1, …, A_{h−3} and time matrix T.
The decomposed node matrices N_1, N_2 are feature vectors representing the node adjacency characteristics, the attribute matrices A_1, …, A_{h−3} are feature vectors representing the node attributes of the graph, and the time matrix T is a feature vector representing the graph in the time dimension.
Preferably, step S2 comprises the steps of:
S21, merging the Boolean tensor T_old and the Boolean tensor T_G into a Boolean tensor T_all, wherein the last order is the time dimension and the Boolean tensor T_old is the tensor representation of the social network graph over a previous time period.
In the present embodiment, the Boolean tensor T_old is the tensor representation of the social network graph of the previous day.
S22, performing biased sampling on the Boolean tensor T_all to generate k sub-tensors sT_i, 1 ≤ i ≤ k.
The Boolean tensor T_all is sampled with bias according to an importance measure, so as to increase the non-zero-entry density of the sampled sub-tensors and to increase the influence of the decomposition result of each sub-tensor on the update of T_all. The following procedure is illustrated with h = 2.
Assume that T_old = [[0, 1], [1, 1]] (indices counted from 0) and that the sampling factor is 0.5.
Preferably, the step S22 comprises the steps of:
S221, summing each order of the h-order old Boolean tensor T_old to obtain the per-index count vectors s^(1), …, s^(h).
In the present embodiment, s^(1) = [1, 2] and s^(2) = [1, 2].
S222, dividing s^(j) by the number of non-zero entries in T_old to obtain the sampling probabilities p^(j) of the indices of each order.
In the present embodiment, p^(1) = [1/3, 2/3] ≈ [0.33, 0.67] and p^(2) = [1/3, 2/3] ≈ [0.33, 0.67].
S223, calculating the size L_j of the sampling index of each order of T_old according to the set sampling factor.
In the present embodiment, L_1 = 2 × 0.5 = 1 and L_2 = 2 × 0.5 = 1.
S224, sampling L_j indices of the j-th order of T_old according to the sampling probabilities p^(j), obtaining the sample index set V_j.
In the present embodiment, in the first dimension the sample size is 1 and the indices [0, 1] have sampling probabilities [0.33, 0.67]; in the second dimension the sample size is 1 and the indices [0, 1] likewise have sampling probabilities [0.33, 0.67]. Assume the sampling result is V_1 = [1] and V_2 = [1].
S225, merging the sample index sets with the index set of the new tensor to obtain {V_1, V_2, …, V_h ∪ {V_new}}, wherein V_new represents the time dimension index of T_G.
In the present embodiment, V_new = [2, 3], and the final sample index set is {[1], [1, 2, 3]}.
S226, taking the sample sub-tensor according to the sample index set {V_1, V_2, …, V_h ∪ {V_new}}.
In the present embodiment, the sub-tensor is T_all[1, {1, 2, 3}] = [1 1 1].
S227, repeating steps S221–S226 until the k sub-tensors sT_1, …, sT_k are generated.
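A minimal sketch of the biased sampling steps S221–S224, run on the 2-order toy tensor of the embodiment (assuming 0-based indices; the function name and `rng` are illustrative):

```python
import numpy as np

def biased_sample_indices(T_old, sampling_factor, rng):
    """Steps S221-S224 (sketch): biased index sampling per order."""
    nnz = T_old.sum()                            # number of non-zero entries
    sampled = []
    for j in range(T_old.ndim):
        other_axes = tuple(k for k in range(T_old.ndim) if k != j)
        s_j = T_old.sum(axis=other_axes)         # S221: per-index counts
        p_j = s_j / nnz                          # S222: sampling probabilities
        L_j = max(1, int(T_old.shape[j] * sampling_factor))      # S223
        V_j = rng.choice(T_old.shape[j], size=L_j,
                         replace=False, p=p_j)   # S224: biased sampling
        sampled.append(sorted(int(v) for v in V_j))
    return sampled

# Toy 2-order tensor consistent with the embodiment's numbers (0-based):
T_old = np.array([[0, 1],
                  [1, 1]], dtype=bool)
V = biased_sample_indices(T_old, 0.5, np.random.default_rng(0))
```

For this T_old both orders have counts [1, 2] and probabilities [1/3, 2/3], matching the embodiment; the sampled index in each order is then merged with the new time indices before the sub-tensor is cut out (S225–S226).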
S23, performing parallel distributed Boolean CP decomposition on each sub-tensor sT_i, and calculating the decomposition factor matrices A_i^(1), …, A_i^(h) of each sub-tensor.
Preferably, the step S23 comprises the steps of:
S231, initializing the factor matrices A_i^(1), …, A_i^(h) of the sub-tensor sT_i Y times, each time as a Boolean matrix with non-zero-entry probability p, and taking the factor matrices with the minimum reconstruction error as the final initialization.
In this embodiment, Y is set according to actual needs, and is generally an arbitrary integer from 5 to 20.
S232, carrying out h iterations, fixing (h−1) factor matrices in each iteration and optimizing the remaining factor matrix so as to minimize the overall reconstruction error.
This embodiment employs least-squares optimization. The following procedure is illustrated with h = 3.
The decomposition factor matrices of the sub-tensor sT_i are A_i^(1), A_i^(2), A_i^(3). Fix A_i^(2) and A_i^(3) and optimize A_i^(1) so that the reconstruction error is minimized; fix A_i^(1) and A_i^(3) and optimize A_i^(2) so that the reconstruction error is minimized; fix A_i^(1) and A_i^(2) and optimize A_i^(3) so that the reconstruction error is minimized.
S233, repeating step S232 until the number of iteration rounds reaches k or the iteration error is smaller than e, and returning the Boolean factor matrices A_i^(1), …, A_i^(h).
In this embodiment, k and e are set according to actual needs.
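The initialize-then-alternate scheme of steps S231–S233 can be sketched as follows. Because the per-matrix update is only described abstractly, this sketch substitutes an exhaustive per-row search over the 2^R candidate Boolean rows, which is exact for a tiny rank R; a real implementation would use a scalable update rule instead:

```python
import itertools
import numpy as np

def reconstruct(factors):
    """OR of R rank-1 Boolean tensors from (n_j x R) factor matrices."""
    R = factors[0].shape[1]
    T = np.zeros(tuple(f.shape[0] for f in factors), dtype=bool)
    for r in range(R):
        rank1 = np.array(True)
        for f in factors:
            rank1 = np.multiply.outer(rank1, f[:, r])
        T |= rank1
    return T

def boolean_cp(T, R=2, restarts=5, rounds=10, p=0.5, seed=0):
    """S231-S233 sketch: best of `restarts` random Boolean inits, then
    alternating rounds that re-optimize one factor at a time
    (each row chosen exhaustively among the 2^R Boolean candidates)."""
    rng = np.random.default_rng(seed)
    candidates = [np.array(bits, dtype=bool)
                  for bits in itertools.product([0, 1], repeat=R)]
    best, best_err = None, None
    for _ in range(restarts):
        # S231: random Boolean init with non-zero-entry probability p
        factors = [rng.random((n, R)) < p for n in T.shape]
        for _ in range(rounds):                    # S232/S233
            for j in range(len(factors)):          # fix h-1, update one
                for x in range(T.shape[j]):
                    errs = []
                    for c in candidates:
                        factors[j][x] = c
                        errs.append(np.sum(reconstruct(factors) != T))
                    factors[j][x] = candidates[int(np.argmin(errs))]
        err = np.sum(reconstruct(factors) != T)
        if best_err is None or err < best_err:
            best, best_err = [f.copy() for f in factors], err
    return best, int(best_err)
```

Each row update can only lower the mismatch count, so the error is monotonically non-increasing within a restart, mirroring the "minimize the overall reconstruction error" criterion of S232.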
S24, merging the Boolean decomposition matrices A_i^(j) of the sub-tensors sT_i with the decomposition matrices A_old^(j) of the old tensor T_old to obtain the Boolean CP decomposition result A_all^(j) of the new tensor T_all.
Merging the two to obtain the Boolean CP decomposition result A_all^(j) of the new tensor T_all introduces updates into the decomposition matrices of the old tensor and reduces the decomposition error.
Preferably, step S24 comprises the steps of:
S241, merging the Boolean decomposition matrices A_1^(j) of the sub-tensor sT_1 with the decomposition matrices A_old^(j) of the old tensor T_old to obtain the Boolean decomposition matrix set A_all^(j).
S242, merging the Boolean decomposition matrices A_2^(j) of the sub-tensor sT_2 with the corresponding matrices of the Boolean decomposition matrix set A_all^(j), and so on, until the Boolean decomposition matrices A_k^(j) of the sub-tensor sT_k are merged with the corresponding matrices of the set, obtaining the Boolean CP decomposition matrices A_all^(j) of the new tensor T_all.
Preferably, the merging of the Boolean decomposition matrices comprises the following steps:
(1) Calculating tensors V and U, where V is the tensor recovered from v_x and the other factor matrices and U is the tensor recovered from u_x and the other factor matrices, v_x being row x of the sub-tensor's factor matrix and u_x the corresponding sampled index row of the old factor matrix.
(2) Calculating the reconstruction errors ε_1 and ε_2 of tensors V and U against the old tensor factor matrix:
ε_1 = ||V − T_x||
ε_2 = ||U − T_x||
wherein T_x is the slice tensor of the corresponding index row, and the tensor 1-norm, denoted ||·||, is the number of non-zero entries of a Boolean tensor.
(3) Judging whether ε_1 < ε_2 holds; if yes, updating row u_x of the original tensor factor matrix with v_x, otherwise not updating.
Updating row u_x of the original tensor factor matrix with v_x only when ε_1 < ε_2 is satisfied reduces the overall reconstruction error.
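A sketch of the row-update rule above for the 2-order (matrix) case; `row_slice` and `merge_row` are illustrative names, and the error is the 1-norm (non-zero mismatch count) defined in step (2):

```python
import numpy as np

def row_slice(row, other):
    """Boolean reconstruction of one slice from a factor row `row` (R,)
    and the remaining factor matrix `other` (m x R): OR over r of
    row[r] & other[:, r]."""
    return (other & row).any(axis=1)

def merge_row(old_A, B, x, v_x, T_x):
    """Steps (1)-(3): replace row u_x of the old factor matrix by the
    sub-tensor's row v_x only if that lowers the slice reconstruction
    error against the slice tensor T_x."""
    u_x = old_A[x]
    eps1 = int(np.sum(row_slice(v_x, B) != T_x))  # error with candidate row
    eps2 = int(np.sum(row_slice(u_x, B) != T_x))  # error with current row
    if eps1 < eps2:
        old_A[x] = v_x                            # update only if it helps
    return old_A
```

Because a row is replaced only when its slice error strictly decreases, repeating this over all sampled rows cannot increase the overall reconstruction error.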
S3, clustering the node matrix N_1 or N_2 to obtain the cluster centers and the cluster of each node.
Preferably, in the step S3, the clustering mode is a K-Means clustering mode, and the Hamming distance is selected as the distance.
The method comprises the following steps:
s31, selecting a row vector set { N } of a node Boolean factor matrix N 1 ,n 2 ,...,n l "l" is the number of rows of matrix N, also the number of graph nodes。
S32, selecting r vectors from the vector as initial cluster centers, wherein r represents the number of cluster centers and is also the number of super points generated in the final graph abstract.
In this embodiment, r, the K of K-Means, is initialized to 100.
S33, calculating the Hamming distance from each node to every cluster center, and assigning the node to the closest cluster.
S34, updating each cluster center with the rounded mean of all the vectors in that cluster, thereby completing one round of iteration.
S35, if the number of iterations reaches the specified value, outputting the cluster of each point.
In this embodiment, the specified number of iterations is 10.
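Steps S31–S35 can be sketched as a K-Means variant over Boolean row vectors with Hamming distance and rounded-mean center updates. Taking the first r rows as initial centers is an assumption made here for determinism; the patent only says that r vectors are selected:

```python
import numpy as np

def hamming_kmeans(rows, r, iters=10):
    """S31-S35 sketch: K-Means on Boolean rows under Hamming distance,
    centers updated by the rounded (majority-vote) mean."""
    rows = np.asarray(rows, dtype=bool)
    centers = rows[:r].copy()          # S32: initial centers (assumed: first r rows)
    for _ in range(iters):             # S35: fixed number of rounds
        # S33: Hamming distance of every row to every center -> (n, r)
        dist = (rows[:, None, :] != centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        # S34: rounded mean per cluster (exact ties round down to 0)
        for c in range(r):
            members = rows[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0) > 0.5
    return centers, labels
```

The rounded mean keeps the centers Boolean, so the Hamming distance in S33 stays well defined from round to round.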
S4, regarding the cluster centers as the superpoints of the graph abstract, and calculating the superedge weights between superpoints to obtain the complete graph abstract.
Preferably, step S4 comprises the steps of:
S41, calculating the superedge weight between the superpoints in the graph abstract according to the graph node adjacency similarity formula:
w(S_i, S_j) = ( Σ_{t=1}^{L} Σ_{l∈S_i} Σ_{m∈S_j} T_all(l, m, t) ) / ( L · σ(S_i) · σ(S_j) )
S42, calculating the reconstruction error of the graph abstract according to the Euclidean distance of the tensors:
RE = sqrt( Σ_{t=1}^{L} Σ_{l=1}^{N} Σ_{m=1}^{N} | T_all(l, m, t) − w(S(l), S(m)) |² )
wherein S_i, S_j are the cluster centers calculated by the clustering algorithm, l and m are points in S_i and S_j respectively, L is the time dimension length of the Boolean tensor T_all, N is the number of nodes of T_all, σ(S_i) is the number of points included in S_i, S(l) denotes the superpoint to which node l belongs, and |·| denotes the absolute value operator.
S43, judging whether the reconstruction error meets the set threshold; if yes, taking the clusters as the nodes of the graph abstract and the superedge weights as the weights of the edges of the graph abstract; otherwise, changing the number of cluster centers and returning to step S3.
In this embodiment, the reconstruction error threshold is set to 1000. If the reconstruction error does not meet the set threshold, the number of cluster centers is increased.
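A sketch of the superedge-weight computation of step S41. The patent's formula survives only as an equation image, so this assumes a common density-style definition consistent with the listed variables (time length L, cluster sizes σ(S_i), points l and m); the function name is illustrative:

```python
import numpy as np

def summary_weights(T_all, labels, r):
    """Superedge weight between superpoints S_i and S_j: number of edges
    between the two clusters, averaged over the L timestamps and
    normalized by the cluster sizes (assumed density-style formula)."""
    N, _, L = T_all.shape              # (node, node, time) Boolean tensor
    W = np.zeros((r, r))
    for i in range(r):
        for j in range(r):
            li = np.flatnonzero(labels == i)   # points l in S_i
            mj = np.flatnonzero(labels == j)   # points m in S_j
            if len(li) == 0 or len(mj) == 0:
                continue                       # empty cluster: weight 0
            block = T_all[np.ix_(li, mj)]      # all entries T_all(l, m, t)
            W[i, j] = block.sum() / (L * len(li) * len(mj))
    return W
```

A weight of 1 would mean every pair of nodes across the two superpoints is connected at every timestamp, and 0 that the two superpoints are never connected, which is the behaviour a superedge weight in a weighted graph abstract needs.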
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (8)

1.A method for generating a social network diagram abstract based on tensor calculation is characterized by comprising the following steps:
s1, tensor representation is carried out on a social network diagram in a target time period to obtain a target Boolean tensor T G The method comprises the steps of carrying out a first treatment on the surface of the The social network graph is a dynamic undirected unauthorized graph and corresponds to the time stamp one by one;
s2, for a target Boolean tensor T G Tensor decomposition is carried out to obtain a decomposed node matrix N 1 ,N 2
S3, node matrix N 1 Or N 2 Clustering is carried out to obtain a cluster center and the type of each node;
s4, regarding the cluster center as the superpoints of the graph abstract, and calculating the superedge weight between the superpoints to obtain the graph abstract of the social network graph;
wherein, step S2 includes the following steps:
S21, merging the old Boolean tensor T_old and the target Boolean tensor T_G into a Boolean tensor T_all whose last order is the time dimension, the old Boolean tensor T_old being the tensor representation of the social network graph of the previous time period;
S22, performing biased sampling on the Boolean tensor T_all to generate k sub-tensors sT_i;
S23, performing parallel distributed Boolean CP decomposition on each sub-tensor sT_i to obtain the decomposition factor matrices A_1^(i), A_2^(i), ..., A_h^(i) of each sub-tensor;
S24, merging the Boolean decomposition matrices A_1^(i), ..., A_h^(i) of the sub-tensors sT_i with the Boolean decomposition matrices A_1^(old), ..., A_h^(old) of the old Boolean tensor T_old to obtain the Boolean CP decomposition result A_1^(all), ..., A_h^(all) of the new Boolean tensor T_all;
wherein 1 ≤ i ≤ k and 1 ≤ j ≤ h, j indexing the factor matrices.
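For illustration only (not part of the claimed method), the Boolean CP decomposition underlying step S2 approximates a Boolean tensor by the logical OR of rank-1 Boolean outer products. A minimal sketch of the 3-way reconstruction, with all names my own:

```python
import numpy as np

def boolean_cp_reconstruct(A, B, C):
    """Reconstruct a 3-way Boolean tensor from Boolean factor matrices
    A (I x R), B (J x R), C (K x R): OR over R rank-1 AND components."""
    I, R = A.shape
    J, K = B.shape[0], C.shape[0]
    T = np.zeros((I, J, K), dtype=bool)
    for r in range(R):
        # r-th rank-1 Boolean component via broadcasting AND
        T |= (A[:, r][:, None, None]
              & B[:, r][None, :, None]
              & C[:, r][None, None, :])
    return T
```

Under Boolean semantics, overlapping rank-1 components never "over-count" a cell, which is why OR replaces the sum used in ordinary CP decomposition.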
2. The method according to claim 1, wherein the step S22 comprises the steps of:
S221, summing each order of the h-order old Boolean tensor T_old to obtain the count vectors d_1, d_2, ..., d_h;
S222, dividing d_j by the number of non-zero elements in T_old to obtain the sampling probability p_j of each index in order j;
S223, calculating the sampling index size L_j for each order of T_old according to the set sampling factor;
S224, performing L_j samplings on the j-th order index of T_old according to the sampling probability p_j to obtain the sampling index set V_j;
S225, merging the sampling index sets V_1, ..., V_h with the time dimension index of the target Boolean tensor T_G to obtain {V_1, V_2, ..., V_h ∪ {V_new}}, wherein V_new denotes the time dimension index of T_G;
S226, obtaining a sampled sub-tensor according to the index set {V_1, V_2, ..., V_h ∪ {V_new}};
S227, repeating steps S221 to S226 until k sub-tensors are generated.
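The biased sampling of steps S221 to S226 can be sketched as below; the helper names are my own, and the exact normalization is an assumption (indices are drawn with probability proportional to the number of non-zeros they touch):

```python
import numpy as np

def mode_sampling_probs(T):
    """Steps S221-S222: for each order j, the probability of each index,
    proportional to its non-zero count (assumes T has at least one non-zero)."""
    nnz = T.sum()
    probs = []
    for j in range(T.ndim):
        axes = tuple(a for a in range(T.ndim) if a != j)
        counts = T.sum(axis=axes).astype(float)  # non-zeros per index of order j
        probs.append(counts / nnz)               # sums to 1 over the order
    return probs

def sample_subtensor(T, sizes, rng):
    """Steps S223-S226: draw sizes[j] distinct indices per order according
    to the biased probabilities, then slice out the sub-tensor."""
    probs = mode_sampling_probs(T)
    idx = [rng.choice(T.shape[j], size=sizes[j], replace=False, p=p)
           for j, p in enumerate(probs)]
    return T[np.ix_(*idx)], idx
```

Repeating `sample_subtensor` k times (step S227) yields the k sub-tensors decomposed in parallel in step S23.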
3. The method according to claim 1, wherein the step S23 comprises the steps of:
S231, initializing the factor matrices A_1^(i), ..., A_h^(i) of sub-tensor sT_i Y times, each initialization drawing Boolean matrices with non-zero entry probability p, and taking the factor matrices with the smallest reconstruction error as the final initialization;
S232, performing h iterations: in each iteration, fixing (h-1) factor matrices and optimizing the remaining factor matrix to minimize the overall reconstruction error, which completes one round of iteration;
S233, repeating step S232 until the number of iteration rounds reaches k or the iteration error is smaller than e, and returning the Boolean factor matrices A_1^(i), ..., A_h^(i).
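The restart initialization of step S231 can be sketched as follows for the 3-way case; measuring the error as the number of mismatched cells after Boolean reconstruction is my assumption, not a detail stated in the claim:

```python
import numpy as np

def boolean_error(T, factors):
    """Number of cells where the Boolean CP reconstruction disagrees with T
    (3-way case; factors are Boolean matrices sharing a common rank R)."""
    A, B, C = factors
    recon = np.zeros(T.shape, dtype=bool)
    for r in range(A.shape[1]):
        recon |= (A[:, r][:, None, None]
                  & B[:, r][None, :, None]
                  & C[:, r][None, None, :])
    return int(np.sum(recon != T))

def best_of_y_inits(T, rank, p, Y, rng):
    """Step S231 sketch: draw Y random Boolean factor triples whose entries
    are non-zero with probability p; keep the triple with the least error."""
    best, best_err = None, None
    for _ in range(Y):
        factors = [rng.random((n, rank)) < p for n in T.shape]
        err = boolean_error(T, factors)
        if best_err is None or err < best_err:
            best, best_err = factors, err
    return best, best_err
```

The alternating optimization of S232 would then repeatedly fix two of the three matrices and improve the third against this same error measure.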
4. The method of claim 1, wherein step S24 comprises the steps of:
S241, merging the Boolean decomposition matrices A_1^(1), ..., A_h^(1) of sub-tensor sT_1 with the Boolean decomposition matrices A_1^(old), ..., A_h^(old) of the old Boolean tensor T_old to obtain the Boolean decomposition matrix set B_1, ..., B_h;
S242, merging the corresponding matrices of the Boolean decomposition matrices A_1^(2), ..., A_h^(2) of sub-tensor sT_2 with the Boolean decomposition matrix set B_1, ..., B_h, and so on, until the Boolean decomposition matrices A_1^(k), ..., A_h^(k) of sub-tensor sT_k are merged with the current matrix set, yielding the Boolean CP decomposition matrices A_1^(all), ..., A_h^(all) of the new tensor T_all.
5. The method of claim 4, wherein the merging of the Boolean decomposition matrices comprises the following steps:
(1) calculating tensors V and U, wherein v_x is row x of the newly decomposed factor matrix, u_x is the corresponding sampled index row of the old factor matrix, V is the tensor recovered from v_x and the other factor matrices, and U is the tensor recovered from u_x and the other factor matrices;
(2) calculating the reconstruction errors ε_1 and ε_2 of tensors V and U:
ε_1 = ||V - T_x||
ε_2 = ||U - T_x||
wherein T_x is the slice tensor of the corresponding index row;
(3) judging whether ε_1 < ε_2 holds; if so, updating the row u_x of the original tensor factor matrix with v_x; otherwise, not updating.
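The row-update rule of claim 5 can be sketched as follows for one slice of a 3-way tensor; treating the norm as the Hamming (mismatch-count) norm over the slice, and the helper names, are assumptions of mine:

```python
import numpy as np

def slice_error(row, B, C, T_slice):
    """Boolean reconstruction error of one slice T_x, with the other two
    factor matrices B and C held fixed."""
    recon = np.zeros(T_slice.shape, dtype=bool)
    for r in np.flatnonzero(row):
        # only the components active in this row contribute to the slice
        recon |= B[:, r][:, None] & C[:, r][None, :]
    return int(np.sum(recon != T_slice))

def maybe_update_row(old_row, new_row, B, C, T_slice):
    """Claim 5, step (3): keep whichever row reconstructs the
    corresponding slice tensor T_x with the smaller error."""
    eps1 = slice_error(new_row, B, C, T_slice)   # error of V
    eps2 = slice_error(old_row, B, C, T_slice)   # error of U
    return new_row if eps1 < eps2 else old_row
```

Applying this row by row lets the merged decomposition keep the better of the old and newly sampled factors instead of overwriting unconditionally.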
6. The method of claim 1, wherein in step S3, the Hamming distance is selected as the distance metric, the number of cluster centers r is set, and K-Means clustering is adopted to obtain the cluster centers S_i, i = 1, ..., r, and the cluster to which each node belongs.
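A minimal sketch of the clustering in claim 6: K-Means-style assignment of Boolean node rows under Hamming distance, with each center recomputed by elementwise majority (a common adaptation of K-Means to Boolean data; the claim does not specify the center-update rule, so this is an assumption). Initial centers are passed in explicitly for reproducibility:

```python
import numpy as np

def hamming_kmeans(X, init_idx, iters=10):
    """X: (n, d) Boolean matrix of node rows; init_idx: rows used as
    initial centers. Returns (centers, labels)."""
    centers = X[np.asarray(init_idx)].copy()
    r = len(init_idx)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Hamming distance of every row to every center
        dist = (X[:, None, :] != centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        # recompute each center as the elementwise majority of its members
        for c in range(r):
            members = X[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0) >= 0.5
    return centers, labels
```

The resulting labels give the cluster of each node, and the centers serve as the superpoints S_i used in step S4.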
7. The method according to any one of claims 1 to 5, wherein step S4 comprises the steps of:
S41, calculating the superedge weight between superpoints in the graph summary, wherein the calculation formula (rendered as an image in the published claim; reconstructed here from the stated variable definitions) is:

w(S_i, S_j) = ( Σ_{t=1}^{L} Σ_{l∈S_i} Σ_{m∈S_j} T_all(l, m, t) ) / ( L · σ(S_i) · σ(S_j) )

wherein S_i, S_j are cluster centers calculated by the clustering algorithm, l and m are nodes in S_i and S_j respectively, L is the time dimension length of the Boolean tensor T_all, N is the number of nodes of T_all, and σ(S_i) is the number of points contained in S_i;
S42, calculating the reconstruction error of the graph summary, wherein the calculation formula (likewise an image in the published claim; one consistent reading) is:

ε = Σ_{t=1}^{L} Σ_{l=1}^{N} Σ_{m=1}^{N} | T_all(l, m, t) - w(S(l), S(m)) |

wherein S(l) denotes the superpoint containing node l;
S43, judging whether the reconstruction error meets the set threshold; if so, taking the clusters as the nodes of the graph summary and the superedge weights as the weights of its edges; otherwise, changing the number of cluster centers and returning to step S3.
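Because the S41 formula appears only as an image in the published claim, the sketch below implements one plausible reading (my assumption): the superedge weight as the average edge density between two clusters over the L timestamps:

```python
import numpy as np

def superedge_weight(T_all, members_i, members_j):
    """T_all: (N, N, L) Boolean adjacency over time; members_*: lists of
    node indices in the two clusters. Returns the fraction of possible
    (node-in-S_i, node-in-S_j, timestamp) edges actually present."""
    L = T_all.shape[2]
    block = T_all[np.ix_(members_i, members_j, np.arange(L))]
    return block.sum() / (L * len(members_i) * len(members_j))
```

A weight of 1.0 would mean the two clusters are fully connected at every timestamp; the S42 reconstruction error then measures how far this density summary deviates from the original tensor cell by cell.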
8. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method for generating a social network graph summary based on tensor calculation according to any one of claims 1 to 7.
CN201911373671.1A 2019-12-26 2019-12-26 Tensor calculation-based social network diagram abstract generation method Active CN111159483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911373671.1A CN111159483B (en) 2019-12-26 2019-12-26 Tensor calculation-based social network diagram abstract generation method


Publications (2)

Publication Number Publication Date
CN111159483A CN111159483A (en) 2020-05-15
CN111159483B true CN111159483B (en) 2023-07-04

Family

ID=70558533


Country Status (1)

Country Link
CN (1) CN111159483B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881191B (en) * 2020-08-05 2021-06-11 留洋汇(厦门)金融技术服务有限公司 Client portrait key feature mining system and method under mobile internet
CN112287118B (en) * 2020-10-30 2023-06-02 西南电子技术研究所(中国电子科技集团公司第十研究所) Event mode frequent subgraph mining and prediction method
CN112507245B (en) * 2020-12-03 2023-07-18 中国人民大学 Social network friend recommendation method based on graph neural network
CN113139098B (en) * 2021-03-23 2023-12-12 中国科学院计算技术研究所 Abstract extraction method and system for homogeneity relation large graph
CN113157981B (en) * 2021-03-26 2022-12-13 支付宝(杭州)信息技术有限公司 Graph network relation diffusion method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107545509A (en) * 2017-07-17 2018-01-05 西安电子科技大学 A kind of group dividing method of more relation social networks
CN107656928A (en) * 2016-07-25 2018-02-02 长沙有干货网络技术有限公司 The method that a kind of isomery social networks of user clustering is recommended
CN107767280A (en) * 2017-10-16 2018-03-06 湖北文理学院 A kind of high-quality node detecting method based on element of time
CN109697467A (en) * 2018-12-24 2019-04-30 宁波大学 A kind of summarization methods of complex network figure

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8060512B2 (en) * 2009-06-05 2011-11-15 Xerox Corporation Hybrid tensor-based cluster analysis
US10956500B2 (en) * 2017-01-19 2021-03-23 Google Llc Dynamic-length stateful tensor array
US10268646B2 (en) * 2017-06-06 2019-04-23 Facebook, Inc. Tensor-based deep relevance model for search on online social networks


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pauli Miettinen et al., "Walk'n'Merge: A Scalable Algorithm for Boolean Tensor Factorization", IEEE, 2013 (full text). *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant