CN111159483B - Method for generating a social network graph summary based on tensor computation (Google Patents)
- Publication number: CN111159483B (application CN201911373671.1A)
- Authority: CN (China)
- Prior art keywords: tensor, boolean, matrix, graph, decomposition
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/9024 — Information retrieval; indexing and data structures; graphs, linked lists
- G06F16/906 — Information retrieval; clustering, classification
- G06Q50/01 — ICT specially adapted for specific business sectors; social networking
- Y02D10/00 — Climate change mitigation in ICT; energy efficient computing
Abstract
The invention discloses a method for generating a social network graph summary based on tensor computation, belonging to the field of social networks. The method comprises: representing the social network graph over a target time period as a tensor to obtain a Boolean tensor T_G; performing tensor decomposition on the Boolean tensor T_G to obtain decomposed node matrices N_1, N_2, attribute matrices A_1, ..., A_{h-3}, and a time matrix T; clustering node matrix N_1 or N_2 to obtain the cluster centers and the cluster of each node; and taking the cluster centers as the supernodes of the graph summary and computing the superedge weights between supernodes to obtain the graph summary. The method fuses the multi-dimensional data of the nodes, node attributes, and timestamps of the social network; based on the binary nature of the social network graph and the high-dimensional expressiveness of tensors, it achieves a unified representation of high-dimensional graph data, i.e., a Boolean tensorized representation of a complex social network. Tensor CP decomposition is introduced, prior information such as the decomposition results of the old graph tensor is fully exploited, the size of the tensors to be decomposed is reduced, and the decomposition efficiency of the graph summary is improved.
Description
Technical Field
The invention belongs to the field of social networks, and in particular relates to a method for generating a social network graph summary based on tensor computation.
Background
Social network analysis has been a popular topic in the data mining community in recent years; querying and reasoning about the interrelations between entities in a social network can yield interesting and deep insights into various phenomena. However, because of the complex, dynamic, and changeable structure of social networks and their massive data volume, the representation and mining of social network graph data are limited by computing resources and cost. The starting point for analyzing such complex large graph data is therefore typically a concise representation, i.e., a graph summary. It helps in understanding these datasets and in expressing queries in a meaningful way. Graph summarization plays a very important role in the processing of graph data, from reducing the number of bits required to encode the original graph to more complex database operations.
In recent years, tensor methods have been applied to graph summarization and can produce more accurate weighted graph summaries. A tensor is a multidimensional form of data storage, and the number of dimensions is called the order of the tensor. Since real tensor data is often high-dimensional and sparse, tensor decomposition is generally used to preserve the original information, reduce computational complexity, and limit data loss.
Current graph summarization methods focus only on either the temporal dynamics or the node attributes of graph data. User nodes in a social network carry multiple attributes, and the connections between users may change at every moment, so social network graph data exhibits dynamics and node attributes simultaneously. In addition, for time-series dynamic graphs, current methods repeatedly recompute historical data, resulting in low computational efficiency.
Disclosure of Invention
In view of the defects and improvement needs of the prior art, the invention provides a method for generating a social network graph summary based on tensor computation, which aims to uniformly express the dynamics and node attributes of social network graph data within a tensor computation framework, and to introduce a Boolean tensor decomposition method to achieve scalable and efficient graph summary computation.
To achieve the above object, according to a first aspect of the present invention, there is provided a method for generating a social network graph summary based on tensor computation, the method comprising the following steps:
S1, represent the social network graph over a target time period as a tensor to obtain a target Boolean tensor T_G;
S2, perform tensor decomposition on the target Boolean tensor T_G to obtain decomposed node matrices N_1, N_2;
S3, cluster node matrix N_1 or N_2 to obtain the cluster centers and the cluster of each node;
S4, take the cluster centers as the supernodes of the graph summary, and compute the superedge weights between supernodes to obtain the graph summary of the social network graph.
Preferably, the social network graph is a dynamic undirected unweighted graph in one-to-one correspondence with the timestamps.
Preferably, step S2 comprises the steps of:
S21, merge the old Boolean tensor T_old and the target Boolean tensor T_G into a Boolean tensor T_all whose last order is the time dimension, the old Boolean tensor T_old being the tensor representation of the social network graph over the previous time period;
S22, perform biased sampling on the Boolean tensor T_all to generate k sub-tensors sT_i;
S23, perform a parallel distributed Boolean CP decomposition of each sub-tensor sT_i and compute its decomposition factor matrices A_i^(1), ..., A_i^(h);
S24, merge the Boolean decomposition matrices A_i^(j) of the sub-tensors sT_i with the decomposition matrices A_old^(j) of the old Boolean tensor T_old to obtain the Boolean CP decomposition result A_all^(1), ..., A_all^(h) of the new Boolean tensor T_all;
where 1 ≤ i ≤ k and 1 ≤ j ≤ h.
Preferably, the step S22 includes the steps of:
S221, for each order j of T_old, count the number of non-zero elements associated with each index value;
S222, divide these counts by the total number of non-zero elements of T_old to obtain the sampling probability p_j of each index of each order;
S223, compute the size L_j of the sampling index set of each order of T_old according to the set sampling factor;
S224, sample L_j indices of the j-th order of T_old according to the sampling probabilities p_j to obtain the sampling index sets V_1, ..., V_h;
S225, merge the sampling index sets with the time-dimension indices V_new of the target Boolean tensor T_G to obtain {V_1, V_2, ..., V_h ∪ V_new}, where V_new denotes the time-dimension indices of T_G;
S226, extract the sampled sub-tensor according to the index set {V_1, V_2, ..., V_h ∪ V_new};
S227, repeat steps S221-S226 until k sub-tensors are generated.
Preferably, the step S23 includes the steps of:
S231, initialize the factor matrices A_i^(1), ..., A_i^(h) of sub-tensor sT_i Y times, each time as a random Boolean matrix with non-zero entry probability p, and take the factor matrices with the smallest reduction error as the final initialization;
S232, perform h update steps per round, fixing (h-1) of the factor matrices in each step and optimizing the remaining one so that the overall reduction error is minimized; h such steps complete one round of iteration;
S233, repeat step S232 until the number of iteration rounds reaches k or the iteration error is smaller than e, and return the Boolean factor matrices A_i^(1), ..., A_i^(h).
Preferably, step S24 comprises the steps of:
S241, merge the Boolean decomposition matrices A_1^(j) of sub-tensor sT_1 with the decomposition matrices A_old^(j) of the old tensor T_old to obtain a Boolean decomposition matrix set;
S242, merge the Boolean decomposition matrices A_2^(j) of sub-tensor sT_2 with the corresponding matrices of this set, and so on, until the Boolean decomposition matrices A_k^(j) of sub-tensor sT_k are merged with the corresponding matrices of the set, yielding the Boolean CP decomposition matrices A_all^(1), ..., A_all^(h) of the new tensor T_all.
Preferably, the merging of the boolean decomposition matrices comprises the following steps:
(1) Compute tensors V and U, where v_x is row x of the new factor matrix, u_x is the corresponding sampled index row of the old factor matrix, V is the tensor recovered from v_x and the other factor matrices, and U is the tensor recovered from u_x and the other factor matrices;
(2) Compute the reconstruction errors ε_1 and ε_2 against the old tensor factor matrix:
ε_1 = ||V − T_x||
ε_2 = ||U − T_x||
where T_x is the slice tensor of the corresponding index row;
(3) Judge whether ε_1 < ε_2 holds; if so, update row u_x of the original tensor factor matrix with v_x, otherwise do not update.
Preferably, in step S3, the Hamming distance is selected as the distance metric, the number of cluster centers r is set, and K-Means clustering is used to obtain the cluster centers S_i, i = 1, ..., r, and the cluster to which each node belongs.
Preferably, step S4 comprises the steps of:
S41, compute the superedge weights between the supernodes in the graph summary, where S_i, S_j are cluster centers computed by the clustering algorithm, l and m are nodes in S_i and S_j respectively, L is the time-dimension length of the Boolean tensor T_all, N is the number of nodes of T_all, and σ(S_i) is the number of nodes contained in S_i;
S42, compute the reconstruction error of the graph summary;
S43, judge whether the reconstruction error meets the set threshold; if so, take the clusters as the nodes of the graph summary and the superedge weights as the weights of its edges; otherwise, change the number of cluster centers and return to step S3.
To achieve the above object, according to a second aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for generating a social network graph summary based on tensor computation according to the first aspect.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
(1) Addressing the problem that existing graph summarization methods focus only on either the dynamics or the node attributes of graph data, the method fuses the multi-dimensional data of the nodes, node attributes, and timestamps of the social network; based on the binary nature of the social network graph and the high-dimensional expressiveness of tensors, it achieves a unified representation of high-dimensional graph data, i.e., a Boolean tensorized representation of a complex social network.
(2) Addressing the low computational efficiency of existing graph summarization methods, tensor Boolean CP decomposition is introduced; prior information such as the decomposition results of the old graph tensor is fully exploited, the size of the tensors to be decomposed is reduced, and the decomposition efficiency of the graph summary is improved.
Drawings
Fig. 1 is a flowchart of a method for generating a social network graph summary based on tensor computation according to an embodiment of the invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and embodiments, in order to make its objects, technical solutions, and advantages more apparent. It should be understood that the specific embodiments described here are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.
First, some terms related to the present invention will be explained.
Graph summary: a graph summary is a concise representation of the original graph in which a large number of nodes and edges are aggregated into supernodes and superedges; this aids the visualization of large graphs and the mining of graph data. A supernode is a set of nodes aggregated from several nodes of the graph, a superedge is a set of edges aggregated from several edges of the graph, and the superedge weight is computed from the adjacency characteristics and weights of the edges within the sets.
Boolean tensor: a tensor whose elements are all 0 or 1. Owing to the binary nature of the adjacency matrix of an unweighted graph, a dynamic unweighted graph can be represented as a Boolean tensor; the number of dimensions of the tensor is called its order.
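A minimal sketch (using NumPy; node and timestamp counts are illustrative, not from the patent) of how a dynamic unweighted graph becomes a Boolean tensor with the last order as the time dimension:

```python
import numpy as np

# Edges of a small dynamic unweighted graph as (node1, node2, timestamp)
# triples; the node and timestamp counts are illustrative.
edges = [(0, 1, 0), (1, 2, 0), (0, 2, 1)]
n_nodes, n_steps = 3, 2

# Boolean tensor with the last order as the time dimension:
# T[i, j, t] = 1 iff edge (i, j) exists at timestamp t.
T = np.zeros((n_nodes, n_nodes, n_steps), dtype=bool)
for i, j, t in edges:
    T[i, j, t] = T[j, i, t] = True   # undirected: each time slice is symmetric
```

Each time slice of T is the Boolean adjacency matrix of the graph at that timestamp.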
Undirected unweighted graph: a graph whose edges have neither direction nor weight; here, a dynamic undirected graph is an undirected graph at each timestamp.
Tensor decomposition: a scheme for representing a tensor through a sequence of basic operations on other, simpler tensors; it is generally used for tensor completion, dimensionality-reduced representation, feature extraction, and the like.
CP decomposition: Canonical Polyadic decomposition, a common form of tensor decomposition that expresses a tensor as a sum of rank-1 tensors, where a rank-1 tensor is a special tensor that can be written as the outer product of several vectors.
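As a concrete illustration of the Boolean variant of CP used here (a generic sketch, not code from the patent): products of factor entries become logical AND and the sum over rank-1 terms becomes OR.

```python
import numpy as np

# Boolean CP: an order-3 Boolean tensor of rank R is reconstructed as
# T[i, j, k] = OR over r of (A[i, r] AND B[j, r] AND C[k, r]),
# i.e. the Boolean analogue of a sum of R rank-1 outer products.
rng = np.random.default_rng(0)
R, I, J, K = 2, 4, 5, 3
A = rng.random((I, R)) < 0.4   # random Boolean factor matrices
B = rng.random((J, R)) < 0.4
C = rng.random((K, R)) < 0.4

T = np.einsum('ir,jr,kr->ijk',
              A.astype(int), B.astype(int), C.astype(int)) > 0
```

A tensor built this way has Boolean CP rank at most R by construction.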
As shown in Fig. 1, the invention provides a method for generating a social network graph summary based on tensor computation, comprising the following steps:
S1, represent the social network graph over the target time period as a tensor to obtain a Boolean tensor T_G.
Preferably, the social network graph is a dynamic undirected unweighted graph in one-to-one correspondence with the timestamps.
Users in the social network are abstracted as nodes, and the relationships between users are abstracted as edges, giving the social network graph. For example, in a microblog social network, a microblog user is a node, each node has several node attributes, such as gender, education, and occupation, and the follow relationship between users is an edge. The follow relationships between users change dynamically; therefore, the social network graph data is dynamic.
In this embodiment, the target time period is 1 day; that is, the social network graph summary for 1 day is to be generated. In the microblog social network, in the generated graph summary, user nodes with similar attributes and interests are represented by one supernode, and the connections between different user supernodes are represented by superedges.
The graph data is constructed as a high-order tensor, with the node attributes and the timestamps of the graph data serving as different dimensions of the tensor. The tensor is binary, and a non-zero element of the tensor represents the two endpoints of one edge of the dynamic attribute graph, together with their node attributes and the timestamp of the edge.
For a high-order sparse tensor, storing all elements would consume a large amount of storage space. Therefore, for a graph tensor, the invention stores only tuples of the index values of non-zero elements in the different dimensions; for example, (Node1, Node2, Node1.attribute, Node2.attribute, T) indicates that at time T there is an edge with Node1 and Node2 as endpoints, whose attributes are Node1.attribute and Node2.attribute respectively. To support computation over large-scale graph data, the graph tensor tuples are uploaded to the distributed file system HDFS.
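The tuple storage above can be sketched as a set of non-zero index tuples (attribute values and the membership test are illustrative assumptions):

```python
# Sparse tuple storage of the graph tensor: only the index tuples of
# non-zero elements are kept, e.g. (node1, node2, attr1, attr2, t).
# The attribute values here are illustrative, not from the patent.
nonzeros = {
    (1, 2, "female", "male", 0),   # edge (1, 2) at time 0
    (2, 3, "male", "male", 0),
    (1, 3, "female", "male", 1),
}

def has_edge(u, v, a_u, a_v, t):
    """True iff the Boolean graph tensor has a 1 at this index tuple."""
    return (u, v, a_u, a_v, t) in nonzeros
```

Only the non-zero entries are materialized, so storage grows with the number of edges rather than with the full tensor volume.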
S2, perform tensor decomposition on the Boolean tensor T_G to obtain the decomposed node matrices N_1, N_2, attribute matrices A_1, ..., A_{h-3}, and time matrix T.
The decomposed node matrices N_1, N_2 are feature vectors representing the adjacency characteristics of the nodes, the attribute matrices A_1, ..., A_{h-3} are feature vectors representing the node attributes of the graph, and the time matrix T is the feature vector representing the graph in the time dimension.
Preferably, step S2 comprises the steps of:
S21, merge the Boolean tensor T_old and the Boolean tensor T_G into a Boolean tensor T_all, whose last order is the time dimension; the Boolean tensor T_old is the tensor representation of the social network graph over the previous time period.
In this embodiment, the Boolean tensor T_old is the tensor representation of the social network graph of the previous day.
S22, perform biased sampling on the Boolean tensor T_all to generate k sub-tensors sT_i, 1 ≤ i ≤ k.
The Boolean tensor T_all is sampled with a bias according to an importance measure, to increase the density of non-zero entries of the sampled sub-tensors and to increase the influence of each sub-tensor's decomposition result on the update of T_all. The following procedure is illustrated with h = 2.
Preferably, the step S22 includes the steps of:
S221, for each order j of T_old, count the number of non-zero elements associated with each index value.
S222, divide these counts by the total number of non-zero elements of T_old to obtain the sampling probability p_j of each index of each order.
S223, compute the size L_j of the sampling index set of each order of T_old according to the set sampling factor.
In this embodiment, L_1 = 2 × 0.5 = 1 and L_2 = 2 × 0.5 = 1.
S224, sample L_j indices of the j-th order of T_old according to the sampling probabilities p_j to obtain the sampling index sets V_1, ..., V_h.
In this embodiment, in the first dimension the sample size is 1 and the sampling probabilities of the indices [0, 1] are [0.33, 0.67]; in the second dimension the sample size is 1 and the sampling probabilities of the indices [0, 1] are [0.33, 0.67]. Assume the sampling result is V_1 = [1], V_2 = [1].
S225, merge the sampling index sets with the time-dimension indices V_new of the new tensor to obtain {V_1, V_2, ..., V_h ∪ V_new}, where V_new denotes the time-dimension indices of T_G.
In this embodiment, V_new = [2, 3], and the final sampling index set is {[1], [1, 2, 3]}.
S226, extract the sampled sub-tensor according to the sampling index set {V_1, V_2, ..., V_h ∪ V_new}.
In this embodiment, the sub-tensor is T_all[1, {1, 2, 3}] = [1 1 1].
S227, repeat steps S221-S226 until the k sub-tensors sT_1, ..., sT_k are generated.
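The biased-sampling steps S221-S224 can be sketched as follows (a simplified, single-order illustration under my reading of the steps; the function name and the 0.5 sampling factor are assumptions):

```python
import numpy as np

def index_sampling_probs(T_old, order):
    # S221-S222 (as reconstructed): the probability of each index of a
    # given order is proportional to the number of non-zero elements of
    # T_old in which that index appears.
    other_axes = tuple(ax for ax in range(T_old.ndim) if ax != order)
    counts = T_old.sum(axis=other_axes)
    return counts / T_old.sum()

# 2x3 Boolean tensor: row 0 holds one non-zero element, row 1 holds two.
T_old = np.array([[0, 1, 0],
                  [1, 0, 1]], dtype=bool)

p_rows = index_sampling_probs(T_old, 0)          # per-row probabilities

# S223-S224: sample L_j indices with these probabilities (factor 0.5).
rng = np.random.default_rng(0)
L_0 = max(1, int(T_old.shape[0] * 0.5))
V_0 = rng.choice(T_old.shape[0], size=L_0, replace=False, p=p_rows)
```

With this toy tensor the row probabilities come out as [1/3, 2/3], matching the embodiment's [0.33, 0.67] example.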
S23, perform a parallel distributed Boolean CP decomposition of each sub-tensor sT_i and compute the decomposition factor matrices A_i^(1), ..., A_i^(h) of each sub-tensor.
Preferably, the step S23 includes the steps of:
S231, initialize the factor matrices A_i^(1), ..., A_i^(h) of sub-tensor sT_i Y times, each time as a random Boolean matrix with non-zero entry probability p, and take the factor matrices with the smallest reduction error as the final initialization.
In this embodiment, Y is set according to actual needs and is generally an integer between 5 and 20.
S232, perform h update steps per round, fixing (h-1) of the factor matrices in each step and optimizing the remaining one so that the overall reduction error is minimized.
This embodiment uses least-squares optimization. The following procedure is illustrated with h = 3.
The decomposition factor matrices of sub-tensor sT_i are A_i^(1), A_i^(2), A_i^(3). Fix A_i^(2) and A_i^(3) and optimize A_i^(1) to minimize the reduction error; fix A_i^(1) and A_i^(3) and optimize A_i^(2) to minimize the reduction error; fix A_i^(1) and A_i^(2) and optimize A_i^(3) to minimize the reduction error.
S233, repeat step S232 until the number of iteration rounds reaches k or the iteration error is smaller than e, and return the Boolean factor matrices A_i^(1), ..., A_i^(h).
In this embodiment, k and e are set according to actual needs.
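A toy order-2 analogue of S231-S233 (Boolean matrix factorization with random Boolean initialization and alternating updates). The exhaustive per-row search and all function names are my own simplification, not the patent's distributed algorithm:

```python
import numpy as np

def bool_recon(A, B):
    # Boolean product over {0, 1}: (A @ B^T) > 0.
    return (A.astype(int) @ B.astype(int).T) > 0

def reduction_error(T, A, B):
    # Reduction (reconstruction) error: number of mismatched entries.
    return int(np.count_nonzero(T ^ bool_recon(A, B)))

def boolean_factorize(T, R=2, inits=5, rounds=10, p=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # S231: try several random Boolean initializations, keep the best.
    A, B = min(((rng.random((T.shape[0], R)) < p,
                 rng.random((T.shape[1], R)) < p) for _ in range(inits)),
               key=lambda fs: reduction_error(T, *fs))
    # All 2^R possible Boolean rows, tried exhaustively per update.
    candidates = ((np.arange(2 ** R)[:, None] >> np.arange(R)) & 1).astype(bool)
    for _ in range(rounds):               # S232-S233: alternating updates
        for M, other, target in ((A, B, T), (B, A, T.T)):
            for x in range(M.shape[0]):   # optimize one row, others fixed
                errs = [np.count_nonzero(target[x]
                                         ^ bool_recon(c[None], other)[0])
                        for c in candidates]
                M[x] = candidates[int(np.argmin(errs))]
    return A, B

# Boolean rank-1 example: rows {0, 1} share the pattern [1, 0, 1].
T = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 0, 0]], dtype=bool)
A, B = boolean_factorize(T, R=2)
```

Since the all-zero row is always among the candidates, the final error can never exceed the number of non-zero entries of T.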
S24, merge the Boolean decomposition matrices A_i^(j) of the sub-tensors sT_i with the decomposition matrices A_old^(j) of the old tensor T_old to obtain the Boolean CP decomposition result A_all^(1), ..., A_all^(h) of the new tensor T_all.
Merging the two yields the Boolean CP decomposition result of the new tensor T_all; this introduces the updates into the old tensor's decomposition matrices and reduces the decomposition error.
Preferably, step S24 comprises the steps of:
S241, merge the Boolean decomposition matrices A_1^(j) of sub-tensor sT_1 with the decomposition matrices A_old^(j) of the old tensor T_old to obtain a Boolean decomposition matrix set.
S242, merge the Boolean decomposition matrices A_2^(j) of sub-tensor sT_2 with the corresponding matrices of this set, and so on, until the Boolean decomposition matrices A_k^(j) of sub-tensor sT_k are merged with the corresponding matrices of the set, yielding the Boolean CP decomposition matrices A_all^(1), ..., A_all^(h) of the new tensor T_all.
…
Preferably, the merging of the boolean decomposition matrices comprises the following steps:
(1) Compute tensors V and U, where V is the tensor recovered from v_x and the other factor matrices, U is the tensor recovered from u_x and the other factor matrices, v_x is row x of the new factor matrix, and u_x is the corresponding sampled index row of the old factor matrix.
(2) Compute the reconstruction errors ε_1 and ε_2:
ε_1 = ||V − T_x||
ε_2 = ||U − T_x||
where T_x is the slice tensor of the corresponding index row, and ||·|| denotes the tensor 1-norm, i.e., the number of non-zero entries of a Boolean tensor.
(3) Judge whether ε_1 < ε_2 holds; if so, update row u_x of the original tensor factor matrix with v_x, otherwise do not update.
Satisfying ε_1 < ε_2 means that updating the row reduces the overall reconstruction error, so row u_x of the original tensor factor matrix is updated with v_x.
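The row-replacement rule of step (3) can be sketched as follows (function and variable names are assumed; the reconstructions v_recon and u_recon stand for the slices of V and U at the index row):

```python
import numpy as np

def merged_row(v_recon, u_recon, T_x, v_x, u_x):
    # eps_1 = ||V - T_x||, eps_2 = ||U - T_x||; the tensor 1-norm of a
    # Boolean tensor is its number of non-zero entries, so XOR + count.
    eps_1 = np.count_nonzero(v_recon ^ T_x)
    eps_2 = np.count_nonzero(u_recon ^ T_x)
    return v_x if eps_1 < eps_2 else u_x   # step (3): update only if closer

# Toy slice: the new row's reconstruction matches the data exactly,
# the old row's does not, so the new row wins.
T_x = np.array([1, 1, 0], dtype=bool)
v_recon = np.array([1, 1, 0], dtype=bool)   # eps_1 = 0
u_recon = np.array([0, 1, 1], dtype=bool)   # eps_2 = 2
row = merged_row(v_recon, u_recon, T_x,
                 v_x=np.array([1, 0], dtype=bool),
                 u_x=np.array([0, 1], dtype=bool))
```

The comparison is purely local to one index row, which is what allows the merge to be applied row by row across the sub-tensor decompositions.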
S3, cluster node matrix N_1 or N_2 to obtain the cluster centers and the cluster of each node.
Preferably, in step S3, K-Means clustering is used, with the Hamming distance as the distance metric.
The method comprises the following steps:
s31, selecting a row vector set { N } of a node Boolean factor matrix N 1 ,n 2 ,...,n l "l" is the number of rows of matrix N, also the number of graph nodes。
S32, selecting r vectors from the vector as initial cluster centers, wherein r represents the number of cluster centers and is also the number of super points generated in the final graph abstract.
In this embodiment, r is K, and is initialized to 100.
S33, calculating the Hamming distance from other nodes to each cluster center, and dividing the Hamming distance into clusters closest to the other nodes.
S34, updating the cluster center by using the rounding average value of all the vectors according to all the vectors in each cluster, and completing one round of iteration.
S35, if the iteration times reach a specified value, outputting the cluster of each point.
In this embodiment, the iteration number specified value is 10.
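Steps S31-S35 can be sketched as K-Means over Boolean rows with Hamming distance (the function signature and the fixed initial centers in the demo are my assumptions):

```python
import numpy as np

def hamming_kmeans(N, r, iters=10, init=None, seed=0):
    """K-Means over Boolean row vectors with Hamming distance (S31-S35).
    Centers are updated by rounding the cluster mean (>= 0.5 -> 1)."""
    rng = np.random.default_rng(seed)
    idx = init if init is not None else rng.choice(len(N), size=r, replace=False)
    centers = N[idx].copy()
    for _ in range(iters):
        # Hamming distance from every node row to every center (S33).
        dists = (N[:, None, :] ^ centers[None, :, :]).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(r):                       # S34: rounded-mean update
            members = N[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0) >= 0.5
    return centers, labels

# Two well-separated Boolean clusters; initial centers fixed for determinism.
N = np.array([[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3, dtype=bool)
centers, labels = hamming_kmeans(N, r=2, init=[0, 3])
```

The rounded mean is a per-coordinate majority vote, which keeps the centers Boolean, as required for Hamming distances.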
S4, take the cluster centers as the supernodes of the graph summary, and compute the superedge weights between supernodes to obtain the complete graph summary.
Preferably, step S4 comprises the steps of:
S41, compute the superedge weights between the supernodes in the graph summary according to the adjacency similarity of the graph nodes, where S_i, S_j are cluster centers computed by the clustering algorithm, l and m are nodes in S_i and S_j respectively, L is the time-dimension length of the Boolean tensor T_all, N is the number of nodes of T_all, σ(S_i) is the number of nodes contained in S_i, and |·| denotes the absolute value.
S42, compute the reconstruction error of the graph summary according to the Euclidean distance between the tensors.
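The superedge-weight formula itself appears only as an image in the source. The sketch below therefore uses a common density-style definition from the graph-summarization literature (average adjacency between member nodes over the time window) as a plausible stand-in, not the patent's exact formula:

```python
import numpy as np

def superedge_weight(T_all, S_i, S_j):
    # Density-style weight (assumption, not the patent's formula):
    # fraction of realized (member, member, timestamp) adjacencies
    # between the supernodes S_i and S_j.
    block = T_all[np.ix_(S_i, S_j)]          # members x members x time block
    return block.sum() / (len(S_i) * len(S_j) * T_all.shape[-1])

# 4-node, 2-timestamp Boolean graph tensor; supernodes {0, 1} and {2, 3}.
T_all = np.zeros((4, 4, 2), dtype=bool)
T_all[0, 2, 0] = T_all[2, 0, 0] = True       # edge (0, 2) at t = 0
T_all[1, 3, 1] = T_all[3, 1, 1] = True       # edge (1, 3) at t = 1
w = superedge_weight(T_all, [0, 1], [2, 3])
```

Here 2 of the 8 possible (member, member, timestamp) slots hold an edge, giving a weight of 0.25.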
S43, judge whether the reconstruction error meets the set threshold; if so, take the clusters as the nodes of the graph summary and the superedge weights as the weights of its edges; otherwise, change the number of cluster centers and return to step S3.
In this embodiment, the reconstruction-error threshold is 1000. If the reconstruction error does not meet the set threshold, the number of cluster centers is increased.
It will be readily appreciated by those skilled in the art that the foregoing is merely a preferred embodiment of the invention and is not intended to limit it; any modifications, equivalents, improvements, or alternatives made within the spirit and principles of the invention are intended to fall within its scope.
Claims (8)
1. A method for generating a social network graph summary based on tensor computation, characterized by comprising the following steps:
S1, representing the social network graph over a target time period as a tensor to obtain a target Boolean tensor T_G, the social network graph being a dynamic undirected unweighted graph in one-to-one correspondence with the timestamps;
S2, performing tensor decomposition on the target Boolean tensor T_G to obtain decomposed node matrices N_1, N_2;
S3, clustering node matrix N_1 or N_2 to obtain the cluster centers and the cluster of each node;
S4, taking the cluster centers as the supernodes of the graph summary, and computing the superedge weights between supernodes to obtain the graph summary of the social network graph;
wherein step S2 comprises the following steps:
S21, merging the old Boolean tensor T_old and the target Boolean tensor T_G into a Boolean tensor T_all whose last order is the time dimension, the old Boolean tensor T_old being the tensor representation of the social network graph over the previous time period;
S22, performing biased sampling on the Boolean tensor T_all to generate k sub-tensors sT_i;
S23, performing a parallel distributed Boolean CP decomposition of each sub-tensor sT_i and computing its decomposition factor matrices A_i^(1), ..., A_i^(h);
S24, merging the Boolean decomposition matrices A_i^(j) of the sub-tensors sT_i with the decomposition matrices A_old^(j) of the old Boolean tensor T_old to obtain the Boolean CP decomposition result A_all^(1), ..., A_all^(h) of the new Boolean tensor T_all;
where 1 ≤ i ≤ k and 1 ≤ j ≤ h.
2. The method according to claim 1, wherein the step S22 comprises the following steps:
S221, for each order j of T_old, counting the number of non-zero elements associated with each index value;
S222, dividing these counts by the total number of non-zero elements of T_old to obtain the sampling probability p_j of each index of each order;
S223, computing the size L_j of the sampling index set of each order of T_old according to the set sampling factor;
S224, sampling L_j indices of the j-th order of T_old according to the sampling probabilities p_j to obtain the sampling index sets V_1, ..., V_h;
S225, merging the sampling index sets with the time-dimension indices V_new of the target Boolean tensor T_G to obtain {V_1, V_2, ..., V_h ∪ V_new}, where V_new denotes the time-dimension indices of T_G;
S226, extracting the sampled sub-tensor according to the index set {V_1, V_2, ..., V_h ∪ V_new};
S227, repeating steps S221-S226 until k sub-tensors are generated.
3. The method according to claim 1, wherein the step S23 comprises the following steps:
S231, initializing the factor matrices A_i^(1), ..., A_i^(h) of sub-tensor sT_i Y times, each time as a random Boolean matrix with non-zero entry probability p, and taking the factor matrices with the smallest reduction error as the final initialization;
S232, performing h update steps per round, fixing (h-1) of the factor matrices in each step and optimizing the remaining one so that the overall reduction error is minimized, h such steps completing one round of iteration;
4. The method according to claim 1, wherein step S24 comprises the steps of:
S241. Merge the Boolean decomposition matrices of sub-tensor sT_1 with the Boolean decomposition matrices of the old Boolean tensor T_old to obtain a merged set of Boolean decomposition matrices;
S242. Merge the Boolean decomposition matrices of sub-tensor sT_2 with the corresponding matrices in the merged set, and so on, until the Boolean decomposition matrices of sub-tensor sT_k have been merged with the corresponding matrices in the set, obtaining the Boolean CP decomposition matrices of the new tensor T_all.
5. The method according to claim 4, wherein the merging of the Boolean decomposition matrices comprises the steps of:
(1) Calculate tensors V and U, where v_x is the x-th row of the sub-tensor's factor matrix, u_x is the row of the old factor matrix at the corresponding sampled index, V is the tensor recovered from v_x and the other factor matrices, and U is the tensor recovered from u_x and the other factor matrices;
(2) Calculate the reconstruction errors ε1 and ε2 of the recovered tensors:
ε1 = ||V - T_x||
ε2 = ||U - T_x||
wherein T_x is the slice tensor at the corresponding index row;
(3) Judge whether ε1 < ε2 holds; if so, update row u_x of the original tensor's factor matrix with v_x; otherwise, do not update.
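The row-wise merge rule of claim 5 can be sketched for a three-order tensor, merging the mode-0 factor matrix. The slice-recovery helper and all names are ours; `sample_idx` maps each row of the sub-tensor's factor back to its row in the old factor:

```python
import numpy as np

def recover_slice(row, others):
    """Boolean slice recovered from one factor row and the remaining
    factor matrices: OR over r of row[r] AND outer(B[:, r], C[:, r])."""
    shape = tuple(f.shape[0] for f in others)
    out = np.zeros(shape, dtype=bool)
    for r in range(len(row)):
        if row[r]:
            outer = others[0][:, r].astype(bool)
            for f in others[1:]:
                outer = np.multiply.outer(outer, f[:, r].astype(bool))
            out |= outer
    return out

def merge_mode0_factor(old_A, new_A, sample_idx, others, T):
    """For each sampled row x: keep v_x (new) if it reconstructs the data
    slice T_x with smaller error than u_x (old), i.e. when eps1 < eps2."""
    merged = old_A.copy()
    for local, x in enumerate(sample_idx):
        V = recover_slice(new_A[local], others)   # from the new row v_x
        U = recover_slice(old_A[x], others)       # from the old row u_x
        eps1 = np.count_nonzero(V ^ T[x])
        eps2 = np.count_nonzero(U ^ T[x])
        if eps1 < eps2:
            merged[x] = new_A[local]
    return merged
```

Applied factor-by-factor and sub-tensor-by-sub-tensor, this realizes the sequential merging of S241-S242.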
6. The method according to claim 1, wherein in step S3 the Hamming distance is selected, the number of cluster centers r is set, and K-Means clustering is used to obtain the cluster centers S_i (i = 1, ..., r) and the cluster to which each node belongs.
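The clustering of step S3 can be sketched as a K-Means-style loop under Hamming distance with per-coordinate majority-vote centers (essentially K-Modes on Boolean node vectors; the majority-vote center update is our assumption, since the claim names only the distance and the K-Means scheme):

```python
import numpy as np

def hamming_kmeans(X, r, iters=20, rng=None):
    """Cluster Boolean node vectors X (n x d) into r clusters using
    Hamming distance; returns the cluster centers S_i and the labels."""
    rng = rng or np.random.default_rng(0)
    centers = X[rng.choice(len(X), size=r, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assignment step: nearest center by Hamming distance
        dist = (X[:, None, :] ^ centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        # update step: per-coordinate majority vote inside each cluster
        for c in range(r):
            members = X[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0) >= 0.5
    return centers, labels
```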
7. The method according to any one of claims 1 to 5, wherein step S4 comprises the steps of:
S41. Calculate the superedge weights between the supernodes in the graph summary according to the following formula:
wherein S_i and S_j are cluster centers computed by the clustering algorithm, l and m are nodes in S_i and S_j respectively, L is the time-dimension length of the Boolean tensor T_all, N is the number of nodes in T_all, and σ(S_i) is the number of points contained in S_i;
S42. Calculate the reconstruction error of the graph summary according to the following formula:
S43. Judge whether the reconstruction error satisfies the set threshold; if so, take the clusters as the nodes of the graph summary and the superedge weights as the edge weights of the graph summary; otherwise, change the number of cluster centers and return to step S3.
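The control flow of S43 can be sketched as follows. The patent's reconstruction-error formula is not reproduced in this text, so a simple stand-in error (the fraction of Boolean entries that change when each node vector is replaced by its cluster center) is used, and the clustering routine from S3 is passed in as a parameter:

```python
import numpy as np

def summary_error(X, centers, labels):
    """Stand-in reconstruction error: fraction of entries that differ
    when every node vector is replaced by its cluster center."""
    return np.count_nonzero(X ^ centers[labels]) / X.size

def build_summary(X, cluster, threshold, r0=2):
    """S43 loop: if the error misses the threshold, increase the number
    of cluster centers and return to the clustering step (S3)."""
    for r in range(r0, len(X) + 1):
        centers, labels = cluster(X, r)
        if summary_error(X, centers, labels) <= threshold:
            break                       # clusters become the summary nodes
    return centers, labels, r
```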
8. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the tensor-calculation-based social network graph summary generation method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911373671.1A CN111159483B (en) | 2019-12-26 | 2019-12-26 | Tensor calculation-based social network diagram abstract generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111159483A CN111159483A (en) | 2020-05-15 |
CN111159483B true CN111159483B (en) | 2023-07-04 |
Family
ID=70558533
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911373671.1A Active CN111159483B (en) | 2019-12-26 | 2019-12-26 | Tensor calculation-based social network diagram abstract generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111159483B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111881191B (en) * | 2020-08-05 | 2021-06-11 | 留洋汇(厦门)金融技术服务有限公司 | Client portrait key feature mining system and method under mobile internet |
CN112287118B (en) * | 2020-10-30 | 2023-06-02 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Event mode frequent subgraph mining and prediction method |
CN112507245B (en) * | 2020-12-03 | 2023-07-18 | 中国人民大学 | Social network friend recommendation method based on graph neural network |
CN113139098B (en) * | 2021-03-23 | 2023-12-12 | 中国科学院计算技术研究所 | Abstract extraction method and system for homogeneity relation large graph |
CN113157981B (en) * | 2021-03-26 | 2022-12-13 | 支付宝(杭州)信息技术有限公司 | Graph network relation diffusion method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107545509A (en) * | 2017-07-17 | 2018-01-05 | 西安电子科技大学 | A kind of group dividing method of more relation social networks |
CN107656928A (en) * | 2016-07-25 | 2018-02-02 | 长沙有干货网络技术有限公司 | The method that a kind of isomery social networks of user clustering is recommended |
CN107767280A (en) * | 2017-10-16 | 2018-03-06 | 湖北文理学院 | A kind of high-quality node detecting method based on element of time |
CN109697467A (en) * | 2018-12-24 | 2019-04-30 | 宁波大学 | A kind of summarization methods of complex network figure |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8060512B2 (en) * | 2009-06-05 | 2011-11-15 | Xerox Corporation | Hybrid tensor-based cluster analysis |
US10956500B2 (en) * | 2017-01-19 | 2021-03-23 | Google Llc | Dynamic-length stateful tensor array |
US10268646B2 (en) * | 2017-06-06 | 2019-04-23 | Facebook, Inc. | Tensor-based deep relevance model for search on online social networks |
Non-Patent Citations (1)
Title |
---|
Pauli Miettinen, "Walk'n'Merge: A Scalable Algorithm for Boolean Tensor Factorization," IEEE, 2013, full text. *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||