CN112256941A - Social network reconstruction method based on local information - Google Patents


Info

Publication number
CN112256941A
Authority
CN
China
Prior art keywords
node
network
nodes
placeholder
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011123548.7A
Other languages
Chinese (zh)
Other versions
CN112256941B (en
Inventor
韩忠明
李俊
段大高
李胜男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University filed Critical Beijing Technology and Business University
Priority to CN202011123548.7A priority Critical patent/CN112256941B/en
Publication of CN112256941A publication Critical patent/CN112256941A/en
Application granted granted Critical
Publication of CN112256941B publication Critical patent/CN112256941B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a social network reconstruction method based on local information, which comprises the following steps. Step one: preprocess the data. Step two: add placeholder nodes according to the node attributes and the network topology; a placeholder node is a temporary node used to locate a missing point. Step three: cluster the placeholder nodes added in step two to determine the number and positions of the missing points, eliminating redundant placeholder nodes and increasing the accuracy of network reconstruction. Step four: determine the missing edges in the network using a link prediction algorithm. Step five: reconstruct the network structure according to the results of the preceding steps. The method can effectively determine the positions of the missing points; solving the missing-point problem first provides effective support for the subsequent completion of missing edges and thereby improves prediction accuracy. The method can make research on public opinion propagation more efficient and help control the serious influence and harm caused by the spread of harmful public opinion.

Description

Social network reconstruction method based on local information
Technical Field
The invention relates to the field of network reconstruction, and in particular to a social network reconstruction method based on local information, which can be applied to public opinion propagation, rumor detection and similar tasks.
Background
At present, the speed of the Internet and the rapid growth in the number of Internet users have greatly increased the scale of networks. Social networks and other complex networks have emerged, and scientific research on complex networks has developed rapidly. In the study of complex networks, the network itself is an important object of study, and its large scale determines its complexity. In research on complex networks such as social networks, network reconstruction is a prominent problem, and it can serve as a prerequisite for other research that relies on a complete network. The invention is explained using the reconstruction of a social network as an example. Social network reconstruction can be applied to information dissemination in social networks: information on the Internet emerges endlessly, and messages of all kinds can spread rapidly through media. In practice it is relatively difficult to obtain a large amount of valid information, and data are lost because of noise and various other reasons. Social network reconstruction can be applied to such problems, and correct and effective reconstruction can help control the propagation of public opinion and avoid adverse effects. The propagation of negative public opinion online is a serious problem in today's society and does serious damage to social safety and stability. The ways in which online public opinion propagates are complex, varied and difficult to control, and information in the network is also lost, so finding the propagation structure and propagation paths of negative public opinion is a major problem.
By reconstructing the social network from local information, the method addresses the problem of network information loss in public opinion propagation control, and by reconstructing the complete social network it addresses the problem of locating public opinion propagation structures and paths.
In research on network reconstruction, researchers have proposed link prediction methods and missing-point identification methods. These methods solve the problem of missing edges and nodes to a certain extent. Link prediction mostly targets the reconstruction of missing edges, but because of the complexity of networks, both edges and nodes may be missing, which makes reconstruction considerably more difficult. At present there is relatively little research on network reconstruction when edges and nodes are missing at the same time, and the accuracy is low. The invention provides a social network reconstruction method based on local information that remedies some shortcomings of existing techniques and supports the effective implementation of various propagation models.
Disclosure of Invention
The invention aims to provide a social network reconstruction method based on local information, which is mainly applied to public opinion propagation and other propagation models and is used to solve the problem of missing network edges and nodes in the propagation process. The invention can handle the case where node loss and edge loss coexist in network reconstruction. Loss of network structure is not limited to edge loss alone or node loss alone; when the two occur together, network reconstruction becomes more complex. The invention is mainly applied to public opinion propagation, rumor detection and similar areas. It is used to complete missing network data in research on public opinion propagation models and the like, addresses the problem of incomplete network structure information, and benefits the construction of public opinion propagation models, public opinion control and so on. Although replacing the relationship network can be considered when testing a propagation model, network data loss may still occur in practical applications, and a replacement network cannot guarantee the completeness and validity of the data. It is therefore necessary to reconstruct a complete social network with a suitable method for the relevant propagation model, in particular a public opinion propagation model.
The technical scheme of the invention is as follows:
Step one: preprocess the data. Social network data are obtained through a web crawler or from an existing data set, sorted and analyzed to obtain experimental data, and the experimental data are divided into a user node table and a user relationship table that are stored separately.
Step two: add placeholder nodes according to the node attributes and the network topology; a placeholder node is a temporary node used to locate a missing point.
Step three: cluster the placeholder nodes added in step two to determine the number and positions of the missing points, eliminating redundant placeholder nodes and increasing the accuracy of network reconstruction.
Step four: determine the missing edges in the network using a link prediction algorithm.
Step five: reconstruct the network structure according to the results of the preceding steps.
The steps are described in more detail below. First, the symbols and concepts used in the invention are briefly introduced. The network used in the experiments can be regarded as an undirected, unweighted graph, and the nodes are denoted by V. A missing point is a node that the known network lacks relative to the original complete network, that is, a node present in the parent set but not in the subset. A placeholder is a temporary node added to mark and locate a missing point; as the procedure advances, placeholder nodes are refined into missing points. Missing points are removed from the parent network at random, and node numbers are unrelated to the order of nodes in the network.
The following are the detailed steps of the invention:
Step one: data preprocessing.
11) Clean and sort the data to obtain a user relationship table and a user node table. The user node table contains the user node id, the node degree, and the follower and following counts of the user node, e.g. (id_i, d_i, followerCounts, followingCounts). The user relationship table contains the connections between user nodes, e.g. (id_1, id_2). The two tables are linked by node id.
12) Construct the sub-network G' = {V', E'} from the user relationship table, where V' = {V1, V2, V3, …, Vm} and E' = {(V_i, V_j), …}. G' consists of m nodes and its structure is known. G' is known to be a sub-network of the network G, whose number of nodes n is known; V_i denotes a node, V' is a subset of V, E' is a subset of E, and n > m.
13) Use the node table to obtain the original degree set D = {(V1, D1), (V2, D2), (V3, D3), …, (Vm, Dm)} and the existing (observed) degree set d = {(V1, d1), (V2, d2), (V3, d3), …, (Vm, dm)}. D_i is the degree of node i in the network G, expressed as the sum of the follower count and the following count of the user node; d_i is the degree of node i in the network G'.
14) Construct the adjacency matrix T of the known network G'. T_ij represents the connection between nodes i and j and takes the value 1 if the nodes are connected and 0 otherwise.
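Step one can be sketched in Python as follows. The toy records, the tuple layout of the two tables and the helper names `build_adjacency` and `degree_sets` are illustrative assumptions rather than part of the patent; the sketch only shows how D_i (follower count plus following count) and d_i (row sum of T) might be obtained.

```python
# Hypothetical user node table: (id, follower_count, following_count).
node_table = [("u1", 2, 1), ("u2", 1, 1), ("u3", 1, 0)]
# Hypothetical user relationship table: one undirected edge per row.
relation_table = [("u1", "u2"), ("u1", "u3")]


def build_adjacency(ids, relations):
    """Return the symmetric 0/1 adjacency matrix T of the known network G'."""
    index = {node_id: k for k, node_id in enumerate(ids)}
    T = [[0] * len(ids) for _ in ids]
    for a, b in relations:
        T[index[a]][index[b]] = 1
        T[index[b]][index[a]] = 1
    return T


def degree_sets(table, T):
    """Return (D, d): original degrees in G and observed degrees in G'."""
    D = {node_id: followers + following for node_id, followers, following in table}
    d = {row[0]: sum(T[k]) for k, row in enumerate(table)}
    return D, d


ids = [row[0] for row in node_table]
T = build_adjacency(ids, relation_table)
D, d = degree_sets(node_table, T)
print(T)  # adjacency matrix of G'
print(D)  # original degree set
print(d)  # observed degree set
```

A node whose entry in `D` exceeds its entry in `d` has edges that are not visible in G'.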
Step two: add placeholder nodes according to the node attributes and the network topology. A placeholder node is a temporary node used to locate a missing point.
21) Randomly select a node i in the network G' as the starting node and first check whether d_i equals D_i.
22) If d_i < D_i, add placeholder nodes for node i; the number of placeholder nodes added is D_i - d_i.
23) If d_i = D_i, skip the current node and check the next node.
24) Traverse all nodes in the network G' in this way, add the placeholder nodes to G', and update the adjacency matrix T of G' to obtain a new adjacency matrix T1 and a new network G1.
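A minimal sketch of step two, assuming the adjacency matrix is stored as a list of lists and D is a dictionary of original degrees. The function name `add_placeholders`, the labels Z1, Z2, … and the three-node input are hypothetical; the sketch iterates over the nodes in order rather than starting from a random node.

```python
def add_placeholders(nodes, T, D):
    """Append D_i - d_i placeholder nodes to every node whose degree is incomplete.

    nodes: list of node labels, in the row order of T
    T:     symmetric 0/1 adjacency matrix of G' (list of lists)
    D:     dict mapping node label -> degree in the complete network G
    Returns (labels of G1, adjacency matrix T1).
    """
    labels = list(nodes)
    links = []  # (row index of the known node, label of its new placeholder)
    for i, node in enumerate(nodes):
        d_i = sum(T[i])
        # max(..., 0) guards against inconsistent input where d_i > D_i.
        for _ in range(max(D[node] - d_i, 0)):
            links.append((i, f"Z{len(links) + 1}"))
    size = len(nodes) + len(links)
    T1 = [row + [0] * len(links) for row in T]
    T1 += [[0] * size for _ in links]
    for offset, (i, label) in enumerate(links):
        j = len(nodes) + offset
        T1[i][j] = T1[j][i] = 1  # each placeholder hangs off exactly one known node
        labels.append(label)
    return labels, T1


# Hypothetical path a - b - c with one unseen edge at a and two at c.
nodes = ["a", "b", "c"]
T = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
D = {"a": 2, "b": 2, "c": 3}
labels, T1 = add_placeholders(nodes, T, D)
print(labels)   # ['a', 'b', 'c', 'Z1', 'Z2', 'Z3']
print(len(T1))  # 6
```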
Step three: cluster the added placeholder nodes with a clustering algorithm to determine the missing points.
The placeholder nodes added in step two are clustered to determine the number and positions of the missing points, eliminating redundant placeholder nodes and increasing the accuracy of network reconstruction.
31) First compute the connection vector of each placeholder node with a link prediction algorithm. A connection relation indicates whether an edge exists between two nodes, and the connection vector of a node is the vector of its connection relations with the other nodes; at this stage the link prediction algorithm is used only to compute connection vectors. The link prediction algorithm yields a similarity matrix S, and the row of S corresponding to a placeholder node is that node's connection vector, i.e. S_i is the connection vector of placeholder node i.
32) Compute the Pearson correlation between each pair of placeholder connection vectors and construct a correlation matrix R whose elements are the Pearson correlations between placeholder nodes. The Pearson correlation is calculated as follows:
$$r_{XY} = \frac{E(XY) - E(X)\,E(Y)}{\sqrt{E(X^2) - (E(X))^2}\;\sqrt{E(Y^2) - (E(Y))^2}}$$
Here r_XY is the Pearson correlation, X and Y are two different variables, E denotes expectation, E(X) and E(Y) are the expectations of X and Y, E(XY) - E(X)E(Y) is the covariance of X and Y, and E(X^2) - (E(X))^2 and E(Y^2) - (E(Y))^2 are the variances of X and Y respectively. The Pearson correlation of two variables is thus their covariance divided by the product of their standard deviations.
33) After the correlation matrix R has been computed, perform K-means clustering. Each row of R represents the correlations between one placeholder node and the other placeholder nodes and is treated as one clustering object, so each clustering object corresponds to one placeholder node. Clustering groups the placeholder nodes into K placeholder clusters. Because the scale of the network G is known, K is known: K = |V| - |V'|, where |V| is the number of nodes in G and |V'| is the number of nodes in G'. The placeholder nodes in the same cluster represent one missing point. The missing points are added to the network G1, and the adjacency matrix T1 of G1 is updated to obtain a new adjacency matrix T2 and the new network G2 constructed from T2.
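Steps 32) and 33) can be sketched as below. This is a plain-Python illustration under stated assumptions, not the patented implementation: the four connection vectors are made up, constant vectors are assigned correlation 0, and the K-means seeds are chosen by a deterministic farthest-point rule (the text does not specify how seeds are chosen). In the method itself K would be |V| - |V'|.

```python
import math


def pearson(x, y):
    """Pearson correlation: (E[XY] - E[X]E[Y]) / (std(X) * std(Y))."""
    n = len(x)
    ex, ey = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - ex * ey
    var_x = sum(a * a for a in x) / n - ex * ex
    var_y = sum(b * b for b in y) / n - ey * ey
    if var_x <= 0 or var_y <= 0:
        return 0.0  # assumption: treat constant vectors as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(vectors):
    """R[a][b] = Pearson correlation of connection vectors a and b."""
    return [[pearson(u, v) for v in vectors] for u in vectors]


def kmeans(rows, k, iterations=50):
    """Plain Lloyd's K-means; returns one cluster index per row."""
    # Assumption: deterministic farthest-point seeding instead of random seeds.
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(math.dist(r, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: math.dist(r, centers[c])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical connection vectors of four placeholder nodes.
vectors = [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1], [0, 0, 1, 1, 1]]
R = correlation_matrix(vectors)
clusters = kmeans(R, k=2)
print(clusters)  # placeholders with the same index are merged into one missing point
```

Placeholders 0 and 1 have identical connection vectors and end up in one cluster; placeholders 2 and 3 are positively correlated with each other and form the second cluster.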
Step four: determine the missing edges in the network using a link prediction algorithm.
41) Randomly select a node i in the network G2 as the starting node; if the selected node is a missing point determined by the clustering in step three, skip it. Recompute the degree d_i' of the node from the adjacency matrix T2 of G2 and compare d_i' with D_i, where d_i' is the degree of node i in G2 and D_i is the degree of node i in G.
42) Compute the similarity matrix Sim with a link prediction algorithm. Sim_i denotes the row of node i in Sim, and each element is the similarity score (Grade) between node i and another node. The method can work with various link prediction algorithms; this embodiment uses the CN (common neighbors) algorithm, which judges the similarity of two nodes by the number of their common neighbors. CN is defined as s_ij = |N(i) ∩ N(j)|, where i and j denote nodes and N(i) and N(j) denote the sets of neighbor nodes of node i and node j.
43) Compare D_i with d_i'. If D_i > d_i', node i is not saturated and has missing edges; select connecting edges for node i according to the similarity scores in Sim, then proceed to the next node.
44) If d_i' = D_i, compute D_i - d_i and then examine the similarity scores between node i and the other nodes in Sim. If the similarity score with a connected missing point is lower than the similarity score with a node that is not a missing point, disconnect node i from the missing point and connect it to the non-missing node with the high similarity score; otherwise skip node i and proceed to the next node.
45) Repeat steps 41), 42) and 43) until all nodes in G2 other than the missing points determined by the clustering in step three have been processed, and update the adjacency matrix T2 to obtain a new adjacency matrix T3.
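The CN score and the edge-completion rule of step 43) can be sketched as follows. For a 0/1 adjacency matrix, |N(i) ∩ N(j)| equals entry (i, j) of the matrix product T × T, which is what `common_neighbors` computes. The greedy loop is a simplified assumption: it processes nodes in index order, ignores the degree budget of the other endpoint, and omits the rewiring rule of step 44). The function names and the five-node input are hypothetical.

```python
def common_neighbors(T):
    """Sim[i][j] = |N(i) ∩ N(j)|, i.e. entry (i, j) of T x T for a 0/1 matrix."""
    n = len(T)
    return [[sum(T[i][k] * T[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def fill_missing_edges(T, D, skip=()):
    """Greedy sketch of step 43: link under-degree nodes to their best-scoring non-neighbors.

    T:    symmetric 0/1 adjacency matrix of G2 (modified in place)
    D:    list of target degrees D_i, indexed like T
    skip: indices of missing points found by clustering, which are not processed
    """
    n = len(T)
    for i in range(n):
        if i in skip:
            continue
        sim = common_neighbors(T)  # recomputed because T changes as edges are added
        candidates = sorted(
            (j for j in range(n) if j != i and T[i][j] == 0),
            key=lambda j: sim[i][j],
            reverse=True,
        )
        for j in candidates:
            if sum(T[i]) >= D[i]:
                break  # node i has reached its original degree
            T[i][j] = T[j][i] = 1
    return T


# Hypothetical network: edges 0-1, 0-2, 1-3, 2-3, 3-4; node 0 lacks one edge.
T = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
D = [3, 2, 2, 4, 1]
print(common_neighbors(T)[0])  # scores of node 0 against every node
fill_missing_edges(T, D)
print(T[0])  # node 0 is now linked to node 3, its highest-scoring non-neighbor
```

Node 0 shares two neighbors with node 3 and none with node 4, so the missing edge is assigned to the pair (0, 3).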
Step five: reconstruct the network structure from the adjacency matrix T3 obtained in step four.
The invention has the beneficial effects that:
1. The social network is reconstructed from local, limited information. A new way of introducing placeholders is added and the Pearson correlation is used to judge relatedness, so the positions of the missing points can be determined effectively. Solving the missing-point problem first provides effective support for the subsequent completion of missing edges and improves prediction accuracy.
2. Reconstructing the social network from local information effectively addresses problems such as missing network node information in public opinion propagation research, makes that research more efficient, and helps control the serious influence and harm caused by the spread of harmful public opinion.
Drawings
To illustrate the embodiments of the invention more clearly, the following description refers to the accompanying drawings. The drawings described below show only some key embodiments, and other drawings may be derived from them.
FIG. 1 is a schematic flow chart of the present invention
FIG. 2 is a diagram of a topology of a social network constructed by experiments according to an embodiment of the present invention
FIG. 3 is a network topology structure diagram after the missing point addition in step two of the embodiment of the present invention
FIG. 4 is a schematic diagram of missing point clustering after the third step of processing according to the embodiment of the present invention
FIG. 5 is a block diagram of a reconstructed social network topology according to an embodiment of the invention
FIG. 6 shows an experimental data set of an example of the present invention
FIG. 7 shows a new adjacency matrix T1 and a new network G1 according to an embodiment of the invention
FIG. 8 shows a new adjacency matrix T2 and a network G2 constructed by the adjacency matrix T2 according to an embodiment of the invention
FIG. 9 is a reconstructed scientific collaboration network in accordance with an embodiment of the present invention
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
As shown in fig. 1, the method for reconstructing a social network based on local information of the present invention includes the following specific steps:
the method comprises the following steps: data pre-processing
a. Assuming a small social network G with 7 nodes for verifying the propagation model, the method can be applied to a large-scale network, and the small-scale network is briefly described here. A sub-network of network G' belonging to network G is shown in fig. 2 as an experimental data set. The social network G' is in a incomplete state, and therefore is very disadvantageous for implementing the propagation model, and the propagation model needs to use a complete network structure as a support, so that a network replacement must be considered, or a method for complementing the incomplete network is adopted. The replacement of the network is complex, the same problem may occur in a new network, and some data loss problems may occur, so that a technical method may be considered to perfect a supplementary network structure, and the following known conditions may be obtained through a user relationship table and a node table:
G’={V’,E’},V’={V1,V2,V3,V5,V6},E’={(V1,V2),(V1,V3),(V2,V3),(V3,V5),(V5,V6)}.
D={(V1,3),(V2,4),(V3,4),(V5,4),(V6,2)},d={(V1,2),(V2,2),(V3,3),(V5,2),(V6,1)}.
b. Construct the adjacency matrix T corresponding to the topology of the known sub-network G'. With rows and columns ordered V1, V2, V3, V5, V6, the edge set E' gives:
$$T = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$
T is a 5 × 5 adjacency matrix whose entries are 0 and 1, where 0 means that two nodes are not connected and 1 means that they are connected.
Step two: adding placeholder nodes according to node attributes and network topology
21) Randomly select a node in the network G', for example V1, as the starting node, and first check whether d1 equals D1.
22) Here d1 = 2 < D1 = 3, so placeholder nodes are added for the node. The number added is D1 - d1 = 3 - 2 = 1, so one placeholder node Z1 is added for node V1. The next node is checked in the same way. Once all nodes in G' have been checked, 7 placeholder nodes Z1 to Z7 have been added in total, and the adjacency matrix T of G' is updated to obtain a new adjacency matrix T1
[Figure: adjacency matrix T1]
and a new network G1, as shown in FIG. 3.
T1 is a 12 × 12 matrix: step two adds 7 placeholder nodes in total, so the adjacency matrix grows from the 5 × 5 matrix T to the 12 × 12 matrix T1.
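The counts in this step can be checked with a few lines of Python. The sketch recomputes each d_i from the edge set E' given in step one instead of reading it from the listed set d; the variable names are illustrative.

```python
# Known sub-network G' of the seven-node example (edge set E' from the text).
nodes = ["V1", "V2", "V3", "V5", "V6"]
edges = [("V1", "V2"), ("V1", "V3"), ("V2", "V3"), ("V3", "V5"), ("V5", "V6")]
# Original degree set D from the text.
D = {"V1": 3, "V2": 4, "V3": 4, "V5": 4, "V6": 2}

# Observed degree d_i: number of edges of E' that touch node i.
d = {v: sum(v in edge for edge in edges) for v in nodes}

# One placeholder per unseen edge, i.e. D_i - d_i per node.
placeholders = {v: D[v] - d[v] for v in nodes}
total = sum(placeholders.values())
size_T1 = len(nodes) + total

print(d)               # observed degrees
print(placeholders)    # placeholders added per node
print(total, size_T1)  # 7 12
```

The total of 7 placeholders and the resulting 12 × 12 size of T1 agree with the figures stated for this embodiment.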
Step three: clustering the added placeholder nodes by using a clustering algorithm to determine missing points
The 7 placeholder nodes added in step two are clustered to determine the positions of the missing points, eliminating redundant placeholder nodes and redundant edges and increasing the accuracy of network reconstruction.
31) First compute the connection vector of each placeholder node with the CN link prediction algorithm. The link prediction algorithm yields a similarity matrix S, and the row of S corresponding to a placeholder node is that node's connection vector, i.e. S_i is the connection vector of placeholder node i.
According to the CN algorithm, the similarity matrix S is obtained:
[Figure: similarity matrix S]
32) In the similarity matrix, S1 denotes the first row, i.e. the connection vector of node V1, and the other rows are read in the same way. The connection vectors of the placeholder nodes added in step two are in rows 6 to 12. Extract the connection vectors of the placeholder nodes and compute the Pearson correlation r between each pair of them,
[Figure: Pearson correlation calculation]
then construct the correlation matrix R from the Pearson correlations,
[Figure: correlation matrix R]
33) Perform K-means clustering. Each row of R represents the correlations between one placeholder node and the other placeholder nodes and serves as one clustering object, so each clustering object corresponds to one placeholder node. Clustering groups the sample set into placeholder clusters, each of which represents one missing point. The adjacency matrix T1 of G1 is updated to obtain a new adjacency matrix T2
[Figure: adjacency matrix T2]
and the network G2 constructed from T2. As shown in FIG. 4, the placeholder nodes inside each of the two ellipses represent one missing point, and the two missing points L1 and L2 are determined once clustering is complete.
Step four: determining missing edges in a network using a link prediction algorithm
41) Randomly select a node in the network G2 to start with and check whether it is a missing point determined by the clustering in step three. If it is not, carry out the subsequent calculation; if it is, select another node.
42) Taking node V1 as an example, compute the similarity matrix, i.e. compute the connection vector of node V1.
43) The comparison gives D1 = d1' = 3. Because node V1 is connected to a missing point, it is judged again according to the similarity scores: node V5 is selected for connection, and the link between node V1 and the missing point is removed.
44) If d_i' = D_i and the similarity score between node V1 and node V5 is higher than the score between V1 and the missing point L1, disconnect node V1 from the missing point L1 and connect V5 to V1.
45) Repeat steps 41), 42) and 43) until all nodes in G2 other than the missing points determined by the clustering in step three have been processed, and update the adjacency matrix T2 to obtain a new adjacency matrix T3
[Figure: adjacency matrix T3]
Step five: construct the network from the adjacency matrix T3. As shown in FIG. 5, the gray nodes are the recovered missing points.
The reconstructed network can be applied to a public opinion propagation model. Such a model requires a network with a complete structure and no information loss; the network reconstructed by this method meets that need and makes research on the propagation process more accurate and clear. The invention is not limited to public opinion propagation and can also be applied to epidemic propagation models such as SIR and SIS to complete the network structure and achieve good propagation model results.
A specific application example of the method is given below. The method is well suited to research on scientific collaboration among scholars, mining of advisor-student relationships and the like, all of which need a complete network structure as support. The practical application of the invention is illustrated below using a scientific collaboration network among scholars as an example.
The method comprises the following steps: data pre-processing
Suppose a partnership network G with 10 more closely related DBLP authors, the network G 'belonging to a sub-network of the network G, G' being shown in fig. 6 as an experimental data set. Due to the problems of data loss and the like, the relational network G' is in a defect state at the moment, partial edges and nodes are lacked, and partial scholars and the cooperative relations thereof are not collected. At this time, the method of the invention can be used for constructing a complete scientific research cooperative relationship network. In the scientific research cooperative network, the connection relationship indicates that there is a cooperative paper among scholars, the scholars are used as nodes, the number of the scholars cooperating with the scholars is used as degree, a user relationship table and a node table are constructed according to data, and the following known conditions are obtained at the same time:
G’={V’,E’},V’={V1,V2,V3,V4,V5,V7,V8,V9},E’={(V1,V2),(V1,V3),(V2,V8),(V3,V4),(V3,V5),(V4,V5),(V4,V9),(V7,V8),(V8,V9)};
D={(V1,3),(V2,5),(V3,4),(V4,6),(V5,1),(V7,3),(V8,3),(V9,4)},d={(V1,2),(V2,2),(V3,3),(V4,4),(V5,1),(V7,2),(V8,3),(V9,3)}。
Construct the adjacency matrix T corresponding to the topology of the known sub-network G'. Its entries are 0 and 1, where 0 means that the two scholars have no co-authored paper and 1 means that they do.
Step two: adding placeholder nodes according to node attributes and network topology
21) Randomly select a node in the network G', for example V1, as the starting node, and first check whether d1 equals D1.
22) Here d1 < D1, so placeholder nodes are added for the node. The number added is D1 - d1 = 3 - 2 = 1, so 1 placeholder node is added for node V1. The next node is checked in the same way. Once all nodes in the graph G' have been checked, the adjacency matrix T of G' is updated to obtain a new adjacency matrix T1 and a new network G1, as shown in FIG. 7. Step two adds 7 placeholder nodes in total, so the adjacency matrix grows from the 8 × 8 matrix T to the 15 × 15 matrix T1.
Step three: clustering the added placeholder nodes by using a clustering algorithm to determine missing points
The 7 placeholder nodes added in step two are clustered to determine the specific positions of the missing points, eliminating redundant placeholder nodes and redundant edges and increasing the accuracy of network reconstruction.
31) The connection vector of each placeholder node is first calculated by the CN (common neighbors) link prediction algorithm. The link prediction algorithm yields a similarity matrix S, in which the row of a placeholder node represents the connection vector of that placeholder node; that is, Si denotes the connection vector of placeholder node i.
According to the CN algorithm, the similarity matrix S is obtained as:
S=T1×T1
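The product S = T1 × T1 counts, for every pair of nodes, how many neighbours they share. A minimal sketch in plain Python is given below; the function name `cn_similarity` and the 3-node input are assumptions for illustration.

```python
def cn_similarity(T1):
    """Common-neighbours scores: S[i][j] = sum over k of T1[i][k] * T1[k][j]."""
    n = len(T1)
    return [[sum(T1[i][k] * T1[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical path graph 0 - 1 - 2: nodes 0 and 2 share the single neighbour 1.
S = cn_similarity([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
```

Note that the diagonal entry S[i][i] equals the degree of node i, because every neighbour of i is trivially shared with itself.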
32) In the similarity matrix, S1 denotes the first row of the matrix, i.e. the connection vector of node V1, and likewise for the other nodes. The connection vectors of the placeholder nodes are extracted, and the Pearson correlation r between every pair of placeholder-node connection vectors is calculated:
Figure BDA0002732861490000101
Figure BDA0002732861490000102
A correlation matrix R is then constructed from the Pearson correlations:
Figure BDA0002732861490000111
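The correlation matrix can be sketched as follows, using the standard definition of the Pearson correlation (covariance divided by the product of the standard deviations). The function names, the handling of constant vectors and the input matrix are assumptions made for this sketch; the patent's own formula is given only as a figure.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: an undefined correlation is treated as 0
    return cov / (sx * sy)


def correlation_matrix(S, placeholder_rows):
    """R[a][b] = Pearson correlation of the connection vectors of placeholders a and b."""
    vectors = [S[i] for i in placeholder_rows]
    return [[pearson(u, v) for v in vectors] for u in vectors]


# Hypothetical similarity matrix; rows 1 and 2 are assumed to belong to placeholders.
S = [[2, 1, 1],
     [1, 1, 0],
     [1, 0, 1]]
R = correlation_matrix(S, placeholder_rows=[1, 2])
```

R is symmetric with ones on the diagonal whenever the connection vectors are not constant.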
33) A K-means clustering operation is performed. Each row of the matrix R represents the correlation between one placeholder node and the other placeholder nodes and serves as a clustering object, so each clustering object corresponds to one placeholder node. The clustering operation groups the sample set into placeholder clusters, and each placeholder cluster represents one missing point. The adjacency matrix T1 of the network G1 is then updated to obtain a new adjacency matrix T2 and the network G2 constructed from it, as shown in FIG. 8.
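The clustering and merging described in 33) might look like the sketch below. Several choices here are assumptions rather than part of the patent: the number of clusters k is supplied by the caller, the centroids are initialised deterministically by a farthest-point rule, and `merge_placeholders` is a hypothetical helper that links each missing point to every known node that owned a placeholder in its cluster.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(rows, k, max_iter=100):
    """Lloyd's K-means on equal-length vectors; returns one cluster label per row."""
    # Farthest-point initialisation (an assumption, chosen for repeatability).
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def merge_placeholders(T1, m, labels, k):
    """Collapse the placeholders (rows m onward of T1) into k missing points, giving T2."""
    T2 = [row[:m] + [0] * k for row in T1[:m]]
    T2 += [[0] * (m + k) for _ in range(k)]
    for p, c in enumerate(labels):
        for i in range(m):
            if T1[m + p][i]:
                T2[i][m + c] = T2[m + c][i] = 1
    return T2


# Hypothetical correlation matrix for four placeholders forming two obvious groups.
R = [[1.0, 0.9, -0.8, -0.9],
     [0.9, 1.0, -0.9, -0.8],
     [-0.8, -0.9, 1.0, 0.9],
     [-0.9, -0.8, 0.9, 1.0]]
labels = kmeans(R, k=2)

# Hypothetical T1: known nodes 0 and 1, placeholders 2..5 owned by 0, 1, 0, 1.
T1 = [[0, 1, 1, 0, 1, 0],
      [1, 0, 0, 1, 0, 1],
      [1, 0, 0, 0, 0, 0],
      [0, 1, 0, 0, 0, 0],
      [1, 0, 0, 0, 0, 0],
      [0, 1, 0, 0, 0, 0]]
T2 = merge_placeholders(T1, m=2, labels=labels, k=2)
```

When two placeholders of the same known node fall into one cluster, they collapse into a single edge, which is one way redundant placeholder nodes and edges are removed.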
Step four: determining missing edges in a network using a link prediction algorithm
41) Randomly select a node in the network G2 as the start node and determine whether it is a missing point determined by the clustering in step three. If it is not, carry out the subsequent calculation; if it is, select another node.
42) Taking node V1 as an example, calculate the similarity matrix, i.e. calculate the connection vector of node V1.
43) The check gives D1 = d1'. Because node V1 is connected to a missing point, a further check is made according to the similarity scores: node V4, which has a high similarity score, is selected for connection, and the link between node V1 and the missing point is disconnected.
44) If d1' = D1 and the similarity score between node V1 and node V4 is higher than the score between V1 and the missing point, the link between node V1 and the missing point is disconnected and V4 is connected to V1.
45) Repeat steps 41), 42) and 43) until every node in the network G2, other than the missing points determined by the clustering in step three, has been processed once, and update the adjacency matrix T2 to obtain a new adjacency matrix T3.
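One pass of step four can be sketched as follows. This is a simplified illustration, not the patented procedure itself: the function name `refine_edges` is hypothetical, similarity is computed once with the common-neighbours product, at most one edge is added or swapped per node, and the degree of the chosen partner node is not checked.

```python
def cn_scores(T):
    n = len(T)
    return [[sum(T[i][k] * T[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def refine_edges(T2, D, m):
    """One pass over the m known nodes; indices >= m are missing points.

    D[i] is the degree of known node i in the full network. Returns a new matrix T3.
    """
    T3 = [row[:] for row in T2]
    sim = cn_scores(T2)  # assumption: scores are computed once, from T2
    for i in range(m):
        degree = sum(T3[i])
        # Known nodes that i is not yet linked to, best score first.
        candidates = sorted((j for j in range(m) if j != i and not T3[i][j]),
                            key=lambda j: sim[i][j], reverse=True)
        if not candidates:
            continue
        best = candidates[0]
        if degree < D[i]:
            # Node i is not full: add its highest-scoring missing edge.
            T3[i][best] = T3[best][i] = 1
        elif degree == D[i]:
            # Node i is full: swap a weaker link to a missing point for a stronger one.
            linked_missing = [q for q in range(m, len(T3)) if T3[i][q]]
            if linked_missing:
                weakest = min(linked_missing, key=lambda q: sim[i][q])
                if sim[i][weakest] < sim[i][best]:
                    T3[i][weakest] = T3[weakest][i] = 0
                    T3[i][best] = T3[best][i] = 1
    return T3


# Hypothetical G2: known nodes 0, 1, 2 and one missing point 3 linked to node 0.
T2 = [[0, 1, 0, 1],
      [1, 0, 1, 0],
      [0, 1, 0, 0],
      [1, 0, 0, 0]]
T3 = refine_edges(T2, D=[2, 2, 2], m=3)
```

In this toy input node 0 is already full, shares a neighbour with node 2 but none with the missing point, so its link to the missing point is replaced by a link to node 2.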
Step five: the network is constructed from the adjacency matrix, as shown in FIG. 9.
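Reading the network back out of an adjacency matrix amounts to listing its nonzero entries. The sketch below assumes a symmetric matrix and uses a hypothetical helper name and hypothetical labels, including "M1" for a recovered missing point.

```python
def edges_from_adjacency(T3, labels):
    """List each undirected edge once, reading only the upper triangle of T3."""
    n = len(T3)
    return [(labels[i], labels[j])
            for i in range(n) for j in range(i + 1, n) if T3[i][j]]


# Hypothetical reconstructed matrix: known nodes V1, V2 and one recovered missing point M1.
edges = edges_from_adjacency([[0, 1, 1],
                              [1, 0, 0],
                              [1, 0, 0]], ["V1", "V2", "M1"])
```

The resulting edge list can then be handed to any graph library or plotting tool to draw the reconstructed network.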
The reconstructed network can be applied to research on scientific collaboration among scholars, on teacher-student relationships within research collaboration, and the like. The invention is not limited to research on collaboration relationships; it can also be applied in other fields to complete a network structure and achieve good research results.

Claims (5)

1. A social network reconstruction method based on local information, characterized in that the method comprises the following steps:
step one: preprocessing data; acquiring social network data through a web crawler or from an existing data set, sorting and analyzing the data to obtain experimental data, and dividing the experimental data into a user node table and a user relationship table, which are stored separately;
step two: adding placeholder nodes according to the node attributes and the network topology structure; the placeholder node is a temporary node which exists for locating the missing point;
step three: clustering the placeholder nodes added in the step two to determine the specific number and positions of the missing points, eliminating redundant placeholder nodes and increasing the accuracy of network reconstruction;
step four: determining a missing edge in the network using a link prediction algorithm;
step five: reconstructing the network structure according to the processing results of the above steps.
2. The social network reconstruction method based on local information according to claim 1, characterized in that the specific process of step one is as follows:
11) cleaning and sorting the data to obtain a user relation table and a user node table, wherein the two tables are connected through a node id;
12) constructing a sub-network G' using the user relationship table, G' = {V', E'}, V' = {V1, V2, V3, …, Vm}, E' = {(Vi, Vj), …}, consisting of m nodes, the structure of which is known; the network G' is a sub-network of the network G, in which the number of nodes is known to be n; Vi represents a node, the set V' is a subset of the set V, the set E' is a subset of the set E, and n > m;
13) using the node table to obtain the original degree set D = {(V1, D1), (V2, D2), (V3, D3), …, (Vm, Dm)} of the nodes and the existing degree set d = {(V1, d1), (V2, d2), (V3, d3), …, (Vm, dm)}; Di represents the degree of node i in the network G, expressed as the sum of the number of followers and the number of followed accounts of the user node; di represents the degree of node i in the network G';
14) constructing the adjacency matrix T of the known network G', where Tij represents the connection relation between nodes i and j; the connection relation is represented by 0 and 1: 1 if the nodes are connected, and 0 otherwise.
3. The social network reconstruction method based on local information according to claim 1, characterized in that the specific process of step two is as follows:
21) randomly selecting a node i in the network G' as the starting node, and first determining whether di is equal to Di;
22) if di < Di, adding placeholder nodes for the node, the number of placeholder nodes added being determined by Di - di;
23) if di = Di, skipping the current node and checking the next node;
24) recursively traversing all nodes in the network G', adding the placeholder nodes to the network G', and updating the adjacency matrix T of the network G' to obtain a new adjacency matrix T1 and a new network G1.
4. The social network reconstruction method based on local information according to claim 1, characterized in that the specific process of step three is as follows:
31) first calculating the connection vector of each placeholder node through a link prediction algorithm, wherein the connection relation indicates whether an edge exists between nodes, and the connection vector is the vector of connection relations between a node and the other nodes; obtaining a similarity matrix S through the link prediction algorithm, wherein the row of a placeholder node in the matrix S represents the connection vector of that placeholder node, that is, Si denotes the connection vector of placeholder node i;
32) calculating the Pearson correlation between each pair of placeholder-node connection vectors and constructing a correlation matrix R, whose elements are the Pearson correlations between the placeholder nodes, the Pearson correlation being calculated as follows:
Figure FDA0002732861480000021
33) after the correlation matrix R is obtained, performing a K-means clustering operation, wherein each row of the matrix R represents the correlation between one placeholder node and the other placeholder nodes and is regarded as a clustering object, each clustering object corresponding to one placeholder node; the clustering operation groups the placeholder nodes into K placeholder clusters, the placeholder nodes in the same placeholder cluster representing one missing point; the missing points are added to the network G1, and the adjacency matrix T1 of the network G1 is updated to obtain a new adjacency matrix T2 and a new network G2 constructed from the adjacency matrix T2.
5. The social network reconstruction method based on local information according to claim 1, characterized in that the specific process of step four is as follows:
41) randomly selecting a node i in the network G2 as the starting node, and skipping the node if it is a missing point determined by the clustering in step three; recalculating the degree di' of the node from the adjacency matrix T2 of the network G2; comparing di' with Di, where di' represents the degree of node i in the network G2 and Di represents the degree of node i in the network G;
42) calculating a similarity matrix Sim through a link prediction algorithm, where Simi denotes the row of node i in the similarity matrix Sim, each element of which is the similarity score Grade between node i and another node;
43) comparing Di with di'; if Di > di', node i is not full and has a missing edge, so a connecting edge is selected for node i according to the similarity scores in the similarity matrix Sim, and the process moves on to the next node;
44) if di' = Di, calculating Di - di and then comparing, according to the similarity matrix Sim, the similarity scores between node i and the other nodes; if the similarity score with a connected missing point is lower than the similarity score with a non-missing-point node, disconnecting the node from the missing point and connecting it to the non-missing-point node with the high similarity score; otherwise skipping the current node i and moving on to the next node;
45) repeating steps 41), 42) and 43) until all nodes in the network G2, other than the missing points determined by the clustering in step three, have been processed, and updating the adjacency matrix T2 to obtain a new adjacency matrix T3.
CN202011123548.7A 2020-10-20 2020-10-20 Social network reconstruction method based on local information Active CN112256941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011123548.7A CN112256941B (en) 2020-10-20 2020-10-20 Social network reconstruction method based on local information

Publications (2)

Publication Number Publication Date
CN112256941A true CN112256941A (en) 2021-01-22
CN112256941B CN112256941B (en) 2023-10-13

Family

ID=74245084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011123548.7A Active CN112256941B (en) 2020-10-20 2020-10-20 Social network reconstruction method based on local information

Country Status (1)

Country Link
CN (1) CN112256941B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8560605B1 (en) * 2010-10-21 2013-10-15 Google Inc. Social affinity on the web
CN105228185A (en) * 2015-09-30 2016-01-06 中国人民解放军国防科学技术大学 A kind of method for Fuzzy Redundancy node identities in identification communication network
CN110321493A (en) * 2019-06-24 2019-10-11 重庆邮电大学 A kind of abnormality detection of social networks and optimization method, system and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
魏静;朱恒民;宋瑞晓;蒋世兵;: "个体视角下的网络舆情传递链路预测分析", 现代图书情报技术, no. 01 *

Also Published As

Publication number Publication date
CN112256941B (en) 2023-10-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant