CN112256941B - Social network reconstruction method based on local information - Google Patents

Info

Publication number
CN112256941B
Authority
CN
China
Prior art keywords
node
network
nodes
placeholder
missing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011123548.7A
Other languages
Chinese (zh)
Other versions
CN112256941A (en)
Inventor
韩忠明
李俊
段大高
李胜男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University
Priority to CN202011123548.7A
Publication of CN112256941A
Application granted
Publication of CN112256941B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a social network reconstruction method based on local information, comprising the following steps. Step one: preprocess the data. Step two: add placeholder nodes according to node attributes and the network topology; a placeholder node is a temporary node that exists to locate a missing point. Step three: cluster the placeholder nodes added in step two to determine the specific number and positions of the missing points, eliminate redundant placeholder nodes and increase the accuracy of network reconstruction. Step four: determine the missing edges in the network using a link prediction algorithm. Step five: reconstruct the network structure from the results of the preceding steps. The method can effectively determine the positions of missing points, solve the missing-point problem, support the subsequent completion of missing edges and improve prediction accuracy; it makes research on public opinion propagation more efficient and helps to control the serious impact and harm caused by the spread of harmful public opinion.

Description

Social network reconstruction method based on local information
Technical Field
The invention relates to the field of network reconstruction, in particular to a social network reconstruction method based on local information, which can be applied to the aspects of public opinion propagation, rumor detection and the like.
Background
The rapid development of the Internet and the sharp growth in the number of users have greatly increased the scale of networks and given rise to complex networks such as social networks, and research on complex networks has developed rapidly. In that research the network itself is a central object of study, and its large scale largely determines its complexity. Network reconstruction is a prominent problem for complex networks such as social networks, because it is a precondition for other research that assumes a complete network. The invention takes social network reconstruction as its illustration. Reconstruction can be applied to the study of information spreading over social media: information on the Internet appears in an endless stream and spreads quickly through social media, which has prompted research on the propagation of public opinion such as rumors, so that the propagation structures and paths of rumors and public opinion can be located and correctly guided by technical means. In practice it is difficult to obtain a large amount of valid information, and noise and other causes leave the collected network incomplete; correct and effective reconstruction of the social network therefore helps to control the propagation of public opinion and avoid adverse effects.
The propagation of negative public opinion online is a serious problem in today's society and damages social safety and stability. Its propagation modes are complex, varied and hard to control, and because information in the network is missing, finding the propagation structure and propagation paths of negative public opinion is a major problem. By reconstructing the social network from local information, the invention addresses the lack of network information in public opinion propagation control, and by reconstructing the complete social network it addresses the problem of locating the propagation structure and paths.
Regarding network reconstruction research, scholars have proposed link prediction methods and missing-point identification methods. These solve the problems of missing edges and missing nodes to a certain extent. Link prediction mostly targets the reconstruction of missing edges, but because networks are complex, a network may be missing nodes as well as edges, which makes reconstruction much harder. At present there is relatively little research on reconstruction when edges and nodes are missing together, and its accuracy is low. The invention provides a social network reconstruction method based on local information that makes up for some shortcomings of the existing techniques and supports the effective implementation of propagation models.
Disclosure of Invention
The invention aims to provide a social network reconstruction method based on local information, which is mainly applied to propagation models such as public opinion propagation and solves the problem of missing network edges and nodes in the propagation process. The invention handles node loss and edge loss in network reconstruction at the same time. A network structure may be missing not only edges or only nodes; when both are missing together, reconstruction becomes more complex, and the invention provides a social network reconstruction method based on local information for this case. The invention is mainly applied to public opinion propagation, rumor detection and the like, where it fills in missing network data in the study of public opinion propagation models, resolves the lack of information caused by a missing network structure, and benefits the construction of public opinion propagation models and public opinion control. Although another relationship network can be substituted when testing a propagation model, network data may be missing in a real application scenario, and substituting a new network cannot guarantee the integrity and validity of the data. It is therefore necessary to reconstruct a complete social network with a suitable method for propagation models, particularly public opinion propagation models.
The technical scheme of the invention is as follows:
Step one: preprocess the data. Acquire social network data through a web crawler or use an existing dataset, organize and analyze the data to obtain experimental data, and divide the experimental data into a user node table and a user relationship table that are stored separately;
Step two: add placeholder nodes according to node attributes and the network topology; a placeholder node is a temporary node that exists to locate a missing point;
Step three: cluster the placeholder nodes added in step two to determine the specific number and positions of the missing points, eliminate redundant placeholder nodes and increase the accuracy of network reconstruction;
Step four: determine the missing edges in the network using a link prediction algorithm;
Step five: reconstruct the network structure from the results of the preceding steps.
The steps are described in detail below. First, the symbols and concepts used are briefly explained. The network used in the experiments can be regarded as an undirected, unweighted graph. A node is denoted by V. A missing point is a node that the known network lacks relative to the original complete network, i.e. a node that exists in the parent set but not in the subset. A placeholder is a temporary node added to mark the location of a missing point; as the method proceeds, placeholder nodes are refined into missing points. Missing points are removed at random from the parent network, and node numbers are unrelated to the order of the nodes in the network.
The detailed steps of the invention are as follows:
Step one: data preprocessing.
11) Clean and organize the data to obtain a user relationship table and a user node table. The user node table contains the user node id, the node degree, and the user's follower count and following count, e.g. (id_i, d_i, follower count, following count); the user relationship table contains the connection relationships between user nodes, e.g. (id_1, id_2). The two tables are linked through the node id;
12) Construct the sub-network G' from the user relationship table: G' = {V', E'}, V' = {V_1, V_2, V_3, …, V_m}, E' = {(V_i, V_j), …}. G' consists of m nodes and its structure is known. G' is a sub-network of the network G, whose node count n is known; V_i denotes a node, V' is a subset of V, E' is a subset of E, and n > m;
13) Use the node table to obtain the original degree set D = {(V_1, D_1), (V_2, D_2), (V_3, D_3), …, (V_m, D_m)} and the existing degree set d = {(V_1, d_1), (V_2, d_2), (V_3, d_3), …, (V_m, d_m)}. D_i is the degree of node i in the network G, expressed as the sum of the follower count and the following count of the user node; d_i is the degree of node i in the network G';
14) Construct the adjacency matrix T of the known network G'. T_ij represents the connection between nodes i and j and takes the value 0 or 1: 1 if the nodes are connected, 0 otherwise.
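As a minimal sketch of steps 12) to 14), the adjacency matrix and the existing degrees can be derived from an edge list. The node names, edges and original degrees below are hypothetical illustration data, and the helper names `build_adjacency` and `degrees` are not from the patent.

```python
def build_adjacency(nodes, edges):
    """Build the 0/1 adjacency matrix T of an undirected, unweighted graph."""
    index = {v: k for k, v in enumerate(nodes)}
    T = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        i, j = index[a], index[b]
        T[i][j] = T[j][i] = 1  # undirected: the matrix is symmetric
    return T


def degrees(nodes, T):
    """Existing degree d_i of each node = row sum of T."""
    return {v: sum(T[k]) for k, v in enumerate(nodes)}


# Hypothetical known sub-network G' and original degrees D (from a node table).
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
D = {"A": 3, "B": 2, "C": 4, "D": 2}

T = build_adjacency(nodes, edges)
d = degrees(nodes, T)
deficit = {v: D[v] - d[v] for v in nodes}  # edges each node lost relative to G
```

The `deficit` values are what step two consumes: a node with a positive deficit has lost that many neighbours relative to the complete network.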
Step two: add placeholder nodes according to node attributes and network topology. A placeholder node is a temporary node that exists to locate a missing point.
21) Randomly select a node i in the network G' as the starting node and first judge whether d_i equals D_i;
22) If d_i < D_i, add placeholder nodes for node i; the number added is D_i − d_i;
23) If d_i = D_i, skip the current node and judge the next node;
24) Traverse all nodes in the network G' in this way, adding placeholder nodes to G' and updating its adjacency matrix T, to obtain a new adjacency matrix T1 and a new network G1.
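Steps 21) to 24) can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: each placeholder is assumed to be attached by a single edge to the node whose degree deficit it fills, nodes are visited in row order rather than from a random start (the order affects only placeholder numbering), and the function name `add_placeholders` and the toy data are hypothetical.

```python
def add_placeholders(T, names, D):
    """Return (T1, names1, owner) after adding D_i - d_i placeholders per node.

    T      adjacency matrix of G' (list of lists of 0/1)
    names  node names in row order
    D      dict: node name -> original degree in G
    owner  dict: placeholder name -> the known node it is attached to
    """
    n = len(names)
    deficits = [max(0, D[names[k]] - sum(T[k])) for k in range(n)]
    size = n + sum(deficits)

    # Copy T into the top-left corner of the larger matrix T1.
    T1 = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            T1[i][j] = T[i][j]

    names1, owner, nxt = list(names), {}, n
    for k, deficit in enumerate(deficits):
        for _ in range(deficit):
            z = f"Z{nxt - n + 1}"
            names1.append(z)
            owner[z] = names[k]
            T1[k][nxt] = T1[nxt][k] = 1  # assumed: one edge node <-> placeholder
            nxt += 1
    return T1, names1, owner


# Toy path graph V1 - V2 - V3 with hypothetical original degrees.
names = ["V1", "V2", "V3"]
T = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
D = {"V1": 2, "V2": 2, "V3": 3}
T1, names1, owner = add_placeholders(T, names, D)
```

After the call every known node has degree D_i in T1, which is the condition that steps 22) and 23) test.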
Step three: cluster the added placeholder nodes with a clustering algorithm to determine the missing points.
Clustering the placeholder nodes added in step two determines the specific number and positions of the missing points, eliminates redundant placeholder nodes and increases reconstruction accuracy.
31) First compute the connection vector of each placeholder node with a link prediction algorithm. A connection relationship states whether an edge exists between two nodes, and a connection vector is the vector of a node's connection relationships with the other nodes. The link prediction algorithm yields a similarity matrix S; the row of a placeholder node in S is its connection vector, i.e. S_i is the connection vector of placeholder node i.
32) Compute the Pearson correlation between each pair of placeholder connection vectors and construct the correlation matrix R, whose elements are the Pearson correlations between placeholder nodes. The Pearson correlation is calculated as:
$$r_{XY} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - (E(X))^2}\,\sqrt{E(Y^2) - (E(Y))^2}}$$
where r_XY is the Pearson correlation, X and Y are two variables, E denotes expectation, E(X) and E(Y) are the expectations of X and Y, E(XY) − E(X)E(Y) is the covariance of X and Y, and E(X^2) − (E(X))^2 and E(Y^2) − (E(Y))^2 are the variances of X and Y respectively. The Pearson correlation of two variables is thus their covariance divided by the product of their standard deviations.
33) After the correlation matrix R is obtained, perform K-means clustering. Each row of R represents the correlation between one placeholder node and the other placeholder nodes and is treated as a clustering object, so each clustering object represents one placeholder node. The clustering produces K placeholder clusters. Because the scale of the network G is known, K is known: K = |V| − |V'|, where |V| is the number of nodes in G and |V'| is the number of nodes in G'. The placeholder nodes in one cluster represent one missing point. The missing points are added to the network G1 and its adjacency matrix T1 is updated to obtain a new adjacency matrix T2 and the new network G2 constructed from T2.
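Steps 32) and 33) can be sketched in plain Python as below. The Pearson function follows the expectation form of the formula; the K-means routine is a small illustrative implementation whose deterministic farthest-point initialisation is an assumption of this sketch (the patent does not specify initialisation). The connection vectors and the `owners` list (the known node each placeholder hangs from) are hypothetical data.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def pearson(x, y):
    """r_XY = (E(XY) - E(X)E(Y)) / (sd(X) * sd(Y))."""
    ex, ey = mean(x), mean(y)
    cov = mean([a * b for a, b in zip(x, y)]) - ex * ey
    var_x = mean([a * a for a in x]) - ex * ex
    var_y = mean([b * b for b in y]) - ey * ey
    if var_x <= 1e-12 or var_y <= 1e-12:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / (sqrt(var_x) * sqrt(var_y))


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(rows, k, iters=100):
    """Cluster rows into k groups; returns one label per row."""
    centroids = [list(rows[0])]
    while len(centroids) < k:  # farthest-point initialisation (assumed)
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical connection vectors of four placeholders Z1..Z4.
vectors = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
owners = ["V1", "V2", "V5", "V6"]

R = [[pearson(a, b) for b in vectors] for a in vectors]
labels = kmeans(R, k=2)  # K = |V| - |V'|, e.g. 7 - 5 = 2

# Each cluster becomes one missing point linked to its placeholders' owners.
missing = {}
for node, lab in zip(owners, labels):
    missing.setdefault(lab, set()).add(node)
```

Merging each cluster into a single missing point that inherits the edges of its placeholders is one reading of "the placeholder nodes in one cluster represent one missing point"; other merge rules are possible.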
Step four: determine the missing edges in the network with a link prediction algorithm.
41) Randomly select a node i in the network G2 as the starting node; if the selected node is a missing point determined by the clustering in step three, skip it. Recompute the node's degree d_i' from the adjacency matrix T2 of G2 and compare d_i' with D_i, where d_i' is the degree of node i in G2 and D_i is the degree of node i in G.
42) Compute the similarity matrix Sim with a link prediction algorithm. Sim_i denotes the row of node i in Sim, each element of which is the similarity score between node i and another node. Various link prediction algorithms can be used; this embodiment uses the CN (Common Neighbors) algorithm, which scores the similarity of two nodes by the number of neighbors they share. CN is defined as S_ij = |N(i) ∩ N(j)|, where i and j are nodes and N(i) and N(j) are the sets of neighbor nodes of i and j.
43) Compare D_i and d_i'. If D_i > d_i', node i is not saturated and has missing edges; select edges for node i according to the similarity scores in Sim, then move on to the next node.
44) If d_i' = D_i, compute D_i − d_i and judge according to Sim by comparing the similarity scores of node i with the other nodes. If the score between node i and a missing point it is connected to is lower than its score with a non-missing node, disconnect node i from the missing point and connect it to the non-missing node with the high score; otherwise skip node i and move on to the next node.
45) Repeat steps 41), 42) and 43) until every node in G2 except the missing points determined by the clustering in step three has been processed once, then update the adjacency matrix T2 to obtain a new adjacency matrix T3.
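The Common Neighbors score of step 42) and the edge completion of step 43) can be sketched as below. This covers only the D_i > d_i' branch; the rewiring of step 44) is not implemented. Scores are recomputed from the current adjacency rather than read from a precomputed matrix Sim, ties are broken by node name, and the degree limit of the chosen partner is not checked; all three are simplifying assumptions, as are the function names and the toy graph.

```python
def cn_score(adj, i, j):
    """Common Neighbors similarity S_ij = |N(i) ∩ N(j)|."""
    return len(adj[i] & adj[j])


def fill_missing_edges(adj, D, missing):
    """Add edges to every known node whose degree is below its original D_i.

    adj      dict: node -> set of neighbours (network G2), modified in place
    D        dict: known node -> original degree in G
    missing  set of missing-point names (skipped, as in step 41)
    """
    for i in sorted(adj):
        if i in missing:
            continue
        need = D[i] - len(adj[i])
        if need <= 0:
            continue  # node already saturated
        candidates = sorted(
            (j for j in adj if j != i and j not in adj[i]),
            key=lambda j: (-cn_score(adj, i, j), j),  # best score first
        )
        for j in candidates[:need]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


# Hypothetical G2: four known nodes and one missing point L1.
adj = {
    "V1": {"V2", "V3"},
    "V2": {"V1", "V3", "L1"},
    "V3": {"V1", "V2"},
    "V4": {"L1"},
    "L1": {"V2", "V4"},
}
D = {"V1": 3, "V2": 3, "V3": 2, "V4": 1}
fill_missing_edges(adj, D, missing={"L1"})
```

In this toy graph only V1 is short of one edge; L1 shares the neighbour V2 with V1 while V4 shares none, so the edge V1 to L1 is the one added.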
Step five: reconstruct the network structure from the adjacency matrix T3 obtained in step four.
The beneficial effects of the invention are as follows:
1. The social network is reconstructed from local, limited information. A new way of introducing placeholders is adopted and the Pearson correlation is used to judge correlation, so the positions of missing points can be determined effectively. This solves the missing-point problem, supports the subsequent completion of missing edges, and improves prediction accuracy.
2. Reconstructing the social network from local information effectively addresses problems such as missing node information in public opinion propagation research, making that research more efficient and helping to control the serious impact and harm caused by the spread of harmful public opinion.
Drawings
To illustrate the specific embodiments of the invention more clearly, the accompanying drawings are briefly described below. The drawings show only some key examples, and other drawings may be derived from them.
FIG. 1 is a schematic flow chart of the invention.
FIG. 2 shows the topology of the social network constructed in an embodiment of the invention.
FIG. 3 shows the network topology after placeholder nodes are added in step two of the invention.
FIG. 4 is a schematic view of the missing-point clustering after step three of the invention.
FIG. 5 shows the social network topology reconstructed in an embodiment of the invention.
FIG. 6 shows the experimental dataset of an application example of the invention.
FIG. 7 shows the new adjacency matrix T1 and the new network G1 in the application example.
FIG. 8 shows the new adjacency matrix T2 and the network G2 constructed from T2 in the application example.
FIG. 9 shows the scientific collaboration network reconstructed in the application example.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
As shown in fig. 1, the social network reconstruction method based on local information of the present invention comprises the following specific steps:
Step one: data preprocessing
a. Assume a small social network G with 7 nodes that is used to verify a propagation model. The method can be used on large-scale networks; a small network is used here for a simple explanation. A sub-network G' of the network G is shown in fig. 2 as the experimental dataset. The social network G' is incomplete, which is a serious disadvantage for a propagation model, since the model needs the support of a complete network structure; one must therefore either replace the network or complete the incomplete one. Replacing the network is complicated, and the new network may show the same data loss, so completing the network structure by technical means is preferable. The following known conditions are obtained from the user relationship table and the node table:
G' = {V', E'}, V' = {V_1, V_2, V_3, V_5, V_6}, E' = {(V_1, V_2), (V_1, V_3), (V_2, V_3), (V_3, V_5), (V_5, V_6)}.
D = {(V_1, 3), (V_2, 4), (V_3, 4), (V_5, 4), (V_6, 2)}, d = {(V_1, 2), (V_2, 2), (V_3, 3), (V_5, 2), (V_6, 1)}.
b. Construct the adjacency matrix T from the topology of the known sub-network G'.
The adjacency matrix T is a 5 × 5 matrix of 0s and 1s: 0 means two nodes are not connected and 1 means they are connected.
Step two: add placeholder nodes according to node attributes and network topology
21) Randomly select a node in the network G', here V_1, as the starting node, and first judge whether d_1 equals D_1;
22) Since d_1 = 2 < D_1 = 3, add placeholder nodes for the node; the number added is D_1 − d_1 = 3 − 2 = 1, so one placeholder node Z1 is added for node V_1. The next node is judged in the same way until all nodes in G' have been judged. In total 7 placeholder nodes, Z1 to Z7, are added, and the adjacency matrix T of G' is updated to obtain a new adjacency matrix T1 and a new network G1, as shown in fig. 3.
T1 is a 12 × 12 matrix: step two adds 7 placeholder nodes, so the adjacency matrix grows from the 5 × 5 matrix T to the 12 × 12 matrix T1.
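The placeholder count in this embodiment can be checked with a few lines of arithmetic. In this sketch the existing degrees are recomputed from the edge list E' rather than copied from the set d, and D is taken from the embodiment; counting from E' gives V_5 an existing degree of 2. Integer keys stand for the node indices.

```python
# Edge list E' and original degrees D of the 7-node embodiment.
edges = [(1, 2), (1, 3), (2, 3), (3, 5), (5, 6)]
D = {1: 3, 2: 4, 3: 4, 5: 4, 6: 2}

# Existing degree d_i in G', counted directly from E'.
d = {v: 0 for v in D}
for a, b in edges:
    d[a] += 1
    d[b] += 1

placeholders = {v: D[v] - d[v] for v in D}  # D_i - d_i per node
total = sum(placeholders.values())          # number of placeholder nodes
size = len(D) + total                       # dimension of T1
```

With these inputs `total` is 7 and `size` is 12, which agrees with the 7 placeholder nodes Z1 to Z7 and the 12 × 12 matrix T1 stated in the text.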
Step three: cluster the added placeholder nodes with a clustering algorithm to determine the missing points
Cluster the 7 placeholder nodes added in step two to determine the specific positions of the missing points, eliminating redundant placeholder nodes and redundant edges and increasing the accuracy of network reconstruction.
31) First compute the connection vector of each placeholder node with the CN link prediction algorithm. The link prediction algorithm yields a similarity matrix S; the row of a placeholder node in S is its connection vector, i.e. S_i is the connection vector of placeholder node i.
The similarity matrix S is obtained with the CN algorithm.
32) In the similarity matrix, S_1 denotes the first row, i.e. the similarity scores of node V_1 with the other nodes. The connection vectors of the placeholder nodes added in step two are in rows 6 to 12. Extract the placeholder connection vectors, compute the Pearson correlation r between each pair of them, and construct the correlation matrix R from these correlations.
33) Perform K-means clustering. Each row of the matrix R represents the correlation between one placeholder node and the other placeholder nodes and is treated as a clustering object, so each clustering object represents one placeholder node. The clustering groups the sample set into placeholder clusters, each representing one missing point, and the adjacency matrix T1 of the network G1 is updated to obtain a new adjacency matrix T2 and the network G2 constructed from T2. As shown in fig. 4, the placeholder nodes within each of the two ellipses represent one missing point, and two missing points, L1 and L2, are determined after clustering.
Step four: determine the missing edges in the network with a link prediction algorithm
41) Randomly select a node in the network G2 and judge whether it is a missing point determined by the clustering in step three; if not, proceed with the following calculation; if so, select another node.
42) Taking node V_1 as an example, compute the similarity matrix, i.e. the connection vector of node V_1.
43) The comparison gives D_1 = d_1' = 3. Because node V_1 is connected to a missing point, judge again according to the similarity scores: node V_5 is selected for connection and the connection between V_1 and the missing point is removed.
44) If d_i' = D_i: the similarity score of node V_1 with node V_5 is higher than its score with the missing point L1, so the connection between V_1 and L1 is removed and V_1 is connected to V_5.
45) Repeat steps 41), 42) and 43) until every node in G2 except the missing points determined by the clustering in step three has been processed once, then update the adjacency matrix T2 to obtain a new adjacency matrix T3.
Step five: construct the network from the adjacency matrix. As shown in fig. 5, the gray nodes are the missing points that have been filled in.
The reconstructed network can be applied to a public opinion propagation model. Such a model requires a network with a complete structure and no missing information, and the network reconstructed by this method meets that requirement, making the study of the propagation process more accurate and clear. The invention is not limited to public opinion propagation; it can also be applied to epidemic propagation models such as SIR and SIS to complete the network structure and improve the model's results.
A specific application example of the method is given below. The method applies well to research on scientific collaboration among scholars and to mining advisor-student relationships, which require a complete network structure as support. The practical application of the invention is described below, taking a scientific collaboration network among scholars as an example.
Step one: data preprocessing
Suppose a collaboration network G of 10 closely related DBLP authors, and let G' be a sub-network of G; G' is shown in fig. 6 as the experimental dataset. Because of data loss and similar problems, the relationship network G' is incomplete: some edges and nodes are absent, i.e. some scholars and their collaborations were not collected. The method of the invention can then be used to construct a complete scientific collaboration network. In this network an edge indicates that two scholars have co-authored a paper, scholars are the nodes, and a node's degree is the number of scholars who have collaborated with that scholar. A user relationship table and a node table are built from the data, giving the following known conditions:
G' = {V', E'}, V' = {V_1, V_2, V_3, V_4, V_5, V_7, V_8, V_9}, E' = {(V_1, V_2), (V_1, V_3), (V_2, V_8), (V_3, V_4), (V_3, V_5), (V_4, V_5), (V_4, V_9), (V_7, V_8), (V_8, V_9)};
D = {(V_1, 3), (V_2, 5), (V_3, 4), (V_4, 6), (V_5, 1), (V_7, 3), (V_8, 3), (V_9, 4)}, d = {(V_1, 2), (V_2, 2), (V_3, 3), (V_4, 4), (V_5, 1), (V_7, 2), (V_8, 3), (V_9, 3)}.
Construct the adjacency matrix T from the topology of the known sub-network G'. The matrix consists of 0s and 1s: 0 means no collaboration between two scholars and 1 means the two scholars have collaborated.
Step two: add placeholder nodes according to node attributes and network topology
21) Randomly select a node in the network G', here V_1, as the starting node, and first judge whether d_1 equals D_1;
22) Since d_1 < D_1, add placeholder nodes for the node; the number added is D_1 − d_1 = 3 − 2 = 1, so 1 placeholder node is added for node V_1. The next node is judged in the same way until all nodes in G' have been judged. The adjacency matrix T of G' is updated to obtain a new adjacency matrix T1 and a new network G1, as shown in fig. 7. Step two adds 7 placeholder nodes, so the adjacency matrix grows from the 8 × 8 matrix T to the 15 × 15 matrix T1.
Step three: cluster the added placeholder nodes with a clustering algorithm to determine the missing points
Cluster the 7 placeholder nodes added in step two to determine the specific positions of the missing points, eliminating redundant placeholder nodes and redundant edges and increasing the accuracy of network reconstruction.
31) First compute the connection vector of each placeholder node with the CN link prediction algorithm. The link prediction algorithm yields a similarity matrix S; the row of a placeholder node in S is its connection vector, i.e. S_i is the connection vector of placeholder node i.
The CN algorithm gives the similarity matrix S:
S = T1 × T1
32) In the similarity matrix, S_1 denotes the first row, i.e. the similarity scores of node V_1 with the other nodes. Extract the placeholder connection vectors, compute the Pearson correlation r between each pair of them, and construct the correlation matrix R from these correlations.
33) Perform K-means clustering. Each row of the matrix R represents the correlation between one placeholder node and the other placeholder nodes and is treated as a clustering object, so each clustering object represents one placeholder node. The clustering groups the sample set into placeholder clusters, each representing one missing point, and the adjacency matrix T1 of the network G1 is updated to obtain a new adjacency matrix T2 and the network G2 constructed from T2. As shown in fig. 8, each set of placeholder nodes represents one missing point.
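The product S = T1 × T1 in step 31) yields Common Neighbors scores because entry (i, j) of the squared adjacency matrix counts the nodes adjacent to both i and j. A minimal sketch on a hypothetical 4-node matrix (not the 15 × 15 matrix T1 of the example; `matmul` and `T_demo` are illustrative names):

```python
def matmul(A, B):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical adjacency matrix: edges 0-1, 0-2, 1-2, 2-3.
T_demo = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

S_demo = matmul(T_demo, T_demo)

# Off-diagonal entries are common-neighbour counts; the diagonal holds degrees.
common_0_3 = S_demo[0][3]  # nodes 0 and 3 share neighbour 2
degree_2 = S_demo[2][2]    # node 2 has three neighbours
```

Because the diagonal of the product holds each node's degree rather than a similarity, it is usually ignored or zeroed before the rows are used as connection vectors; whether the patent does so is not stated.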
Step four: determining missing edges in a network using a link prediction algorithm
41) A node in the network G2 is selected at random, and it is judged whether the selected node is a missing point determined by the clustering in step three. If it is not, the subsequent calculation is performed; if it is, another node is selected.
42) Taking node V_1 as an example, the similarity matrix is calculated, i.e. the connection vector of node V_1 is calculated.
43) The comparison gives D_1 = d_1'. Because node V_1 is connected to a missing point, a further judgment is made according to the similarity scores: node V_4, which has a high similarity score, is selected for connection, and the connection between node V_1 and the missing point is disconnected.
44) Since d_1' = D_1 and the similarity score between node V_1 and node V_4 is judged to be higher than the score between node V_1 and the missing point, the connection between node V_1 and the missing point is disconnected and V_1 is connected to V_4.
45) Steps 41), 42) and 43) are repeated until every node in G2 other than the missing points determined by the clustering in step three has been calculated once, and the adjacency matrix T2 is updated to obtain a new adjacency matrix T3;
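The edge refinement of step four can be sketched as follows. The sketch assumes the common-neighbors score as the link prediction score (the text names only "a link prediction algorithm" for this step), visits the nodes in index order instead of at random, breaks ties by index, and does not re-check the degree of the node on the other end of a new edge. The names `cn_scores` and `refine_edges` and the toy network are hypothetical.

```python
def cn_scores(A):
    """Common-neighbors similarity, used here as the link prediction score."""
    n = len(A)
    return [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def refine_edges(T2, true_degree, m):
    """Return T3. Nodes 0..m-1 are observed; nodes m.. are missing points."""
    T3 = [row[:] for row in T2]
    n = len(T3)
    for i in range(m):                        # missing points (index >= m) are skipped
        sim = cn_scores(T3)[i]                # similarity scores of node i
        free = sorted((j for j in range(n) if j != i and not T3[i][j]),
                      key=lambda j: sim[j], reverse=True)
        degree = sum(T3[i])                   # d_i'
        if degree < true_degree[i]:
            # Node i is not full: add its best-scoring missing edges.
            for j in free[:true_degree[i] - degree]:
                T3[i][j] = T3[j][i] = 1
        elif degree == true_degree[i]:
            # Node i is full: swap a weak link to a missing point for a
            # stronger link to an observed node, if one exists.
            missing_nbrs = [j for j in range(m, n) if T3[i][j]]
            observed_free = [j for j in free if j < m]
            if missing_nbrs and observed_free:
                worst = min(missing_nbrs, key=lambda j: sim[j])
                best = observed_free[0]
                if sim[best] > sim[worst]:
                    T3[i][worst] = T3[worst][i] = 0
                    T3[i][best] = T3[best][i] = 1
    return T3


# Hypothetical G2: observed nodes 0..3 plus missing point 4, which is linked to node 0.
T2 = [[0, 1, 1, 0, 1],
      [1, 0, 0, 1, 0],
      [1, 0, 0, 1, 0],
      [0, 1, 1, 0, 0],
      [1, 0, 0, 0, 0]]
D = [3, 2, 2, 3]                              # true degrees of the observed nodes
T3 = refine_edges(T2, D, m=4)
print(T3[0])
```

Node 0 here plays the role of V_1 in the example: its degree already equals its true degree, but it shares two neighbors with node 3 and none with the missing point, so the link to the missing point is replaced by a link to node 3.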
Step five: the network is constructed from the adjacency matrix, as shown in Fig. 9.
The reconstructed network can be applied to the study of research collaboration among students, the study of teacher-student research collaboration, and the like. It handles the problem of missing data well, so that the study of collaboration relationships becomes more accurate and clear. The invention is not limited to the study of collaboration relationships and can also be applied in other fields to complete a network structure and achieve good research results.

Claims (5)

1. A social network reconstruction method based on local information, characterized by comprising the following steps:
step one: preprocessing data; acquiring social network data through a web crawler or using an existing data set, performing sorting analysis on the data to obtain experimental data, and dividing the experimental data into a user node table and a user relation table to be respectively stored;
step two: adding placeholder nodes according to node attributes and a network topology structure; the placeholder node is a temporary node which exists for locating the missing point;
step three: clustering the added placeholder nodes in the second step to determine the specific number and positions of the missing points, eliminate redundant placeholder nodes and increase the accuracy of network reconstruction;
step four: determining missing edges in the network using a link prediction algorithm;
step five: reconstructing the network structure according to the processing results of the above steps.
2. The social network reconstruction method based on local information according to claim 1, wherein the specific process of step one is as follows:
11) Cleaning and sorting the data to obtain a user relation table and a user node table, the two tables being linked through the node id;
12) Constructing a sub-network G' using the user relation table, G' = {V', E'}, V' = {V_1, V_2, V_3, …, V_m}, E' = {(V_i, V_j), …}, which consists of m nodes and whose structure is known; the network G' is a sub-network of the network G, the number of nodes in the network G is known to be n, V_i denotes a node, the set V' is a subset of the set V, the set E' is a subset of the set E, and n > m;
13) Using the node table to obtain the original degree set D = {(V_1, D_1), (V_2, D_2), (V_3, D_3), …, (V_m, D_m)} of the nodes and the existing degree set d = {(V_1, d_1), (V_2, d_2), (V_3, d_3), …, (V_m, d_m)}; D_i denotes the degree of node i in the network G, expressed as the sum of the user node's number of fans and the number of users it follows, and d_i denotes the degree of node i in the network G';
14) Building the adjacency matrix T of the network G' from the known network G'; T_ij denotes the connection relation between nodes i and j and takes the value 0 or 1: 1 if the nodes are connected, 0 otherwise.
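For orientation, the data structures named in 11) to 14) can be sketched as below. The table layouts, the user ids, the function name `build_adjacency`, and the treatment of every relation as an undirected edge are assumptions made for illustration; the claim does not fix any of them.

```python
def build_adjacency(node_ids, relations):
    """Build the 0/1 adjacency matrix T of G' from a user relation table."""
    index = {node_id: k for k, node_id in enumerate(node_ids)}
    m = len(node_ids)
    T = [[0] * m for _ in range(m)]
    for a, b in relations:
        # Relations that point outside the known node table are ignored here.
        if a in index and b in index and a != b:
            i, j = index[a], index[b]
            T[i][j] = T[j][i] = 1            # assumption: edges are undirected
    return T


# Hypothetical user node table: id -> (number of fans, number of followed users).
node_table = {"u1": (2, 1), "u2": (1, 1), "u3": (1, 0)}
# Hypothetical user relation table: pairs of node ids.
relation_table = [("u1", "u2"), ("u2", "u3")]

ids = list(node_table)
T = build_adjacency(ids, relation_table)
D = [fans + follows for fans, follows in node_table.values()]   # D_i, degree in G
d = [sum(row) for row in T]                                      # d_i, degree in G'
print(D, d)
```

Comparing D with d node by node shows which nodes are missing neighbors in G', which is the input to step two.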
3. The social network reconstruction method based on local information according to claim 1, wherein the specific process of step two is as follows:
21) Randomly selecting a node i in the network G' as a starting node, and first judging whether d_i is equal to D_i;
22) If d_i < D_i, adding placeholder nodes for the node, the number of placeholder nodes added being determined by D_i - d_i;
23) If d_i = D_i, skipping the current node and judging the next node;
24) Recursively traversing all nodes in the network G', adding placeholder nodes to the network G', and simultaneously updating the adjacency matrix T of the network G' to obtain a new adjacency matrix T1 and a new network G1.
4. The social network reconstruction method based on local information according to claim 1, wherein the specific process of step three is as follows:
31) First calculating the connection vector of each placeholder node through a link prediction algorithm, wherein the connection relation refers to whether an edge exists between two nodes, and the connection vector of a node is the vector of its connection relations with the other nodes; obtaining a similarity matrix S through the link prediction algorithm, the rows of S that correspond to placeholder nodes being the connection vectors of the placeholder nodes, i.e. S_i denotes the connection vector of placeholder node i;
32) Calculating the Pearson correlation between each pair of placeholder node connection vectors and constructing a correlation matrix R whose elements are the Pearson correlations between placeholder nodes, wherein the calculation formula of the Pearson correlation is as follows:
33) After the correlation matrix R has been obtained, performing a K-means clustering operation, wherein each row of the matrix R represents the correlation between one placeholder node and the other placeholder nodes and is regarded as one clustering object, so that each clustering object also represents one placeholder node; through the clustering operation the clustering objects are grouped into K placeholder clusters, the placeholder nodes in the same placeholder cluster representing one missing point; the missing point is added to the network G1, and the adjacency matrix T1 of the network G1 is updated to obtain a new adjacency matrix T2 and a new network G2 constructed from T2.
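Step 32) refers to a calculation formula for the Pearson correlation. As a reference point, the standard Pearson coefficient between the connection vectors S_p and S_q of two placeholder nodes p and q can be written as below. The indices p, q and k, the vector length N, and the bar notation for the row mean are notational assumptions rather than symbols taken from the claim.

```latex
r_{pq} \;=\;
\frac{\sum_{k=1}^{N} \left(S_{pk} - \bar{S}_p\right)\left(S_{qk} - \bar{S}_q\right)}
     {\sqrt{\sum_{k=1}^{N} \left(S_{pk} - \bar{S}_p\right)^{2}}\;
      \sqrt{\sum_{k=1}^{N} \left(S_{qk} - \bar{S}_q\right)^{2}}},
\qquad
\bar{S}_p = \frac{1}{N}\sum_{k=1}^{N} S_{pk}
```

Under this reading, the element of the correlation matrix R in the row of placeholder p and the column of placeholder q is r_{pq}, so R is symmetric with ones on its diagonal whenever the connection vectors are not constant.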
5. The social network reconstruction method based on local information according to claim 1, wherein the specific process of step four is as follows:
41) Randomly selecting a node i in the network G2 as a starting node, and skipping the node if it is a missing point determined by the clustering in step three; recalculating the degree d_i' of the node from the adjacency matrix T2 of the network G2; comparing d_i' with D_i, where d_i' denotes the degree of node i in the network G2 and D_i denotes the degree of node i in the network G;
42) Calculating a similarity matrix Sim by a link prediction algorithm, Sim_i denoting the row of node i in the similarity matrix Sim, each element of which is the similarity score Grade between node i and another node;
43) Comparing D_i and d_i': if D_i > d_i', node i is not full, i.e. node i has missing edges; selecting connecting edges for node i according to the similarity scores in the similarity matrix Sim, and proceeding to the next node;
44) If d_i' = D_i, calculating D_i - d_i and making a judgment according to the similarity matrix Sim by comparing the similarity scores between node i and the other nodes: if the similarity score between node i and a missing point it is connected to is lower than its similarity score with a non-missing node, disconnecting node i from the missing point and connecting it to the non-missing node with the high similarity score; otherwise skipping the current node i and proceeding to the next node;
45) Repeating steps 41), 42) and 43) until all nodes in the network G2 other than the missing points determined by the clustering in step three have been calculated once, and updating the adjacency matrix T2 to obtain a new adjacency matrix T3.
CN202011123548.7A 2020-10-20 2020-10-20 Social network reconstruction method based on local information Active CN112256941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011123548.7A CN112256941B (en) 2020-10-20 2020-10-20 Social network reconstruction method based on local information

Publications (2)

Publication Number Publication Date
CN112256941A CN112256941A (en) 2021-01-22
CN112256941B CN112256941B (en) 2023-10-13

Family

ID=74245084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011123548.7A Active CN112256941B (en) 2020-10-20 2020-10-20 Social network reconstruction method based on local information

Country Status (1)

Country Link
CN (1) CN112256941B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8560605B1 (en) * 2010-10-21 2013-10-15 Google Inc. Social affinity on the web
CN105228185A (en) * 2015-09-30 2016-01-06 中国人民解放军国防科学技术大学 A kind of method for Fuzzy Redundancy node identities in identification communication network
CN110321493A (en) * 2019-06-24 2019-10-11 重庆邮电大学 A kind of abnormality detection of social networks and optimization method, system and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
个体视角下的网络舆情传递链路预测分析;魏静;朱恒民;宋瑞晓;蒋世兵;;现代图书情报技术(01);全文 *

Also Published As

Publication number Publication date
CN112256941A (en) 2021-01-22

Similar Documents

Publication Publication Date Title
Celik et al. Mixed-drove spatiotemporal co-occurrence pattern mining
Huang et al. A similarity-based modularization quality measure for software module clustering problems
Baingana et al. Joint community and anomaly tracking in dynamic networks
Saoud et al. Community detection in networks based on minimum spanning tree and modularity
Mohammadmosaferi et al. Evolution of communities in dynamic social networks: An efficient map-based approach
Kaple et al. Viral marketing for smart cities: Influencers in social network communities
Chen et al. Learning Bayesian network structure from distributed data
Chen et al. A method for local community detection by finding maximal-degree nodes
Taha Disjoint community detection in networks based on the relative association of members
CN112165401A (en) Edge community discovery algorithm based on network pruning and local community expansion
Chen et al. Distributed web mining using bayesian networks from multiple data streams
CN112256941B (en) Social network reconstruction method based on local information
He et al. Genetic algorithm with ensemble learning for detecting community structure in complex networks
Fushimi et al. Estimating node connectedness in spatial network under stochastic link disconnection based on efficient sampling
CN112084418B (en) Microblog user community discovery method based on neighbor information and attribute network characterization learning
Jabbour et al. Detecting highly overlapping community structure by model-based maximal clique expansion
CN112949748A (en) Dynamic network anomaly detection algorithm model based on graph neural network
Zhang et al. Mean time to absorption on the joint Sierpinski gasket
Drugan et al. Detecting communities in sparse manets
CN105162648B (en) Corporations' detection method based on backbone network extension
Roumpelaki et al. Marginal causal consistency in constraint-based causal learning
CN116342938A (en) Domain generalization image classification method based on mixture of multiple potential domains
Li A Data Mining-Based Method for Quality Assessment of Ideological and Political Education in Universities
CN113159976B (en) Identification method for important users of microblog network
Tao et al. Structural identity representation learning of blockchain transaction network for metaverse

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant