CN115333945B

CN115333945B - Local topology inference method of online social network

Info

Publication number: CN115333945B
Application number: CN202210776030.6A
Authority: CN
Inventors: 季宏宇; 李聪; 郝旭; 李翔
Original assignee: Fudan University
Current assignee: Fudan University
Priority date: 2022-07-03
Filing date: 2022-07-03
Publication date: 2023-06-16
Anticipated expiration: 2042-07-03
Also published as: CN115333945A

Abstract

The invention belongs to the technical field of large-scale network data analysis, and in particular relates to a local topology inference method for online social networks. The invention adopts the independent cascade model as the information propagation model and infers a subset of the edges of a network with N nodes from observed cascade results; that is, given a specified set of key nodes, the goal is to infer the edges around all nodes in the set, including both the edges pointing out from those nodes and the edges pointing to them. The invention targets online social networks, exploits their large scale, sparse edges, and strong heterogeneity, and takes the directionality of information propagation into account. It provides a local network inference algorithm that performs forward neighbor inference and backward neighbor inference on the key nodes, yielding a local network topology composed of the forward and backward neighbors of the key nodes. The accuracy of local topology inference is significantly higher than that of global network topology inference algorithms.

Description

Local topology inference method of online social network

Technical Field

The invention belongs to the technical field of large-scale network data analysis, and particularly relates to a local topology inference method of an online social network.

Background

In recent years, much excellent work has been devoted to inferring social networks whose structure is not directly available. As online social platforms have become an indispensable part of everyday social life, inferring the relationships among users on these platforms has become a research hotspot. However, most existing work is not designed specifically for online social networks and does not take full advantage of their distinctive properties. Against this background, the invention provides a local topology inference method better suited to online social networks, targeting their large scale, strong sparsity, and strong heterogeneity.

Online social networks tend to be huge, and at that scale inferring the edges of the entire network is almost impossible. Moreover, because online social networks are strongly heterogeneous, many nodes sit at the margins of the network, and when inferring user edges the real interest is usually not in all nodes but in a few more important ones, which may be called key nodes. In other words, in most cases we only want to infer the edges around the key nodes, not the edges of every inactive node. However, with conventional network inference algorithms, inferring even the edges around a few nodes requires inference over the whole network; for an online social network this demands a large amount of observation data and computation and is almost impossible to complete. A method that infers only the edges around the key nodes of interest therefore avoids a large amount of redundant computation and needs only the observation data related to those nodes, which gives it great practical significance and application prospects. More importantly, because the inference is performed locally on the key nodes, interference from irrelevant nodes is avoided and inference accuracy is effectively improved.

Disclosure of Invention

In view of the above, the invention aims to provide a local topology inference method better suited to online social networks, which are large in scale, sparse in edges, and strongly heterogeneous, so as to improve inference accuracy.

The invention adopts the independent cascade model as the information propagation model, and uses the observed cascade results D = [d^1, d^2, …, d^C]^T to infer a subset of the edges of a network with N nodes. Specifically, when a set of key nodes S_key is specified, the goal is to infer the edges around all nodes in S_key, including both the edges pointing out from these nodes and all edges pointing to these nodes.
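By way of illustration only, the following Python sketch shows how one observed cascade could be generated under a continuous-time independent cascade model. The exponential latency distribution, the `rate` parameter, the function name `simulate_cascade`, and the dictionary-based graph representation are assumptions made for this example and are not specified by the invention.

```python
import heapq
import random


def simulate_cascade(adj, source, t_max, rate=1.0, seed=None):
    """Simulate one cascade; return {node: time it first obtained the information}.

    adj maps each node v to an iterable of nodes u with a directed edge v -> u.
    Each edge is tried once, when v becomes informed (independent cascade).
    The delay is drawn from an exponential distribution (an assumption) and
    the transmission is dropped if the delay exceeds t_max.
    """
    rng = random.Random(seed)
    times = {}
    heap = [(0.0, source)]
    while heap:
        t, v = heapq.heappop(heap)
        if v in times:
            continue  # only the first arrival of the information counts
        times[v] = t
        for u in adj.get(v, ()):
            if u in times:
                continue
            delay = rng.expovariate(rate)
            if delay <= t_max:
                heapq.heappush(heap, (t + delay, u))
    return times


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2], 2: []}
    print(simulate_cascade(graph, source=0, t_max=5.0, seed=42))
```

Repeating such a simulation C times from different sources would give a collection of cascades playing the role of D = [d^1, d^2, …, d^C]^T.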

The invention provides a local topology inference method of an online social network, which comprises the following specific steps:

Step 1: Specify a set of key nodes S_key. For a node u in S_key, a node that transmits the information to node u within the effective propagation time of the information is called a forward neighbor of node u. Infer the forward neighbors of node u to obtain the forward-neighbor local topology of node u.

Step 2: For a node u in the key node set S_key, a node that receives the information transmitted by node u within the effective propagation time of the information is called a backward neighbor of node u. Infer the backward neighbors of node u to obtain the backward-neighbor local topology of node u.

Step 3: Local network topology inference. The inferred forward-neighbor local topology and backward-neighbor local topology of node u are integrated to obtain the forward and backward neighbor topology of node u, from which the local topology corresponding to the key node set S_key is obtained.
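As a non-limiting sketch of Step 3, the Python fragment below merges per-node forward and backward neighbor sets into one set of directed edges. It assumes that "integrating" means taking the union of the inferred edges; the source formula is not reproduced in this text and may differ. The function name `merge_local_topology` and the dictionary inputs are hypothetical.

```python
def merge_local_topology(forward, backward):
    """Merge inferred neighbor sets into a set of directed edges (src, dst).

    forward[u]  : inferred forward neighbors v of key node u (edge v -> u)
    backward[u] : inferred backward neighbors w of key node u (edge u -> w)
    Assumption: the local topology is the union of both edge sets.
    """
    edges = set()
    for u, preds in forward.items():
        edges.update((v, u) for v in preds)
    for u, succs in backward.items():
        edges.update((u, w) for w in succs)
    return edges


if __name__ == "__main__":
    fwd = {"key1": {"a", "b"}}
    bwd = {"key1": {"c"}}
    print(sorted(merge_local_topology(fwd, bwd)))
```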

In the invention, the specific flow of the step 1 is as follows:

Step 1-1: Find the set of possible forward neighbors of node u based on the timeliness of information propagation.

Consider the times at which node u and node v obtain the information in the c-th propagation. For node v to be a possible forward neighbor of node u in a given propagation, node u must obtain the information after node v does and no later than t_max afterwards, where t_max is the validity period of the message. Accordingly, all propagation cascades can be traversed to find all possible forward neighbors of node u, which form its candidate forward neighbor set.
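The candidate search of Step 1-1 can be sketched in Python as follows. This is a minimal illustration, assuming each cascade is stored as a dictionary from node to the time it obtained the information; the function name `candidate_forward_neighbors` and the choice of a strict lower bound and inclusive upper bound on the time difference are assumptions of this example.

```python
def candidate_forward_neighbors(cascades, u, t_max):
    """Collect every node v that obtained the information before u,
    and at most t_max earlier, in at least one cascade."""
    candidates = set()
    for times in cascades:
        if u not in times:
            continue  # u did not take part in this cascade
        t_u = times[u]
        for v, t_v in times.items():
            if v != u and 0 < t_u - t_v <= t_max:
                candidates.add(v)
    return candidates


if __name__ == "__main__":
    observed = [{0: 0.0, 1: 1.0, 2: 5.0}, {3: 0.0, 2: 0.5}]
    print(candidate_forward_neighbors(observed, u=2, t_max=2.0))
```

Restricting later likelihood computations to this candidate set is what keeps the inference local: nodes that never precede u within the validity window are never examined.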

Step 1-2: Calculate the probability that node u obtains the message from node v in one propagation c.

Equation (1) gives the probability that node u obtains the message from node v in the c-th propagation, where a_{v,u} represents the edge state of node pair (v, u): a_{v,u} = 1 if the edge exists and a_{v,u} = 0 otherwise, and ρ_{v,u}(τ) is the latency distribution of node pair (v, u).

Equation (2) gives the probability that, in the c-th propagation, node u has not been infected by nodes other than node v.

Step 1-3: Calculate the likelihood function relating the forward neighbors of the key node set S_key to the observed propagation results.

Considering all possible forward neighbors in the c-th propagation, the probability that node u obtains the information at its observed time is given by equation (3).

When all cascade results are considered, the likelihood function that node u produces the observed propagation result D is given by equation (4), where D_u is the set of cascades in which node u participates.

Step 1-4: the introduction of survival and risk functions simplifies the likelihood functions.

Introducing common functions, wherein the survival function and the risk function are respectively represented by the formulas (5) and (6), so as to obtain a simplified likelihood function, as shown by the formula (7):
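For orientation, the sketch below evaluates a survival/hazard form of the log-likelihood that is commonly used in cascade-based network inference: for each cascade, the product of survival terms over the active forward neighbors multiplied by the sum of their hazards. Equations (5) to (7) are not reproduced in this text, so this form, the exponential latency distribution, the `rate` parameter, and all function names are assumptions; the expressions of the invention may differ.

```python
import math


def survival(tau, rate=1.0):
    """Probability that a transmission has not yet happened after delay tau
    (exponential latency, an assumption of this sketch)."""
    return math.exp(-rate * tau)


def hazard(tau, rate=1.0):
    """Instantaneous transmission rate; constant for exponential latency."""
    return rate


def log_likelihood(cascades, u, parents, t_max, rate=1.0):
    """Log-likelihood of node u's observed times given a forward neighbor set.

    Cascades in which no listed parent is active are skipped here, which is
    a simplification of this sketch.
    """
    total = 0.0
    for times in cascades:
        if u not in times:
            continue
        taus = [times[u] - times[v] for v in parents
                if v in times and 0 < times[u] - times[v] <= t_max]
        if not taus:
            continue
        total += sum(math.log(survival(t, rate)) for t in taus)
        total += math.log(sum(hazard(t, rate) for t in taus))
    return total


if __name__ == "__main__":
    observed = [{0: 0.0, 2: 0.5, 1: 1.0}]
    print(log_likelihood(observed, u=1, parents={0, 2}, t_max=5.0))
```

Working with the logarithm turns the product over cascades into a sum, which is what makes per-edge gains cheap to evaluate.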

Step 1-5: Calculate the edge gain of the maximum likelihood function when edge (m, u) is added.

The edge gain when the state of node pair (m, u) switches from 0 (no edge) to 1 (edge present) is given by equation (8). Considering all cascade results, the edge gain of node u when the edge is added is given by equation (9).

Step 1-6: Calculate the edge gain of the maximum likelihood function when edge (m, u) is deleted.

By analogy with the increment of the maximum likelihood function when an edge is added, the edge gain for deletion is obtained in the same way, as shown in equations (10) and (11).
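The notion of edge gain in Steps 1-5 and 1-6 can be illustrated as a difference of log-likelihoods, summed over all cascades. The sketch below is generic: `loglik_c` stands for any per-cascade log-likelihood, and `toy_loglik` is a made-up stand-in used only to make the example runnable. Neither is the expression of equations (8) to (11), which are not reproduced in this text.

```python
def edge_gain(loglik_c, cascades, parents, m, add=True):
    """Change in total log-likelihood when the state of pair (m, u) flips.

    add=True  : state goes 0 -> 1 (edge (m, u) is added)
    add=False : state goes 1 -> 0 (edge (m, u) is deleted)
    """
    parents = frozenset(parents)
    new_parents = parents | {m} if add else parents - {m}
    return sum(loglik_c(c, new_parents) - loglik_c(c, parents)
               for c in cascades)


def toy_loglik(cascade, parents):
    """Hypothetical per-cascade score: reward explaining the cascade's
    source, penalize every extra edge."""
    return -0.5 * len(parents) + (1.0 if cascade["source"] in parents else 0.0)


if __name__ == "__main__":
    cascades = [{"source": 0}, {"source": 0}, {"source": 3}]
    print(edge_gain(toy_loglik, cascades, set(), 0, add=True))    # gain of adding (0, u)
    print(edge_gain(toy_loglik, cascades, {0}, 0, add=False))     # gain of deleting (0, u)
```

A positive gain means the flipped edge state explains the observed cascades better than the current one.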

Step 1-7: Infer the forward-neighbor local topology of node u.

The edge gains are fed into the Markov chain Monte Carlo sampling framework, which traverses the node pairs formed by node u and each of its possible forward neighbors so as to maximize the likelihood function; the result is the inferred forward-neighbor local topology of node u.
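One possible way to use edge gains inside a Markov chain Monte Carlo sampler is sketched below as a simple Metropolis scheme over edge states. The flow chart of the invention is not reproduced in this text, so the proposal rule, the acceptance rule, the majority vote over retained samples, and the parameters `n_samples`, `burn_in`, and `lag` are all assumptions of this example rather than the disclosed procedure.

```python
import math
import random


def mcmc_infer_parents(candidates, loglik, n_samples=200, burn_in=50,
                       lag=5, seed=0):
    """Sample forward neighbor sets of a node and keep the edges that are
    present in more than half of the retained samples.

    loglik(parent_set) returns the log-likelihood of a candidate parent set.
    """
    candidates = sorted(candidates)
    if not candidates:
        return set()
    rng = random.Random(seed)
    state = frozenset()
    current = loglik(state)
    counts = {v: 0 for v in candidates}
    kept = 0
    for step in range(burn_in + n_samples * lag):
        m = rng.choice(candidates)
        proposal = state ^ {m}              # flip the state of pair (m, u)
        gain = loglik(proposal) - current   # edge gain of the flip
        if gain >= 0 or rng.random() < math.exp(gain):
            state, current = proposal, current + gain
        if step >= burn_in and (step - burn_in) % lag == 0:
            kept += 1
            for v in state:
                counts[v] += 1
    return {v for v in candidates if counts[v] > kept / 2}


if __name__ == "__main__":
    truth = {1, 2}
    score = lambda parents: sum(5.0 if v in truth else -5.0 for v in parents)
    print(mcmc_infer_parents({1, 2, 3, 4}, score))
```

Because each proposal flips a single edge state, only the gain of that one edge has to be evaluated per step.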

In the invention, the specific flow of the step 2 is as follows:

Step 2-1: Find the set of possible backward neighbors of node u based on the timeliness of information propagation.

The procedure is the same as in Step 1-1, with the roles of sender and receiver exchanged.
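A minimal Python sketch of the backward candidate search follows, mirroring Step 1-1 with the time order reversed. The cascade representation (a dictionary from node to the time it obtained the information), the function name `candidate_backward_neighbors`, and the exact boundary conventions are assumptions of this example.

```python
def candidate_backward_neighbors(cascades, u, t_max):
    """Collect every node w that obtained the information after u,
    and at most t_max later, in at least one cascade."""
    candidates = set()
    for times in cascades:
        if u not in times:
            continue  # u did not take part in this cascade
        t_u = times[u]
        for w, t_w in times.items():
            if w != u and 0 < t_w - t_u <= t_max:
                candidates.add(w)
    return candidates


if __name__ == "__main__":
    observed = [{0: 0.0, 1: 1.0, 2: 5.0}, {0: 0.0, 3: 0.5}]
    print(candidate_backward_neighbors(observed, u=0, t_max=2.0))
```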

Step 2-2: Calculate the likelihood function of a node i in the backward neighbor set in the c-th propagation, and the likelihood function over all cascades.

Equation (12) gives the likelihood function of a node i in the backward neighbor set of node u in the c-th propagation. Equation (13) gives the likelihood function when all cascades are considered.

Step 2-3: Calculate the edge gain of the maximum likelihood function when edge (j, i) is added.

Equation (14) gives the edge gain of adding an edge (j, i) for node i in the c-th propagation, and equation (15) gives the edge gain when all cascades are considered.

Step 2-4: Calculate the edge gain of the maximum likelihood function when edge (j, i) is deleted.

Similarly, equation (16) gives the edge gain of deleting an edge (j, i) for node i in the c-th propagation, and equation (17) gives the edge gain when all cascades are considered.

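Step 2-5: Infer the backward-neighbor local topology of node u.

As in Step 1-7, the edge gains are fed into the Markov chain Monte Carlo sampling framework for the node pairs formed by node u and its possible backward neighbors, yielding the inferred backward-neighbor local topology of node u.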

The innovation point of the invention is that:

(1) In view of the large scale of online social networks, the idea of performing local network inference around key nodes is proposed, which avoids a large amount of redundant computation and heavy demands on input data, and makes edge inference feasible in large-scale online social networks;

(2) The invention provides a network local topology inference algorithm and verifies it by simulation on a real data set; the results show that the algorithm infers the local edges of a network accurately.

The invention targets online social networks, exploits their large scale, sparse edges, and strong heterogeneity, and takes the directionality of information propagation into account. It provides a local network inference algorithm that performs forward neighbor inference and backward neighbor inference on the key nodes, thereby obtaining the local network topology composed of the forward and backward neighbors of the key nodes. The accuracy of local topology inference is significantly higher than that of global network topology inference algorithms.

Drawings

FIG. 1 shows the independent cascade model used in the network inference problem of the invention.

FIG. 2 is a schematic diagram of the local inference discussed in the invention.

FIG. 3 is a flow chart of the Markov chain Monte Carlo sampling method used in the network inference problem of the invention.

FIG. 4 plots local inference accuracy against the number of key nodes for (a) the BA scale-free network and (b) the Twitter sub-social network.

Detailed Description

In order that the above-recited objects and novel features of the present invention can be readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.

The independent cascade model shown in FIG. 1 is used as the information propagation model, and the observed cascade results D = [d^1, d^2, …, d^C]^T are used to infer a subset of the edges of a network with N nodes. Specifically, when a set of key nodes S_key is specified, the goal is to infer the edges around all nodes in S_key, including both the edges pointing out from these nodes and all edges pointing to these nodes. Since the adjacency matrix is a common and intuitive way to store the edges of a network, FIG. 2 shows the adjacency matrix of the whole network, in which the shaded portion is the inference target when S_key = {key_1, key_2}. Because information propagation is directional, the information is considered to propagate along the direction of the edges, so the neighbors of any node u are divided into forward neighbors, that is, nodes that can transfer information to u, and backward neighbors, that is, nodes that can receive information sent by u. In the inference process, the forward neighbors and backward neighbors of each key node are inferred separately.
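The inference target described with reference to FIG. 2 can be pictured as a mask over the adjacency matrix. The Python sketch below marks every entry lying in a row or a column of a key node. The convention that entry (i, j) denotes an edge from i to j, the exclusion of self-loops, and the function name `local_inference_mask` are assumptions of this example.

```python
def local_inference_mask(n_nodes, key_nodes):
    """Return an n x n boolean matrix; mask[i][j] is True when the edge
    i -> j touches a key node and is therefore part of the inference target."""
    keys = set(key_nodes)
    return [[(i in keys or j in keys) and i != j for j in range(n_nodes)]
            for i in range(n_nodes)]


if __name__ == "__main__":
    for row in local_inference_mask(5, {1, 3}):
        print("".join("#" if cell else "." for cell in row))
```

With k key nodes in a network of N nodes, the masked region holds on the order of 2kN entries instead of N^2, which is the source of the computational saving of local inference.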

Step 1: Specify a set of key nodes S_key. For a node u in S_key, a node that transmits the information to node u within the effective propagation time of the information is called a forward neighbor of node u. Infer the forward neighbors of node u to obtain the forward-neighbor local topology of node u.

Step 2: For a node u in the key node set S_key, a node that receives the information transmitted by node u within the effective propagation time of the information is called a backward neighbor of node u. Infer the backward neighbors of node u to obtain the backward-neighbor local topology of node u.

Step 3: Local network topology inference. The inferred forward-neighbor local topology and backward-neighbor local topology of node u are integrated to obtain the forward and backward neighbor topology of node u, from which the local topology corresponding to the key node set S_key is obtained.

In the invention, the specific flow of the step 1 is as follows:

Step 1-1: Find the set of possible forward neighbors of node u based on the timeliness of information propagation.

A node u that obtains the information from node v must obtain it within the time period t_max after node v obtains the message. That is, not every node infected before node u can be one of its forward neighbors; the time difference between the two infections must also not exceed t_max. Accordingly, all propagation cascades can be traversed to find all possible forward neighbors of node u, which form its candidate forward neighbor set.

Equation (35) gives the probability that node u obtains the message from node v in the c-th propagation, where a_{v,u} represents the edge state of node pair (v, u): a_{v,u} = 1 if the edge exists and a_{v,u} = 0 otherwise, and ρ_{v,u}(τ) is the latency distribution of node pair (v, u).

Equation (36) gives the probability that, in the c-th propagation, node u has not been infected by nodes other than node v. Since only the first time node u obtains the information is considered valid, ensuring that node u obtains the information from node v requires that node u was not previously infected by any other possible forward neighbor.

In propagation c, node u may also obtain the information from other nodes. Considering all possible forward neighbors in this propagation, the probability that node u obtains the information at its observed time is given by equation (37).

When all cascade results are considered, the likelihood function that node u produces the observed propagation result D is given by equation (38), where D_u is the set of cascades in which node u participates.

The survival function and the hazard (risk) function, two commonly used functions, are given by equations (39) and (40), respectively, and the resulting simplified likelihood function is shown in equation (41).

For any propagation c, we calculate for node u the increment of the maximum likelihood function when an edge (m, u) is added, that is, the edge gain when the state of node pair (m, u) switches from 0 (no edge) to 1 (edge present), as shown in equation (42). Although the formula appears complex, the main computational cost of the (m, u) edge gain lies in only two of its terms.

Considering all cascade results, the edge gain of node u when the edge is added is given by equation (43).

By analogy with the increment of the maximum likelihood function when an edge is added, the edge gain for deleting an edge is obtained in the same way, as shown in equations (44) and (45).

Step 1-7: Infer the forward-neighbor local topology of node u.

The edge gains are fed into the Markov chain Monte Carlo sampling framework shown in FIG. 3 for the node pairs formed by node u and all its possible forward neighbors, yielding the inferred forward-neighbor local topology of node u.

In the invention, the specific flow of the step 2 is as follows:

The set of possible backward neighbors of node u is found in the same way as in Step 1-1.

Step 2-2: Calculate the likelihood function relating the backward neighbor nodes i of the key node set S_key to the observed propagation results.

Equation (46) gives the likelihood function of a node i in the backward neighbor set in the c-th propagation. Notably, because local inference only concerns the nodes in S_key, the conditions u ∈ S_key and j ∈ S_key are imposed in the formula. Equation (47) gives the likelihood function when all cascades are considered.

Equation (48) gives the edge gain of adding an edge (j, i) for node i in the c-th propagation, and equation (49) gives the edge gain when all cascades are considered.

Similarly, equation (50) gives the edge gain of deleting an edge (j, i) for node i in the c-th propagation, and equation (51) gives the edge gain when all cascades are considered.

Step 2-5: Infer the backward-neighbor local topology of node u.

The edge gains are fed into the Markov chain Monte Carlo sampling framework shown in FIG. 3 for the node pairs formed by node u and all its possible backward neighbors, yielding the inferred backward-neighbor local topology of node u.

Taking a BA scale-free network with 1000 nodes and a Twitter sub-social network with 1973 nodes as examples, the algorithm of the invention was used to infer local network topology. The parameters selected in this experiment were m = 10, max_lag = 10, and burn_in = 10, and key node sets of different sizes were selected according to the number of network nodes. The experimental results are shown in Table 1 and Table 2, which give the accuracy of backward inference, forward inference, and local inference on the BA scale-free network and the Twitter sub-social network, respectively. FIG. 4 plots local inference accuracy against the number of key nodes for both networks. The accuracy of forward inference and local inference is less affected by the number of key nodes, while the accuracy of backward inference is more affected. In addition, local inference accuracy on the BA scale-free network is generally higher than on the Twitter sub-social network, which indicates that the type of network has some influence on inference accuracy.

Table 1. Local inference accuracy on the BA scale-free network.

Table 2. Local inference accuracy on the Twitter sub-social network.

Claims

1. A local topology inference method for an online social network, characterized in that an independent cascade model is adopted as the information propagation model, and a subset of the edges of a network with N nodes is inferred from the observed cascade results D = [d^1, d^2, …, d^C]^T; specifically, when a set of key nodes S_key is specified, the goal is to infer the edges around all nodes in S_key, including the edges pointing out from these nodes and all edges pointing to these nodes; the method comprises the following specific steps:

Step 2: For a node u in the key node set S_key, a node that receives the information transmitted by node u within the effective propagation time of the information is called a backward neighbor of node u; the backward neighbors of node u in the key node set are inferred to obtain the backward-neighbor local topology of node u;

the forward-neighbor local topology and the backward-neighbor local topology of node u are integrated to obtain the forward and backward neighbor topology of node u, from which the local topology corresponding to the key node set S_key is obtained;

wherein inferring the forward neighbors of node u in the key node set S_key described in Step 1, to obtain the forward-neighbor local topology of node u, has the following specific flow:

Step 1-1: Find the set of possible forward neighbors of node u based on the timeliness of information propagation;

Step 1-3: Calculate the likelihood function relating the forward neighbors of key node u to the observed propagation results;

Step 1-4: Introduce the survival function and the hazard (risk) function to simplify the likelihood function;

Step 1-7: Infer the forward-neighbor local topology of node u;

inferring the backward neighbors of node u in the key node set S_key described in Step 2 has the following specific flow:

Step 2-2: Calculate the likelihood function relating the possible backward neighbor nodes i of key node u to the observed propagation results;

Step 2-5: Infer the backward-neighbor local topology of node u;

In Step 1:

the operation flow of Step 1-1, finding the set of possible forward neighbors of node u based on the timeliness of information propagation, is as follows:

considering the times at which node u and node v obtain the information in the c-th propagation, node u must obtain the information after node v does and no later than t_max afterwards, wherein t_max is the validity period of the message; accordingly, all propagation cascades are traversed, all possible forward neighbors of node u are found, and they form the candidate forward neighbor set of node u;

the method of Step 1-2, calculating the probability that node u obtains the message from node v in one propagation c, is as follows:

in the c-th propagation, the probability that node u obtains the message from node v is given by equation (1),

wherein a_{v,u} represents the edge state of node pair (v, u): a_{v,u} = 1 if the edge exists and a_{v,u} = 0 otherwise, and ρ_{v,u}(τ) is the latency distribution of node pair (v, u);

the probability that, in the c-th propagation, node u has not been infected by nodes other than node v is given by equation (2);

the operation flow of Step 1-3, calculating the likelihood function relating the forward neighbors of the key node set S_key to the observed propagation results, is as follows:

considering all possible forward neighbors in the c-th propagation, the probability that node u obtains the information at its observed time in this propagation is given by equation (3);

when all cascade results are considered, the likelihood function that node u produces the observed propagation result D is given by equation (4), wherein D_u is the set of cascades in which node u participates;

the operation flow of Step 1-4, introducing the survival function and the hazard (risk) function to simplify the likelihood function, is as follows:

the survival function and the hazard function, two commonly used functions, are given by equations (5) and (6), respectively, and the resulting simplified likelihood function is given by equation (7);

the method of Step 1-5, calculating the edge gain of the maximum likelihood function when edge (m, u) is added, is as follows:

the edge gain in one propagation is given by equation (8); considering all cascade results, the edge gain of node u when the edge is added is given by equation (9);

the method of Step 1-6, calculating the edge gain of the maximum likelihood function when edge (m, u) is deleted, is as follows:

by analogy with the increment of the maximum likelihood function when an edge is added, the edge gain is obtained in the same way, as shown in equations (10) and (11);

the method of Step 1-7, inferring the forward-neighbor local topology of node u, is as follows:

the edge gains are fed into the Markov chain Monte Carlo sampling framework for the node pairs formed by node u and all its possible forward neighbors, yielding the inferred forward-neighbor local topology of node u;

In Step 2:

the operation of Step 2-1, finding the set of possible backward neighbors of node u based on the timeliness of information propagation, is the same as in Step 1-1;

in Step 2-2, the likelihood function of a node i in the backward neighbor set in the c-th propagation is given by equation (12), and the likelihood function when all cascades are considered is given by equation (13);

in Step 2-3, the edge gain of the maximum likelihood function when edge (j, i) is added is given by equation (14), and the edge gain when all cascades are considered is given by equation (15);

in Step 2-4, the edge gain of the maximum likelihood function when edge (j, i) is deleted is given by equation (16), and the edge gain when all cascades are considered is given by equation (17);

in Step 2-5, the backward-neighbor local topology of node u is inferred by feeding the edge gains into the Markov chain Monte Carlo sampling framework for the node pairs formed by node u and all its possible backward neighbors.