CN112184468A - Dynamic social relationship network link prediction method and device based on spatio-temporal relationship

Info

Publication number: CN112184468A
Application number: CN202011047469.2A
Authority: CN
Inventors: 江逸楠; 刘家琛; 王亚珅; 陈诚; 吉祥; 张雪莹
Original assignee: Electronic Science Research Institute of CTEC
Current assignee: Electronic Science Research Institute of CTEC
Priority date: 2020-09-29
Filing date: 2020-09-29
Publication date: 2021-01-05

Abstract

The invention provides a method and a device for predicting a dynamic social relationship network link based on a spatio-temporal relationship, wherein the method comprises the following steps: acquiring dynamic social relationship data, and preprocessing the dynamic social relationship data to generate a sample set; constructing a weighted similarity characteristic time sequence for any node pair in the sample set; calculating the characteristic value of any node pair at the moment to be predicted by adopting a preset algorithm based on the weighted similarity characteristic time sequence to construct a characteristic matrix; and inputting the characteristic matrix into a pre-trained classification model, and outputting possible links of the dynamic social relationship network at the moment to be predicted. The invention establishes the characteristic time sequence of the dynamic network on the basis of the network topological structure characteristics and the link generation time sequence information, and expands the prediction method from a static network to a dynamic time-varying network. Moreover, the weight is introduced into the link prediction problem, the network structure characteristics and the node link characteristics are fused, and the accuracy of the prediction result is improved by combining a statistical model and a supervised learning method.

Description

Dynamic social relationship network link prediction method and device based on spatio-temporal relationship

Technical Field

The invention relates to the technical field of machine learning, in particular to a method and a device for predicting a dynamic social relationship network link based on a spatiotemporal relationship.

Background

Research on social relationship networks is often modeled with ideas from network science: network nodes represent individuals in the social network, and edges (links) between nodes represent the relationships between individuals. Link prediction in a social relationship network mines and predicts the relationships between individuals, and is one of the basic problems of social relationship network research. A notable characteristic of a social relationship network is that it is highly dynamic: the scale of the network (the number of nodes and links), its structure, and the interaction behavior among nodes change constantly. Therefore, dynamic network link prediction that considers the spatio-temporal characteristics and the frequency characteristics of node interaction has important practical application value.

One common approach to dynamic network link prediction introduces a time series prediction model into a method based on static network topology features. For example, a node similarity score is calculated for each time period from the structure information of the network, a future similarity score is then forecast with an autoregressive integrated moving average (ARIMA) model, and the final link prediction is made from that score. However, because there are many similarity indexes, designing a good similarity evaluation function is the difficulty of this approach. Methods based on machine learning treat link prediction as a binary classification problem and apply classic classification algorithms. They have achieved good results on static link prediction, but for a dynamic network the spatio-temporal characteristics and weight characteristics of the network need to be better considered.

Disclosure of Invention

The invention provides a method and a device for predicting a dynamic social relationship network link, aiming at solving the technical problem of improving the accuracy of dynamic social relationship network prediction.

The prediction method of the dynamic social relationship network link based on the spatio-temporal relationship comprises the following steps:

acquiring dynamic social relationship data, and preprocessing the dynamic social relationship data to generate a sample set;

constructing a weighted similarity characteristic time sequence for any node pair in the sample set;

calculating the characteristic value of any node pair at the moment to be predicted by adopting a preset algorithm based on the weighted similarity characteristic time sequence to construct a characteristic matrix;

and inputting the characteristic matrix into a pre-trained classification model, and outputting possible links of the dynamic social relationship network at the moment to be predicted.

According to the prediction method of the dynamic social relationship network link based on the spatio-temporal relationship, provided by the embodiment of the invention, the characteristic time sequence of the dynamic network is established on the basis of the network topological structure characteristics and link generation time sequence information, so that the application range of the prediction method is expanded from a static network to a dynamic time-varying network. The invention introduces the weight into the link prediction problem, and better reflects the practical characteristics of the network. The invention integrates the network structure characteristics and the node link characteristics, and combines the statistical model and the supervised learning method, thereby being more suitable for the actual situation and having better prediction effect, and improving the accuracy of the prediction result.

According to some embodiments of the invention, preprocessing the dynamic social relationship data comprises: dividing the dynamic social relationship data into a plurality of sub-networks according to a preset time interval.

In some embodiments of the invention, preprocessing the dynamic social relationship data comprises: assigning a corresponding weight to each node pair based on the link relation of each node pair.

According to some embodiments of the invention, the pre-trained classification model is trained using a random forest or support vector machine algorithm.

In some embodiments of the invention, the predicted dynamic social relationship network is evaluated using an AUC evaluation metric.

The prediction device of the dynamic social relationship network link based on the spatio-temporal relationship comprises:

the data processing module is used for acquiring dynamic social relationship data and preprocessing the dynamic social relationship data to generate a sample set;

the characteristic time sequence construction module is used for constructing a weighted similarity characteristic time sequence for any node pair in the sample set;

the computing module is used for computing the characteristic value of any node pair at the moment to be predicted by adopting a preset algorithm based on the weighted similarity characteristic time sequence so as to construct a characteristic matrix;

and the classification prediction module is used for inputting the characteristic matrix into a pre-trained classification model and outputting possible links of the dynamic social relationship network at the moment to be predicted.

According to the prediction device of the dynamic social relationship network link based on the spatio-temporal relationship, provided by the embodiment of the invention, the characteristic time sequence of the dynamic network is established on the basis of the network topological structure characteristics and link generation time sequence information, so that the application range of the prediction method is expanded from a static network to a dynamic time-varying network. The real characteristics of the network are better reflected by introducing the weight into the link prediction problem. The invention integrates the network structure characteristics and the node link characteristics, and combines the statistical model and the supervised learning method, thereby being more suitable for the actual situation and having better prediction effect, and improving the accuracy of the prediction result.

According to some embodiments of the invention, the data processing module comprises: a dividing module, used for dividing the dynamic social relationship data into a plurality of sub-networks according to a preset time interval.

In some embodiments of the invention, the data processing module comprises: a weight assignment module, used for assigning a corresponding weight to each node pair based on the link relation of each node pair.

In some embodiments of the invention, the apparatus further comprises:

a result evaluation module, used for evaluating the predicted dynamic social relationship network by adopting an AUC evaluation index.

Drawings

FIG. 1 is a block diagram of a method for predicting links of a dynamic social relationship network based on spatiotemporal relationships, according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method for predicting links of a dynamic social relationship network based on spatiotemporal relationships, according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating a dynamic network model based on time slice division according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a prediction device for dynamic social relationship network links according to an embodiment of the present invention.

Reference numerals:

prediction device 100,

data processing module 10, feature time series construction module 20, calculation module 30, classification prediction module 40.

Detailed Description

To further explain the technical means and effects of the present invention adopted to achieve the intended purpose, the present invention will be described in detail with reference to the accompanying drawings and preferred embodiments.

S100, acquiring dynamic social relationship data, and preprocessing the data to generate a sample set;

It should be noted that the preprocessing of the dynamic social relationship data may include: dividing the dynamic social relationship data into a plurality of sub-networks according to a preset time interval. For example, based on the time information known in the dynamic network, the entire time period may be divided into $n$ time slices, each with an interval of $(t_1 - t_0)/n$. That is, the interactions of the $i$-th ($1 \le i \le n$) time slice occur within $[t_0,\ t_0 + i(t_1 - t_0)/n]$. If $S$ denotes the entire network sequence, $G_t$ denotes the network at time $t$, and $T$ is the time span of the entire network, then $S$ may be represented as $S = \{G_1, G_2, \ldots, G_t, \ldots, G_T\}$.

Preprocessing the dynamic social relationship data may further include: assigning a corresponding weight to each node pair based on the link relation of each node pair. Because components in the network differ in importance, each node pair is given an appropriate weight $w_{(u,v)}$ in order to measure the influence of different edges on node similarity.
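The time slicing and weighting described in S100 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `slice_network`, the `(u, v, timestamp)` event format, the use of disjoint slices of width $(t_1 - t_0)/n$, and the choice of the interaction count as the weight $w_{(u,v)}$ are all assumptions made for the example.

```python
from collections import defaultdict

def slice_network(events, t0, t1, n):
    """Split timestamped interactions into n equal-width time slices.

    events: iterable of (u, v, timestamp) tuples.
    Returns a list of n dicts; each maps an undirected node pair
    (a frozenset) to its weight, here the interaction count (assumption).
    """
    width = (t1 - t0) / n
    slices = [defaultdict(int) for _ in range(n)]
    for u, v, ts in events:
        if u == v or not (t0 <= ts <= t1):
            continue  # drop self-loops and out-of-range events
        # An event exactly at t1 falls into the last slice.
        idx = min(int((ts - t0) / width), n - 1)
        slices[idx][frozenset((u, v))] += 1
    return [dict(s) for s in slices]

# Hypothetical interaction log covering the period [0, 10].
events = [("a", "b", 0.0), ("a", "b", 1.0), ("b", "c", 4.5), ("a", "c", 10.0)]
subnets = slice_network(events, t0=0.0, t1=10.0, n=2)
```

Each element of `subnets` plays the role of one static sub-network $G_t$ in the sequence $S$. Other weightings, such as interaction frequency, could be substituted for the count.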

S200, constructing a weighted similarity feature time series for any node pair in the sample set;

For example, weighted common neighbors (WCN), weighted Adamic-Adar (WAA), weighted resource allocation (WRA), weighted preferential attachment (WPA), and weighted Jaccard's coefficient (WJC) may be chosen as the manually extracted network node pair features.

On this basis, for any node pair, the weighted similarity features of the node pair on each time slice are calculated to form a feature time series.
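The five weighted similarity indices can be illustrated with a short sketch. The source names the indices but does not give their formulas, so the definitions below follow one common weighted formulation and should be read as assumptions: node strength $s(x)$ is the sum of the weights of the edges at $x$, and all weights are assumed positive. The function names and the adjacency-dict layout are hypothetical.

```python
import math

def strength(adj, x):
    """Sum of the weights of all edges incident to node x."""
    return sum(adj.get(x, {}).values())

def weighted_features(adj, u, v):
    """Return (WCN, WAA, WRA, WPA, WJC) for the node pair (u, v).

    adj: dict mapping node -> {neighbor: weight} for an undirected graph.
    The formulas are one common weighted formulation (an assumption).
    """
    nu, nv = adj.get(u, {}), adj.get(v, {})
    common = set(nu) & set(nv)
    su, sv = strength(adj, u), strength(adj, v)
    # Weighted common neighbors: total weight of the two-step paths u-z-v.
    wcn = sum(nu[z] + nv[z] for z in common)
    # Adamic-Adar and resource allocation penalize high-strength neighbors.
    waa = sum((nu[z] + nv[z]) / math.log(1 + strength(adj, z)) for z in common)
    wra = sum((nu[z] + nv[z]) / strength(adj, z) for z in common)
    # Preferential attachment: product of the two node strengths.
    wpa = su * sv
    # Jaccard-style normalization by the total strength of both nodes.
    wjc = wcn / (su + sv) if su + sv > 0 else 0.0
    return wcn, waa, wra, wpa, wjc

# Hypothetical weighted sub-network for one time slice.
adj = {
    "a": {"c": 2.0, "d": 1.0},
    "b": {"c": 1.0},
    "c": {"a": 2.0, "b": 1.0},
    "d": {"a": 1.0},
}
wcn, waa, wra, wpa, wjc = weighted_features(adj, "a", "b")
```

Calling `weighted_features` on the same node pair for every time slice yields one series per index, which together form the feature time series of that pair.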

S300, calculating the feature value of any node pair at the moment to be predicted by adopting a preset algorithm based on the weighted similarity feature time series, so as to construct a feature matrix;

It should be noted that, after the time series of the network is established according to the time information, the evolution process of the network and the trend of node-pair similarity can be obtained by observing the changes between adjacent sub-networks in the time series, so that the time information of the network is utilized and finally applied to the link prediction of the dynamic network.

S400, inputting the feature matrix into a pre-trained classification model, and outputting possible links of the dynamic social relationship network at the moment to be predicted.

According to some embodiments of the invention, the pre-trained classification model is trained using a random forest or support vector machine algorithm. The random forest classification model is composed of a series of decision trees: m samples are repeatedly drawn at random, with replacement, from the original training sample set to generate a new training sample set, and k classification trees are then generated from these bootstrap sample sets to form a random forest. The support vector machine maps the vectors into a higher-dimensional space in which a maximum-margin hyperplane is created. Two mutually parallel hyperplanes are built on either side of the hyperplane that separates the data, and the separating hyperplane is chosen so as to maximize the distance between the two parallel hyperplanes. The training set is used to train these two supervised learning algorithms, which then predict the dynamic relationship of the nodes in the test set.
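The bootstrap step of the random forest (drawing m samples with replacement, k times, and training one tree per sample set) can be sketched in plain Python. To keep the example short, single-threshold decision stumps stand in for full decision trees, so this illustrates bagging with majority voting rather than a complete random forest. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Pick the single-feature threshold rule with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for above in (0, 1):  # label predicted when row[j] > thr
                errors = sum(
                    (above if row[j] > thr else 1 - above) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, thr, above)
    _, j, thr, above = best
    return lambda row: above if row[j] > thr else 1 - above

def fit_bagged_stumps(X, y, k, m, seed=0):
    """Train k stumps, each on m rows drawn with replacement (bootstrap)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(k):
        idx = [rng.randrange(len(X)) for _ in range(m)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict(stumps, row):
    """Majority vote over the ensemble: 1 means 'link', 0 means 'no link'."""
    votes = Counter(stump(row) for stump in stumps)
    return votes.most_common(1)[0][0]

# Toy feature matrix: one similarity feature per node pair, with link labels.
X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
stumps = fit_bagged_stumps(X, y, k=15, m=8, seed=0)
```

In practice an established library implementation of random forests or support vector machines would normally be used instead of a hand-written ensemble.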

It should be noted that the present invention is not limited to the above two algorithms; other machine learning algorithms, such as the multi-layer perceptron, can also be used for classification. Such an algorithm learns features that are more beneficial to the task through its hierarchical structure of input layer, hidden layer, and output layer.

In some embodiments of the invention, the predicted dynamic social relationship network is evaluated using an AUC evaluation metric. AUC may be understood as the probability that the score of a randomly selected edge in the test set is higher than the score of a randomly selected non-existent edge. In $n$ independent comparisons, a missing link (an edge in the test set) and a non-existent link are randomly selected each time and their similarity scores are compared. If the score of the test-set edge is greater than the score of the non-existent edge $n'$ times, and the two scores are equal $n''$ times, then the AUC is defined as follows:

$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n}$$

By adopting this evaluation index, the prediction accuracy of the method can be evaluated, and the influence of different parameter choices, such as the time interval, on the prediction result can be further analyzed.
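The sampling-based AUC described above, which counts $n'$ wins and $n''$ ties over $n$ comparisons and returns $(n' + 0.5\,n'')/n$, can be sketched as follows. The function names and the toy scores are hypothetical. The second function computes the same quantity over every pair instead of a random sample, which is convenient for checking small cases.

```python
import random

def sampled_auc(pos_scores, neg_scores, n, seed=0):
    """Estimate AUC from n random (test edge, non-existent edge) comparisons."""
    rng = random.Random(seed)
    higher = ties = 0
    for _ in range(n):
        p = rng.choice(pos_scores)  # score of a random test-set edge
        q = rng.choice(neg_scores)  # score of a random non-existent edge
        if p > q:
            higher += 1  # counts toward n'
        elif p == q:
            ties += 1    # counts toward n''
    return (higher + 0.5 * ties) / n

def exact_auc(pos_scores, neg_scores):
    """Same quantity computed over every (test edge, non-existent edge) pair."""
    total = len(pos_scores) * len(neg_scores)
    higher = sum(p > q for p in pos_scores for q in neg_scores)
    ties = sum(p == q for p in pos_scores for q in neg_scores)
    return (higher + 0.5 * ties) / total

# Hypothetical scores: edges in the test set vs. non-existent edges.
pos = [0.9, 0.8, 0.4]
neg = [0.4, 0.3, 0.1]
auc = exact_auc(pos, neg)
estimate = sampled_auc(pos, neg, n=1000)
```

A value of 0.5 corresponds to random scoring, and values closer to 1 indicate that test-set edges are consistently ranked above non-existent edges.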

As shown in fig. 4, the apparatus 100 for predicting spatio-temporal relationship-based dynamic social relationship network links according to an embodiment of the present invention includes: a data processing module 10, a feature time series construction module 20, a calculation module 30 and a classification prediction module 40.

The data processing module 10 is configured to obtain dynamic social relationship data and preprocess it to generate a sample set. Specifically, the data processing module 10 may include a dividing module and a weight assignment module.

The dividing module is used for dividing the dynamic social relationship data into a plurality of sub-networks according to a preset time interval. For example, based on the time information known in the dynamic network, the dividing module may divide the entire time period into $n$ time slices, each with an interval of $(t_1 - t_0)/n$. That is, the interactions of the $i$-th ($1 \le i \le n$) time slice occur within $[t_0,\ t_0 + i(t_1 - t_0)/n]$. If $S$ denotes the entire network sequence, $G_t$ denotes the network at time $t$, and $T$ is the time span of the entire network, then $S$ may be represented as $S = \{G_1, G_2, \ldots, G_t, \ldots, G_T\}$.

The weight assignment module is used for assigning a corresponding weight to each node pair based on the link relation of each node pair. Because components in the network differ in importance, the weight assignment module can assign an appropriate weight $w_{(u,v)}$ to each node pair in order to measure the influence of different edges on node similarity.

The feature time sequence construction module 20 is configured to construct a weighted similarity feature time sequence for any node pair in the sample set.

On this basis, for any node pair, the feature time series construction module 20 calculates the weighted similarity feature of the node pair on each time slice to form a feature time series.

The calculation module 30 is configured to calculate a feature value of any node pair at a time to be predicted by using a preset algorithm based on the weighted similarity feature time sequence to construct a feature matrix.

The classification prediction module 40 is configured to input the feature matrix into a classification model trained in advance, and output a possible link of the dynamic social relationship network at a time to be predicted.

According to the prediction device 100 of the dynamic social relationship network link based on the spatio-temporal relationship, provided by the embodiment of the invention, the characteristic time sequence of the dynamic network is established on the basis of the network topological structure characteristics and the link generation time sequence information, so that the application range of the prediction method is expanded from a static network to a dynamic time-varying network. The real characteristics of the network are better reflected by introducing the weight into the link prediction problem. The invention integrates the network structure characteristics and the node link characteristics, and combines the statistical model and the supervised learning method, thereby being more suitable for the actual situation and having better prediction effect, and improving the accuracy of the prediction result.

In some embodiments of the invention, the apparatus further comprises: a result evaluation module, used for evaluating the predicted dynamic social relationship network by adopting the AUC evaluation index. AUC may be understood as the probability that the score of a randomly selected edge in the test set is higher than the score of a randomly selected non-existent edge. In $n$ independent comparisons, a missing link (an edge in the test set) and a non-existent link are randomly selected each time and their similarity scores are compared. If the score of the test-set edge is greater than the score of the non-existent edge $n'$ times, and the two scores are equal $n''$ times, then the AUC is defined as $\mathrm{AUC} = (n' + 0.5\,n'')/n$.

The method and apparatus for predicting links of a dynamic social relationship network based on spatiotemporal relationships according to the present invention will be described in detail with reference to the accompanying drawings in a specific embodiment. It is to be understood that the following description is only exemplary, and not a specific limitation of the invention.

Aiming at the defects of the prior art, the invention aims to design a link prediction method which can combine the dynamic time sequence characteristics of the network with the learning characteristic representation and improve the accuracy of the link prediction of the dynamic weighting network.

In order to achieve the above objects and other related objects, the present invention provides a dynamic social relationship network link prediction method and apparatus based on machine learning and spatio-temporal relationships. Fig. 2 is a main flow chart of the prediction method of the present invention. As shown in fig. 2, the prediction method of the present invention includes the following steps:

S1, preprocessing the raw data to generate a sample set.

Suppose a dynamic network records the interaction information among all components from time $t_0$ to time $t_1$. The components and their interactions are abstracted into an undirected network $G(V, E)$, where $V$ represents the set of components in the network and $E$ represents the set of edges for which an interaction exists. Here $u, v \in V$ represent component nodes in the network, and $(u, v) \in E$ represents an edge in the network. In addition, because components in the network differ in importance, each node pair is given a weight $w_{(u,v)}$ in order to measure the influence of different edges on node similarity.

In order to take into account the evolution information of the network, the network is divided into a plurality of sub-networks by time series, where each sub-network can be considered static. As shown in FIG. 3, $G_1$ to $G_3$ are the sub-network states of the network on the left at different times, and the connections between the nodes change constantly. Based on the time information known in the dynamic network, the whole time period is divided into $n$ time slices, each with an interval of $(t_1 - t_0)/n$. That is, the interactions of the $i$-th ($1 \le i \le n$) time slice occur within $[t_0,\ t_0 + i(t_1 - t_0)/n]$. If $S$ denotes the entire network sequence, $G_t$ denotes the network at time $t$, and $T$ is the time span of the entire network, then $S$ may be represented as $S = \{G_1, G_2, \ldots, G_t, \ldots, G_T\}$.

S2, constructing a node pair weighted similarity feature time series.

First, the features of weighted node pairs are extracted based on similarity indexes of the local network structure. The method selects weighted common neighbors (WCN), weighted Adamic-Adar (WAA), weighted resource allocation (WRA), weighted preferential attachment (WPA), and weighted Jaccard's coefficient (WJC) as the manually extracted network node pair features.

S3, constructing a feature matrix.

After the time series of the network is established according to the time information, the evolution process of the network and the trend of node-pair similarity can be obtained by observing the changes between adjacent sub-networks in the time series, so that the time information of the network is utilized and finally applied to the link prediction of the dynamic network. The invention adopts a moving average model, which averages the features of the $n$ sub-networks nearest to time $t$. Writing $s_{uv}(i)$ for the weighted similarity feature of node pair $(u, v)$ on the $i$-th sub-network, the model can be expressed as:

$$s_{uv}(t) = \frac{1}{n} \sum_{i=t-n}^{t-1} s_{uv}(i)$$

When $n = T-1$, the model becomes an overall average model, whose expression is:

$$s_{uv}(T) = \frac{1}{T-1} \sum_{i=1}^{T-1} s_{uv}(i)$$

The result of the model is the average similarity over the latest $n$ sequences in the observed sequence. The resulting similarity is used as the feature value of the node pair, and the feature matrix is constructed from these values.
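The moving average step can be sketched as below: for each similarity index, the $n$ most recent values of a node pair's feature series are averaged, and the averages across indices form one row of the feature matrix. The function names and the sample series are hypothetical. Setting `n` to the full series length gives the overall average case.

```python
def moving_average(series, n):
    """Average the n most recent values of a feature time series.

    series: feature values of one node pair on the observed sub-networks,
    oldest first. With n == len(series) this is the overall average.
    """
    if not 1 <= n <= len(series):
        raise ValueError("n must be between 1 and len(series)")
    window = series[-n:]
    return sum(window) / n

def feature_row(feature_series, n):
    """Build one feature-matrix row from several per-index series."""
    return [moving_average(s, n) for s in feature_series]

# Hypothetical WCN and WRA series for one node pair over four time slices.
wcn_series = [1.0, 2.0, 3.0, 6.0]
wra_series = [0.5, 0.5, 1.0, 2.0]
row_recent = feature_row([wcn_series, wra_series], n=2)
row_overall = feature_row([wcn_series, wra_series], n=4)
```

A small `n` tracks recent changes in similarity, while a large `n` smooths out short-lived fluctuations, so the window size is a parameter worth tuning together with the time interval.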

S4, constructing a training set and a test set.

The task of dynamic link prediction is to use the network at historical times to predict links that are likely to be newly added to the network at a future time. For the network time series $S = \{G_1, G_2, \ldots, G_t, \ldots, G_T\}$, the first $n-1$ sub-networks are taken as the training set, and the $n$-th sub-network is taken as the test set. Labels are assigned to the training set according to the different link states.
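The split into historical and target sub-networks can be sketched as follows. The source does not spell out how labels are derived, so the labeling rule here is an assumption: a node pair is labeled 1 if it is linked in the target sub-network and 0 otherwise. The function names and the toy sub-networks are hypothetical.

```python
from itertools import combinations

def label_pairs(nodes, edges):
    """Label every unordered node pair: 1 if linked in `edges`, else 0."""
    return {
        frozenset(pair): int(frozenset(pair) in edges)
        for pair in combinations(sorted(nodes), 2)
    }

def split_sequence(subnets):
    """Use sub-networks 1..n-1 as history and the n-th as the test target."""
    if len(subnets) < 2:
        raise ValueError("need at least two sub-networks")
    return subnets[:-1], subnets[-1]

# Hypothetical sequence of three sub-networks, each a set of undirected edges.
nodes = {"a", "b", "c"}
subnets = [
    {frozenset(("a", "b"))},
    {frozenset(("a", "b")), frozenset(("b", "c"))},
    {frozenset(("a", "c"))},
]
history, target = split_sequence(subnets)
test_labels = label_pairs(nodes, target)
```

One possible way to obtain training labels under the same assumption is to apply `label_pairs` to the last historical sub-network and compute features from the sub-networks that precede it.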

S5, realizing link prediction through a machine learning classification algorithm.

The invention adopts two models, a random forest and a support vector machine, as classification algorithms. The random forest classification model is composed of a series of decision trees: m samples are repeatedly drawn at random, with replacement, from the original training sample set to generate a new training sample set, and k classification trees are then generated from these bootstrap sample sets to form a random forest. The support vector machine maps the vectors into a higher-dimensional space in which a maximum-margin hyperplane is created. Two mutually parallel hyperplanes are built on either side of the hyperplane that separates the data, and the separating hyperplane is chosen so as to maximize the distance between the two parallel hyperplanes. The training set is used to train these two supervised learning algorithms, which then predict the dynamic relationship of the nodes in the test set.

S6, evaluating the prediction result.

In order to evaluate the accuracy of the prediction result, the invention uses AUC (area under the receiver operating characteristic curve) as the accuracy evaluation index. AUC may be understood as the probability that the score of a randomly selected edge in the test set is higher than the score of a randomly selected non-existent edge. In $n$ independent comparisons, a missing link (an edge in the test set) and a non-existent link are randomly selected each time and their similarity scores are compared. If the score of the test-set edge is greater than the score of the non-existent edge $n'$ times, and the two scores are equal $n''$ times, then the AUC is defined as $\mathrm{AUC} = (n' + 0.5\,n'')/n$.

In summary, the dynamic evolution link prediction method based on machine learning provided by the invention has the following beneficial effects:

The method is suitable for the evolution prediction of dynamic time-varying networks. The invention establishes the feature time series of the dynamic network on the basis of the network topological structure features and the link generation timing information, thereby expanding the application range of the prediction method from static networks to dynamic time-varying networks.

The method is suitable for link prediction in weighted networks. Individuals in an actual social network are correlated to different degrees, for example in the number and frequency of their interactions. In the abstracted network this is reflected as different weights on the links. The invention introduces the weight into the link prediction problem and thus better reflects the practical characteristics of the network.

The accuracy of the prediction result is improved. The invention integrates the network structure features and the node link features, and combines a statistical model with a supervised learning method, so that it better suits the actual situation and achieves a better prediction effect. Experimental results on the Email-Eu-core Temporal Network data set show that, compared with the traditional static algorithm, the weighted dynamic network prediction method improves the accuracy by 5%; compared with a link prediction method that does not consider weight, the method provided by the invention improves the accuracy by 3%.

While the invention has been described in connection with specific embodiments thereof, it is to be understood that it is intended by the appended drawings and description that the invention may be embodied in other specific forms without departing from the spirit or scope of the invention.

Claims

1. A prediction method of a dynamic social relationship network link based on a spatio-temporal relationship is characterized by comprising the following steps:

2. The spatio-temporal relationship-based dynamic social relationship network link prediction method of claim 1, wherein preprocessing the dynamic social relationship data comprises: dividing the dynamic social relationship data into a plurality of sub-networks according to a preset time interval.

3. The spatio-temporal relationship-based dynamic social relationship network link prediction method according to claim 1 or 2, wherein the preprocessing of the dynamic social relationship data comprises: assigning a corresponding weight to each node pair based on the link relation of each node pair.

4. The spatio-temporal relationship-based dynamic social relationship network link prediction method as claimed in claim 1, wherein the pre-trained classification model is trained using random forest or support vector machine algorithm.

5. The method of predicting spatiotemporal relationship-based dynamic social relationship network links of claim 1, wherein an AUC evaluation index is employed to evaluate the predicted dynamic social relationship network.

6. A prediction apparatus for dynamic social relationship network links based on spatio-temporal relationships, comprising:

7. The spatio-temporal relationship-based prediction device for dynamic social relationship network links according to claim 6, wherein the data processing module comprises: a dividing module, used for dividing the dynamic social relationship data into a plurality of sub-networks according to a preset time interval.

8. The apparatus for predicting spatio-temporal relationship-based dynamic social relationship network links according to claim 6 or 7, wherein the data processing module comprises: a weight assignment module, used for assigning a corresponding weight to each node pair based on the link relation of each node pair.

9. The spatio-temporal relationship-based dynamic social relationship network link prediction device of claim 6, wherein the pre-trained classification model is trained using random forest or support vector machine algorithms.

10. The spatio-temporal relationship-based prediction device for dynamic social relationship network links according to claim 6, further comprising: