CN106100870A

CN106100870A - A social network event detection method based on link prediction

Info

Publication number: CN106100870A
Application number: CN201610374849.4A
Authority: CN
Inventors: 胡文斌; 王欢; 杜博; 严丽平; 邱振宇; 聂聪
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2016-05-31
Filing date: 2016-05-31
Publication date: 2016-11-09

Abstract

The invention discloses a social network event detection method based on link prediction. First, the algorithm SimC calculates the similarity of the input network G for each time period and, from the results, derives the network evolution sequence GraphS. Then the event detection algorithm EventD is applied to GraphS together with a threshold T to output the event sequence EventS. The invention helps analyze the evolution of social networks, detect important events occurring in a network, and thereby guide the network toward normal evolution.

Description

A Social Network Event Detection Method Based on Link Prediction

Technical Field

The invention belongs to the technical field of social network analysis and relates to LinkEvent, a social network event detection method based on link prediction. LinkEvent consists of a similarity calculation algorithm, SimC, and an event detection algorithm, EventD. It evaluates the volatility of different networks in a uniform way and builds an event detection model on that basis.

Background Art

Social networks are constantly evolving, and network evolution analysis and event detection are important parts of social network analysis. Network evolution analysis describes how a network evolves by tracking changes in its characteristics across stages; it then analyzes behaviors such as growth and propagation, predicts future structure, and may even apply deliberate intervention to obtain a desired outcome. With the explosive growth of social networks, network evolution analysis has been widely applied to user behavior analysis, guidance of message dissemination, and similar fields. However, social networks differ greatly in their characteristics and their evolution mechanisms are complex, so efficiently simulating the growth and propagation of real networks has become the primary challenge for network evolution analysis. Event detection is a specific application of network evolution analysis. It generally means, on the basis of a description of how the network evolves, analyzing the differences between stages of the network to detect the events that occurred in it and to propose intervention strategies. Event detection offers important guidance for tasks such as analyzing the replacement of core leaders in criminal networks and identifying changes of organizational structure in a company's e-mail network. In real networks, many kinds of events can cause the network to deviate from its normal evolution and therefore exhibit different structural changes. How to define and detect these events, assess their impact, and propose strategies for intervention is the central difficulty of event detection research.

To reflect changes in network characteristics and reveal the underlying laws of evolution, researchers have proposed a variety of models. Typical examples are the Erdős-Rényi (E-R) random graph model, the Watts-Strogatz (W-S) small-world model, and the Barabási-Albert (B-A) scale-free model. These methods generally proceed as follows: build a network model from one or more evolution mechanisms, tune the model parameters to fit the real network, simulate the network for each time period, and finally use network statistics such as the degree distribution and the average clustering coefficient to evaluate how well the model describes the real network. The advantage of such model evolution (ME) methods is that they are simple to implement and can build different networks by adjusting parameters to the network's characteristics. However, these models are each designed for a specific kind of network, can hardly capture all statistical properties at once, and lack a unified standard for comparing the performance of different models. Moreover, because they do not consider the relationship between consecutive time periods and ignore the volatility of the evolution process, they cannot describe how stable the evolution is and cannot detect events in the network.

Link prediction (LP) is the task of accurately predicting which new edges will appear in the next time period, given the topology of the network (the links between nodes) in the current time period. The procedure is: compute a score for every node pair in the current time period under some index, remove the pairs that are already connected (the edges already present in the network), sort the remaining pairs by score in descending order, and, according to the evaluation metric, output the top L pairs as the prediction. Unlike model evolution methods, link prediction makes full use of the information available in the current time period and predicts the structure of the next period with indices built on different evolution mechanisms. Because a unified evaluation standard exists, the predictive performance of the different indices can be compared.

[Document 1] Palla G, Barabási A L, Vicsek T. Quantifying social group evolution[J]. Nature, 2007, 446(7136): 664-667.
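The link prediction procedure described above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the scoring index is assumed to be the common-neighbor count, the graph is assumed to be a dict mapping each node to its neighbor set, and the name `predict_links` is hypothetical.

```python
from itertools import combinations

def predict_links(adj, top_l):
    """Score all unconnected node pairs and return the top_l highest-scoring pairs.

    adj: dict mapping node -> set of neighbors (undirected graph).
    The score is the number of common neighbors (an assumed index).
    """
    scored = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # drop pairs that are already edges
        scored.append((len(adj[u] & adj[v]), u, v))
    # Descending score; ties are broken by node label to keep the result deterministic.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(u, v) for _, u, v in scored[:top_l]]

adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
print(predict_links(adj, 1))  # the only unconnected pair is ("a", "d")
```

Any other index (for example preferential attachment) could be substituted for the common-neighbor score without changing the ranking step.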

Summary of the Invention

To solve the above technical problems, the invention proposes a method that, based on similarity analysis of the network evolution sequence, uses link prediction indices to describe the evolution trend of a network and detect events in it.

The technical solution adopted by the invention is a social network event detection method based on link prediction, characterized in that it comprises the following steps:

Step 1: apply the algorithm SimC to the input network G to calculate the similarity for each time period, and derive the network evolution sequence GraphS from the results;

Step 2: apply the event detection algorithm EventD to GraphS together with a threshold T, and output the event sequence EventS.

Preferably, step 1 is implemented as follows:

For a given network $G = \{g^{1}, g^{2}, g^{3}, \ldots, g^{n}\}$, the network snapshot at time t is represented by the graph $g^{t}$. In the snapshots $g^{t}$ and $g^{t+1}$ of G, the similarity of node i is defined as the degree to which node i remains stable between $g^{t}$ and $g^{t+1}$, denoted $s(v_i^{t}, v_i^{t+1})$. The similarity of the graphs $g^{t}$ and $g^{t+1}$ is the macroscopic result of superposing the similarities of the individual nodes; it is denoted $S(g^{t}, g^{t+1})$ and defined as:

$S(g^{t}, g^{t+1}) = \sum_{i \in U_{t,t+1}} s(v_i^{t}, v_i^{t+1}) \times \frac{1}{|U_{t,t+1}|};$

where $U_{t,t+1} = g^{t} \cup g^{t+1}$; node $v_i$ at times t and t+1 is treated as two different nodes, denoted $v_i^{t}$ and $v_i^{t+1}$ respectively.

The volatility of the network in the time period [t, t+1] is denoted $\hat{D}(g^{t+1} \| g^{t})$ and defined as:

$\hat{D}(g^{t+1} \| g^{t}) = \frac{1}{S(g^{t}, g^{t+1})};$

The network evolution sequence GraphS is defined as the set of volatilities over all time periods, as shown in the following formula:

$GraphS = \{\hat{D}(g^{2} \| g^{1}), \hat{D}(g^{3} \| g^{2}), \ldots, \hat{D}(g^{n} \| g^{n-1})\}.$

Preferably, the node similarity $s(v_i^{t}, v_i^{t+1})$ is calculated as follows: eight link prediction indices are applied to the similarity calculation, and a virtual node $V_{virtual}$, called the "observer", is introduced. $V_{virtual}$ has a virtual edge to every node in the network. This yields eight node similarity indices, as shown in Table 1;

Table 1 Node similarity calculation with the virtual node introduced

Link prediction is performed on the network G, and the index with the best AUC is selected as the optimal index O; the index O reflects the evolution mechanism of the network;

AUC is the main measure of the accuracy of a link prediction algorithm and is defined as:

$AUC = \frac{n' + 0.5\,n''}{n};$

where n is the number of comparisons, n' is the number of comparisons in which the score of an edge drawn at random from the test set is higher than the score of an edge drawn at random from the set of nonexistent edges, and n'' is the number of comparisons in which the two scores are equal.
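The AUC definition above translates directly into a sampling loop. The following is a minimal sketch; the function name `auc`, the default number of comparisons, and the fixed seed are illustrative assumptions.

```python
import random

def auc(test_scores, nonexistent_scores, n=10000, seed=0):
    """Estimate AUC = (n' + 0.5 * n'') / n by n random comparisons.

    test_scores: scores of edges in the test set.
    nonexistent_scores: scores of node pairs that have no edge.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    higher = equal = 0
    for _ in range(n):
        a = rng.choice(test_scores)
        b = rng.choice(nonexistent_scores)
        if a > b:
            higher += 1   # contributes to n'
        elif a == b:
            equal += 1    # contributes to n''
    return (higher + 0.5 * equal) / n

print(auc([3, 4], [1, 2]))  # every test edge outscores every nonexistent edge
```

If scores were assigned at random, the estimate would be close to 0.5, which is why 0.5 serves as the baseline for random evolution later in the description.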

Preferably, the calculation of the node similarity $s(v_i^{t}, v_i^{t+1})$ ultimately calculates the improved node similarity, and comprises the following sub-steps:

Step 1: apply eight link prediction indices to the similarity calculation and introduce a virtual node $V_{virtual}$, also called the "observer", which has a virtual edge to every node in the network; this yields eight node similarity indices, as shown in Table 1;

$AUC = \frac{n' + 0.5\,n''}{n};$

where n is the number of comparisons, n' is the number of comparisons in which the score of an edge drawn at random from the test set is higher than the score of an edge drawn at random from the set of nonexistent edges, and n'' is the number of comparisons in which the two scores are equal;

Step 2: calculate the evolution weight of the node, $w(v_i^{t}, v_i^{t+1})$;

Step 3: calculate the improved node similarity.

Preferably, the evolution weight of the node, $w(v_i^{t}, v_i^{t+1})$, is determined by the ratio of $VAUC_i^{t,t+1}$ to the random-evolution value 0.5, namely:

$w(v_i^{t}, v_i^{t+1}) = \frac{VAUC_i^{t,t+1}}{0.5};$

where $VAUC_i^{t,t+1}$ denotes the link prediction accuracy of node $v_i$, defined as the average link prediction accuracy of the edges newly added to $v_i$ in $g^{t+1}$:

$VAUC_i^{t,t+1} = \begin{cases} \frac{\sum_{e \in NE_i^{t,t+1}} EAUC_e^{t,t+1}}{|NE_i^{t,t+1}|}, & |NE_i^{t,t+1}| > 0 \\ 0.5, & |NE_i^{t,t+1}| = 0 \end{cases};$

where $NE_i^{t,t+1}$ denotes the set of edges newly added to $v_i$ in $g^{t+1}$. $VAUC_i^{t,t+1}$ reflects how well the change of the node fits the evolution law; the larger its value, the better the change of the node conforms to the evolution law.

Preferably, the evolution weight of the node, $w(v_i^{t}, v_i^{t+1})$, is calculated as:

$w(v_i^{t}, v_i^{t+1}) = \begin{cases} \frac{VAUC_i^{t,t+1}}{\alpha}, & VAUC_i^{t,t+1} < 0.5 \\ VAUC_i^{t,t+1}, & 0.5 \le VAUC_i^{t,t+1} < 0.8 \\ \alpha \times VAUC_i^{t,t+1}, & 0.8 \le VAUC_i^{t,t+1} \le 1.0 \end{cases};$

where α is a scaling factor intended to separate the three levels more clearly, and $VAUC_i^{t,t+1}$ denotes the link prediction accuracy of node $v_i$, defined as the average link prediction accuracy of the edges newly added to $v_i$ in $g^{t+1}$.

Preferably, step 2 comprises the following sub-steps:

Step 2.1: calculate the event occurrence interval from known events;

Step 2.2: determine the event occurrence threshold T;

Step 2.3: traverse GraphS and output the event sequence EventS.

Preferably, step 2 is implemented as follows: given the event sequence $EventO = \{k \mid \text{an event occurs at } t = k,\ k \in [1, m],\ m \le n\}$ formed by the event points that have already occurred, analyze the evolution sequence GraphS and learn the event occurrence interval [L, H], where L is the lower bound and H the upper bound for an event to occur. Choose $T \in [L, H]$ as the occurrence threshold; T is set manually. At t = k, if $\hat{D}(g^{k+1} \| g^{k}) > T$, the network is in the event state during the time period [k, k+1]; otherwise it is in a relatively stable state. When the analysis is complete, the event sequence EventS is output:

$EventS = \{k \mid \hat{D}(g^{k+1} \| g^{k}) > T,\ k \in [1, n-1]\}.$
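The traversal that produces EventS is a simple threshold filter over GraphS. A minimal sketch follows; the function name `detect_events` and the sample volatility values are hypothetical, and the threshold is taken as given because the source states that T is set manually.

```python
def detect_events(graph_s, threshold):
    """Return the event points k with D(g^{k+1} || g^k) > threshold.

    graph_s: list whose element at position k - 1 holds the volatility of
    the time period [k, k + 1], for k = 1 .. n - 1.
    """
    return [k for k, d in enumerate(graph_s, start=1) if d > threshold]

# Illustrative volatilities for a network with n = 7 snapshots.
volatilities = [1.2, 1.1, 3.0, 1.3, 2.6, 1.0]
print(detect_events(volatilities, 2.0))  # [3, 5]
```

Periods not returned by the function are the ones the method treats as relatively stable.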

Preferably, the event detection method is evaluated as follows:

Assume that $k_1$ and $k_2$ are two consecutive event points in the network G, with $k_1 + 1 < k_2$. G is in a relatively stable state during $[k_1 + 1, k_2]$ and in the event state during $[k_2, k_2 + 1]$. The event sensitivity is defined as:

$Per = \frac{\hat{D}(g^{k_2+1} \| g^{k_2}) - \frac{\sum_{i=k_1+1}^{k_2-1} \hat{D}(g^{i+1} \| g^{i})}{k_2 - k_1 - 2}}{\frac{\sum_{i=k_1+1}^{k_2-1} \hat{D}(g^{i+1} \| g^{i})}{k_2 - k_1 - 2}};$

The event sensitivity Per compares the volatility of the event period with the average volatility of the stable period, expressed as the excess of the former over the latter relative to the latter. The larger the value, the more easily the event is detected.

The beneficial effect of the invention is that it helps analyze the evolution of social networks and detect major events occurring in a network, and thereby guides the network toward normal evolution and helps prevent harmful mass incidents.

Brief Description of the Drawings

Fig. 1 is a flow chart of an embodiment of the invention;

Fig. 2 is a schematic diagram of the evolution of a simple network from t to t+3 in an embodiment of the invention;

Fig. 3 is a schematic diagram of the evolution of a disconnected network after the virtual node is introduced, in an embodiment of the invention.

Detailed Description

To help those of ordinary skill in the art understand and implement the invention, it is described in further detail below with reference to the drawings and embodiments. The embodiments described here serve only to illustrate and explain the invention and are not intended to limit it.

The method proposed by the invention is based on similarity analysis of the network evolution sequence and uses link prediction indices to design an efficient and reasonable algorithm that describes the evolution trend of a network and detects events in it.

Referring to Fig. 1, the invention provides a social network event detection method based on link prediction (hereinafter the LinkEvent method).

The LinkEvent method comprises the following two steps:

Step 1: apply the algorithm SimC to the input network G to calculate the similarity for each time period, and derive the network evolution sequence GraphS (graph evolution sequence) from the results;

Step 2: apply the event detection algorithm EventD to GraphS together with a threshold T, and output the event sequence EventS (event sequence).

In step 1, the algorithm SimC is applied to the input network G to calculate the similarity for each time period. This is implemented as follows:

For a given network $G = \{g^{1}, g^{2}, g^{3}, \ldots, g^{n}\}$, the network snapshot at time t is represented by the graph $g^{t}$. The similarity between $g^{t}$ and $g^{t+1}$ is affected by three factors:

(1) relative to $g^{t}$, the appearance of new nodes in $g^{t+1}$ and the new edges they bring;

(2) relative to $g^{t}$, the disappearance of old nodes in $g^{t+1}$ and of the corresponding edges;

(3) nodes that persist in both $g^{t}$ and $g^{t+1}$ while edges are simply added or removed.

These three factors combine: each node and its incident edges change over time, which appears at the macroscopic level as fluctuation of the whole network. How to describe the degree of this fluctuation, and thereby provide a basis for event detection, is the first problem to be solved. The invention therefore proposes the SimC algorithm and discusses the related issues.

In the snapshots $g^{t}$ and $g^{t+1}$ of G, the similarity of node i is defined as the degree to which node i remains stable between $g^{t}$ and $g^{t+1}$, denoted $s(v_i^{t}, v_i^{t+1})$; node $v_i$ at times t and t+1 is treated as two different nodes, denoted $v_i^{t}$ and $v_i^{t+1}$ respectively.

The similarity of the graphs $g^{t}$ and $g^{t+1}$ is the macroscopic result of superposing the similarities of the individual nodes; it is denoted $S(g^{t}, g^{t+1})$ and defined as:

$S(g^{t}, g^{t+1}) = \sum_{i \in U_{t,t+1}} s(v_i^{t}, v_i^{t+1}) \times \frac{1}{|U_{t,t+1}|} \qquad (1);$

where $U_{t,t+1} = g^{t} \cup g^{t+1}$.

The similarity of $g^{t}$ and $g^{t+1}$ reflects how close the two graphs are: the larger its value, the less the network changes during [t, t+1] and the smaller the fluctuation of the network. The volatility of the network in the time period [t, t+1] is denoted $\hat{D}(g^{t+1} \| g^{t})$ and defined as:

$\hat{D}(g^{t+1} \| g^{t}) = \frac{1}{S(g^{t}, g^{t+1})} \qquad (2);$

The network evolution sequence GraphS is defined as the set of volatilities over all time periods, as shown in formula (3).

$GraphS = \{\hat{D}(g^{2} \| g^{1}), \hat{D}(g^{3} \| g^{2}), \ldots, \hat{D}(g^{n} \| g^{n-1})\} \qquad (3);$
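Formulas (1) to (3) can be sketched as three small Python functions. This is an illustration under stated assumptions: snapshots are dicts mapping each node to its neighbor set, the node similarity is passed in as a function (a Jaccard-style similarity of neighbor sets is used as the default), a node with no neighbors in either snapshot is given similarity 0, and a graph similarity of 0 is mapped to infinite volatility. All function names are hypothetical.

```python
def jaccard_node_sim(g_t, g_t1, v):
    """Jaccard similarity of v's neighbor sets in two snapshots."""
    a, b = g_t.get(v, set()), g_t1.get(v, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # 0/0 treated as 0 (assumption)

def graph_similarity(g_t, g_t1, node_sim=jaccard_node_sim):
    """Formula (1): mean node similarity over the union of both node sets."""
    nodes = set(g_t) | set(g_t1)
    return sum(node_sim(g_t, g_t1, v) for v in nodes) / len(nodes)

def volatility(g_t, g_t1, node_sim=jaccard_node_sim):
    """Formula (2): reciprocal of the graph similarity."""
    s = graph_similarity(g_t, g_t1, node_sim)
    return float("inf") if s == 0 else 1.0 / s  # guard for S = 0 (assumption)

def evolution_sequence(snapshots, node_sim=jaccard_node_sim):
    """Formula (3): volatility of every consecutive pair of snapshots."""
    return [volatility(a, b, node_sim) for a, b in zip(snapshots, snapshots[1:])]

g1 = {"a": {"b"}, "b": {"a"}}
g2 = {"a": {"b"}, "b": {"a"}}
g3 = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
print(evolution_sequence([g1, g2, g3]))  # [1.0, 2.0]
```

In the example nothing changes between the first two snapshots, so the volatility is 1.0; the arrival of node `c` halves the graph similarity and doubles the volatility.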

The calculation of the node similarity $s(v_i^{t}, v_i^{t+1})$ is described in detail below.

(1) Traditional calculation methods;

In link prediction, a node similarity index measures how similar two different nodes of a graph are; the core idea is that the similarity of two nodes depends on their topological information (including the number of common neighbors, their degrees, and so on). Following this idea, node i of the network G can be regarded as two different nodes, $v_i^{t}$ and $v_i^{t+1}$, in $g^{t}$ and $g^{t+1}$, and their similarity $s(v_i^{t}, v_i^{t+1})$ can likewise be described through topological structure.

For example, the Jaccard index in link prediction, $\frac{|\Gamma(v_i) \cap \Gamma(v_j)|}{|\Gamma(v_i) \cup \Gamma(v_j)|}$, where $\Gamma(v)$ is the set of neighbors of v, expresses that the similarity of $v_i$ and $v_j$ is determined by their common neighbors. Correspondingly, the similarity of $v_i^{t}$ and $v_i^{t+1}$ can be described by formula (4), referred to as the JAS index.

$s(v_i^{t}, v_i^{t+1}) = \frac{|\Gamma(v_i^{t}) \cap \Gamma(v_i^{t+1})|}{|\Gamma(v_i^{t}) \cup \Gamma(v_i^{t+1})|} \qquad (4);$

In the same way, applying the preferential attachment (PA) index of link prediction, $|\Gamma(v_i)| \times |\Gamma(v_j)|$, to the similarity calculation gives formula (5), referred to as the PAS index.

$s(v_i^{t}, v_i^{t+1}) = |\Gamma(v_i^{t})| \times |\Gamma(v_i^{t+1})| \qquad (5);$

Combining formula (4) with formula (1), or formula (5) with formula (1), gives the similarity of the graphs $g^{t}$ and $g^{t+1}$.
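Formulas (4) and (5) operate only on the neighbor sets of a node in the two snapshots, so they can be sketched in a few lines of Python. The function names `jas` and `pas` mirror the index names in the text; returning 0 when both neighbor sets are empty is an assumption.

```python
def jas(nbrs_t, nbrs_t1):
    """Formula (4): Jaccard overlap of a node's neighbor sets at t and t + 1."""
    union = nbrs_t | nbrs_t1
    return len(nbrs_t & nbrs_t1) / len(union) if union else 0.0

def pas(nbrs_t, nbrs_t1):
    """Formula (5): product of the node's degrees at t and t + 1."""
    return len(nbrs_t) * len(nbrs_t1)

# A node that keeps neighbors b and c and gains d.
print(jas({"b", "c"}, {"b", "c", "d"}), pas({"b", "c"}, {"b", "c", "d"}))
# A node that did not exist at time t has an empty neighbor set there.
print(jas(set(), {"a"}), pas(set(), {"a"}))  # both are 0
```

JAS is bounded by 1, whereas PAS grows with degree, so the two indices are on different scales when they are plugged into formula (1).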

Whereas the invention obtains the similarity of two graphs by accumulating node similarities, [Document 1] gives a relative overlap measure for the states of the same community before and after a step of community evolution. Applied to $g^{t}$ and $g^{t+1}$, it can also describe their similarity, as shown in formula (6), referred to as ROS.

$S(g^{t}, g^{t+1}) = \frac{|A(g^{t}) \cap A(g^{t+1})|}{|A(g^{t}) \cup A(g^{t+1})|} \qquad (6);$

where $A(g^{t})$ denotes the set of all nodes in $g^{t}$.

The advantages of the similarity calculation proposed by the invention are illustrated below with an example.

Fig. 2 shows the evolution of a simple network from t to t+3; exactly one node is added at each step, and the time window is set to 1.

The similarity of $g^{t}$ and $g^{t+1}$ and of $g^{t+1}$ and $g^{t+2}$ is calculated with JAS, PAS and ROS; the results are given in Table 2.

Table 2 Results for the three similarity indices

To make event detection more sensitive, the range over which network volatility varies should be as large as possible; that is, the change in $S(g^{t}, g^{t+1})$ should be as pronounced as possible. Table 2 shows that JAS performs better than ROS and PAS performs better than JAS. ROS considers only node changes at the macroscopic level and ignores edge changes, so it performs worst. JAS does capture the topological change of each node and its incident edges, but it does not reflect the evolution law of the network. The network in Fig. 2 evolves by preferential attachment, so PAS, which is based on the preferential attachment mechanism, performs best.

With PAS and JAS, a node newly added in $g^{t+1}$ relative to $g^{t}$ has a similarity of 0, and a node that has disappeared also has a similarity of 0. Under this calculation, the disappearance or appearance of an entire isolated component (a subgraph of a disconnected network) has no effect on the similarity, which clearly does not match reality.

(2) Calculation method after the invention introduces a virtual node $V_{virtual}$;

To solve this problem, the invention introduces a virtual node $V_{virtual}$ into the network, also called the "observer"; it can be understood as "observing" the changes of the whole network from the viewpoint of this node. $V_{virtual}$ has a virtual edge to every node in the network, as shown in Fig. 3.

Fig. 3 shows the evolution of a disconnected network. At time t+1, the previously existing subgraph CDE disappears and the subgraph FG appears. Without $V_{virtual}$, the effect of the disappearance and appearance of these subgraphs on the network cannot be reflected in the similarity calculation. With $V_{virtual}$, it is reflected through the similarity of $V_{virtual}$ itself.
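The effect of the observer node can be illustrated with a small Python sketch. The two snapshots below only imitate the situation described for Fig. 3 (component C, D, E disappears, component F, G appears); the exact edges are assumptions, as are the names `OBSERVER`, `add_observer` and `jas`.

```python
OBSERVER = "V_virtual"

def add_observer(g):
    """Return a copy of snapshot g with a virtual node linked to every real node."""
    out = {v: set(nbrs) | {OBSERVER} for v, nbrs in g.items()}
    out[OBSERVER] = set(g)
    return out

def jas(g_t, g_t1, v):
    """Jaccard similarity of v's neighbor sets in two snapshots."""
    a, b = g_t.get(v, set()), g_t1.get(v, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

g_t = {"A": {"B"}, "B": {"A"}, "C": {"D", "E"}, "D": {"C"}, "E": {"C"}}
g_t1 = {"A": {"B"}, "B": {"A"}, "F": {"G"}, "G": {"F"}}

h_t, h_t1 = add_observer(g_t), add_observer(g_t1)
observer_sim = jas(h_t, h_t1, OBSERVER)  # 2 shared neighbors out of 7 in total
stable_sim = jas(h_t, h_t1, "A")         # A keeps the same neighbors
print(observer_sim, stable_sim)
```

The neighbors of the observer are exactly the nodes of the snapshot, so its similarity drops whenever whole components appear or vanish, even though a persistent node such as `A` sees no change at all.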

With the virtual node introduced, the node similarity calculation can describe network volatility completely. Applying eight link prediction indices to the similarity calculation yields eight node similarity indices, as shown in Table 3.

Table 3 Node similarity calculation with the virtual node introduced

For a network governed by a given evolution mechanism, an index built on the same mechanism performs better than the other indices. For a network whose evolution mechanism is unknown, link prediction should first be used to infer the mechanism, and the index of the corresponding mechanism should then be used to calculate similarity. Link prediction is performed on the network G, and the index with the best AUC is selected as the optimal index (also called the evolution index) O. The index O reflects the evolution mechanism of the network.

$AUC = \frac{n' + 0.5\,n''}{n} \qquad (7);$

Based on the above analysis, the invention proposes the similarity calculation algorithm SimC; see Algorithm 1 for details.

(3) Calculation method after the invention further introduces node evolution weights on the basis of Algorithm 1;

The original SimC algorithm treats all nodes equally: node similarities are accumulated directly to give the graph similarity, and microscopic differences between nodes are ignored. In fact, if the change of a node and its surrounding topology conforms to the evolution law of the network, it can be regarded as normal evolution and has little effect on the volatility of the network. If the change of a node does not conform to the evolution law, it is very likely that an event has broken the underlying evolution principle, and the effect on the volatility of the network is larger. Different nodes should therefore be treated differently when similarity is calculated. For this purpose the invention introduces the concept of node evolution weight.

The evolution weight w of a node is defined as the degree to which the change of the node and its surrounding topology fits the evolution law of the network. The larger w is, the better the change of the node conforms to the evolution law. $w(v_i^{t}, v_i^{t+1})$ denotes the evolution weight of node $v_i$ in the time period [t, t+1].

Link prediction accuracy is now analyzed at the microscopic level. Suppose that link prediction has identified O as the optimal index (the evolution index) of the network G, that nodes $v_i$ and $v_j$ share no edge in $g^{t}$, and that an edge $e_{ij}$ appears between them in $g^{t+1}$. Take $e_{ij}$ as the test set and the edges that do not exist between $v_i$ and the other nodes in $g^{t}$ as the random selection set; each time, compare the score of $e_{ij}$ under index O with the score of an edge from the random selection set. After n comparisons, the resulting AUC is defined as the link prediction accuracy of edge $e_{ij}$, $EAUC_{e_{ij}}^{t,t+1}$. It expresses how well the appearance of $e_{ij}$ fits the evolution law; the larger its value, the better the appearance of $e_{ij}$ conforms to the evolution mechanism corresponding to O.
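The per-edge accuracy described above can be sketched as follows. Several choices here are assumptions rather than statements of the source: preferential attachment stands in for the optimal index O, the new edge is compared once against every nonexistent edge of $v_i$ instead of n random draws, the pair (i, j) itself is excluded from the comparison set, and 0.5 is returned when that set is empty. The names `pa_score` and `eauc` are hypothetical.

```python
def pa_score(g, u, v):
    """Preferential attachment score: product of the two degrees (assumed index O)."""
    return len(g[u]) * len(g[v])

def eauc(g_t, i, j, score=pa_score):
    """AUC of the new edge (i, j) against the nonexistent edges of i in g_t."""
    candidates = [v for v in g_t if v not in (i, j) and v not in g_t[i]]
    if not candidates:
        return 0.5  # nothing to compare against (assumption)
    target = score(g_t, i, j)
    higher = sum(target > score(g_t, i, v) for v in candidates)
    equal = sum(target == score(g_t, i, v) for v in candidates)
    return (higher + 0.5 * equal) / len(candidates)

# Snapshot at time t; the edge ("i", "j") appears at time t + 1.
g_t = {
    "i": {"a"},
    "a": {"i", "j", "b"},
    "j": {"a", "b"},
    "b": {"a", "j"},
    "c": set(),
}
print(eauc(g_t, "i", "j"))  # 0.75: beats ("i", "c"), ties with ("i", "b")
```

Exhaustive comparison gives the value that random sampling with replacement would approach as the number of comparisons grows.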

Further, the link prediction accuracy $VAUC_i^{t,t+1}$ of node $v_i$ is defined as the average link prediction accuracy of the edges newly added to $v_i$ in $g^{t+1}$, as shown in formula (8).

$VAUC_i^{t,t+1} = \begin{cases} \dfrac{\sum_{e \in NE_i^{t,t+1}} EAUC_e^{t,t+1}}{|NE_i^{t,t+1}|}, & |NE_i^{t,t+1}| > 0 \\ 0.5, & |NE_i^{t,t+1}| = 0 \end{cases} \qquad (8)$

where $NE_i^{t,t+1}$ denotes the set of edges newly added to $v_i$ in $g^{t+1}$. $VAUC_i^{t,t+1}$ reflects how well the node's change fits the evolution law: the larger the value, the better the node's change conforms to the evolution law.
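The edge-level and node-level accuracies defined above can be illustrated with a minimal Python sketch. It is not part of the patent text. The names `edge_auc` and `node_auc` are hypothetical, the scores are assumed to have been computed already under the optimal index O, and the new edge is compared exhaustively against every candidate non-existent edge instead of being sampled n times. Ties count 0.5, following the AUC definition (n' + 0.5 n'') / n given in the claims.

```python
from typing import Iterable, Sequence


def edge_auc(new_edge_score: float, candidate_scores: Sequence[float]) -> float:
    """AUC-style accuracy of one new edge against non-existent candidate edges.

    Counts 1 for each candidate the new edge outscores and 0.5 for each tie,
    then divides by the number of comparisons (exhaustive, not sampled).
    """
    if not candidate_scores:
        # Assumption: with nothing to compare against, use the random level 0.5.
        return 0.5
    wins = sum(1 for s in candidate_scores if new_edge_score > s)
    ties = sum(1 for s in candidate_scores if new_edge_score == s)
    return (wins + 0.5 * ties) / len(candidate_scores)


def node_auc(edge_aucs: Iterable[float]) -> float:
    """Formula (8): mean edge accuracy over a node's new edges, 0.5 if none."""
    values = list(edge_aucs)
    if not values:
        return 0.5
    return sum(values) / len(values)


if __name__ == "__main__":
    e1 = edge_auc(3.0, [1.0, 3.0, 5.0, 2.0])  # 2 wins, 1 tie, 4 comparisons
    e2 = edge_auc(4.0, [1.0, 2.0, 3.0, 5.0])  # 3 wins, 0 ties, 4 comparisons
    print(e1, e2, node_auc([e1, e2]))
```

A node with no new edges receives 0.5, so under the first weighting strategy its weight is 1 and its similarity is left unchanged.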

The node evolution weight $w(v_i^t, v_i^{t+1})$ can be calculated from $VAUC_i^{t,t+1}$. The invention proposes two weighting strategies.

In the first strategy, $w(v_i^t, v_i^{t+1})$ is determined by the ratio of $VAUC_i^{t,t+1}$ to the random evolution value 0.5, as shown in formula (9).

$w(v_i^t, v_i^{t+1}) = \dfrac{VAUC_i^{t,t+1}}{0.5} \qquad (9)$

The second strategy grades the link prediction accuracy of a node into three levels: $VAUC_i^{t,t+1} < 0.5$ indicates that the change does not conform to the evolution law, 0.8 indicates that it conforms fairly well, and 1.0 indicates that it conforms completely. The evolution weight for each level is calculated as in formula (10).

$w(v_i^t, v_i^{t+1}) = \begin{cases} \dfrac{VAUC_i^{t,t+1}}{\alpha}, & VAUC_i^{t,t+1} < 0.5 \\ VAUC_i^{t,t+1}, & 0.5 \le VAUC_i^{t,t+1} < 0.8 \\ \alpha \times VAUC_i^{t,t+1}, & 0.8 \le VAUC_i^{t,t+1} \le 1.0 \end{cases} \qquad (10)$

where $\alpha$ is a scaling factor intended to separate the three levels more clearly.
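The two weighting strategies of formulas (9) and (10) can be written as two small Python functions. This is a sketch only: the function names are hypothetical, and the default `alpha = 2.0` is an assumed value chosen for illustration, since the text does not fix the scaling factor.

```python
def weight_ratio(vauc: float) -> float:
    """Formula (9): node accuracy relative to the random evolution value 0.5."""
    return vauc / 0.5


def weight_graded(vauc: float, alpha: float = 2.0) -> float:
    """Formula (10): three-level weight scaled by alpha.

    alpha = 2.0 is an assumed default; the source gives no concrete value.
    """
    if not 0.0 <= vauc <= 1.0:
        raise ValueError("vauc must lie in [0, 1]")
    if vauc < 0.5:
        return vauc / alpha      # does not conform: weight is shrunk
    if vauc < 0.8:
        return vauc              # middle level: weight equals the accuracy
    return alpha * vauc          # conforms well: weight is amplified


if __name__ == "__main__":
    for v in (0.4, 0.6, 0.9):
        print(v, weight_ratio(v), weight_graded(v))
```

With any `alpha` greater than 1, the graded strategy widens the gap between the lowest and highest levels, which is the stated purpose of the scaling factor.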

Taking the node evolution weight into account yields the improved wSimC algorithm, given as Algorithm 2.

In the original SimC algorithm, for a network $G$ with $n$ time snapshots, an average of $N$ nodes and an average degree $k$, the number of nodes whose similarity must be computed is $(n-1)N$. In the improved wSimC algorithm, computing the similarity of a node also requires the link prediction accuracy of all edges at that node, which has time complexity $O((N-k)k)$, so the time complexity of the whole algorithm is $O((n-1)N \cdot (N-k)k)$. In their study of real networks, Barabási and Albert found that most follow a power-law distribution; most networks are sparse graphs whose node degrees are small, so $k$ can be treated as a constant. Because of how data sets are collected, the number of time snapshots $n$ can also be treated as a constant. For large sparse networks, the time complexity of the wSimC algorithm is therefore $O(N^2)$.
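The reduction to $O(N^2)$ can be written out step by step. The lines below only restate the argument of the paragraph above, treating $n$ and $k$ as constants and using $k \ll N$ for a sparse network.

```latex
\begin{aligned}
T_{\mathrm{wSimC}} &= O\big((n-1)\,N \cdot (N-k)\,k\big) \\
                   &= O\big(N\,(N-k)\big) && \text{($n$ and $k$ are constants)} \\
                   &= O\big(N^{2}\big)    && \text{($k \ll N$ in a sparse network)}
\end{aligned}
```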

Computing the similarity of the network yields the network evolution sequence GraphS. GraphS describes the evolution trend of the network, such as stability, growth or decay. By analyzing each stage of the sequence together with information about events that have already occurred, the occurrence of new events can be detected.
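As an illustration of how GraphS is assembled, the sketch below follows the definitions stated in claim 2: the graph similarity S is the mean node similarity over the union node set, and the volatility of a period is 1/S. The function names and the input layout (one dictionary of node similarities per consecutive snapshot pair) are assumptions made for the example, and the node similarities themselves are taken as given because the similarity indices of Table 1 are not reproduced here.

```python
import math
from typing import Hashable, List, Mapping


def graph_similarity(node_sims: Mapping[Hashable, float]) -> float:
    """S(g^t, g^{t+1}): mean node similarity over the union node set."""
    if not node_sims:
        raise ValueError("the union node set must not be empty")
    return sum(node_sims.values()) / len(node_sims)


def volatility(node_sims: Mapping[Hashable, float]) -> float:
    """Volatility of one period, defined as 1 / S(g^t, g^{t+1})."""
    s = graph_similarity(node_sims)
    # Assumption: zero similarity is mapped to infinite volatility.
    return math.inf if s == 0 else 1.0 / s


def graph_sequence(per_period_sims: List[Mapping[Hashable, float]]) -> List[float]:
    """GraphS: one volatility value per consecutive snapshot pair."""
    return [volatility(sims) for sims in per_period_sims]


if __name__ == "__main__":
    periods = [
        {"a": 1.0, "b": 0.5},               # period [1, 2]
        {"a": 0.25, "b": 0.25, "c": 0.25},  # period [2, 3]
    ]
    print(graph_sequence(periods))
```

Lower similarity between two snapshots gives a larger entry in GraphS, so the second period in the usage example is the more volatile one.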

Network Stability and Events

According to its volatility, the evolution of a network can be divided into three states:

(1) If network $G$ is exactly the same at times $t$ and $t+1$, $G$ is said to be in an absolutely stable state in the time period $[t, t+1]$, and the volatility for this case is fixed by stipulation. The absolutely stable state is an ideal state that hardly ever occurs in real networks.

(2) If the volatility of network $G$ between times $t$ and $t+1$ is below the threshold $T$, $G$ is said to be in a relatively stable state in the time period $[t, t+1]$, and $[t, t+1]$ is called a stable segment. The relatively stable state reflects the normal fluctuation of the network under its evolution law.

(3) If the volatility of network $G$ between times $t$ and $t+1$ exceeds the threshold $T$, $G$ is said to be in an event state in the time period $[t, t+1]$; $t$ is called an event point and $[t, t+1]$ an event segment. An event can be defined as something that disturbs the normal evolution of the network; it affects the evolution by changing the topology of specific nodes or edges.

As a real network evolves, it stays in a relatively stable state for long periods. The occurrence of an event moves it into the event state, and once the effect of the event fades it returns to a new relatively stable state; the two states alternate in this way.

The event detection algorithm in step 2 is implemented as follows:

Given the set of existing event points, analyzing GraphS determines the evolution state of the network and detects the occurrence of events. The basic idea of the event detection algorithm EventD proposed by the invention is as follows:

From the event sequence $EventO = \{k \mid \text{an event occurs at } t = k,\ k \in [1, m],\ m \le n\}$ formed by the event points that have already occurred, the evolution sequence GraphS is analyzed to learn the interval $[L, H]$ of event occurrence values ($L$ is the lower bound of event occurrence and $H$ the upper bound), and a value $T \in [L, H]$ is selected as the occurrence threshold. To keep event detection flexible, the threshold $T$ is set manually. At $t = k$, if $\hat{D}(g^{k+1} \| g^{k}) > T$, the network is in the event state in the time period $[k, k+1]$; otherwise it is in a relatively stable state. When the analysis is complete, the event sequence EventS is output, as shown in formula (11).

$EventS = \{k \mid \hat{D}(g^{k+1} \| g^{k}) > T,\ k \in [1, n-1]\} \qquad (11)$

The specific procedure is given in Algorithm 3.
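Because Algorithm 3 is not reproduced in this text, the following Python sketch only illustrates the idea described above and should not be read as the patent's own listing. The function names are hypothetical. GraphS is assumed to be a list in which `graph_s[k - 1]` holds the volatility of period [k, k+1], and the interval [L, H] is taken to be the minimum and maximum volatility at the known event points, which is an assumption because the text does not say how the interval is learned.

```python
from typing import List, Sequence, Tuple


def learn_interval(graph_s: Sequence[float], known_events: Sequence[int]) -> Tuple[float, float]:
    """Return (L, H) as the min and max volatility at known event points.

    Assumption: the source does not specify how [L, H] is learned.
    graph_s[k - 1] is the volatility of period [k, k+1], k = 1..n-1.
    """
    values = [graph_s[k - 1] for k in known_events]
    if not values:
        raise ValueError("at least one known event point is required")
    return min(values), max(values)


def event_d(graph_s: Sequence[float], threshold: float) -> List[int]:
    """Formula (11): event points k whose period volatility exceeds T."""
    return [k for k, d in enumerate(graph_s, start=1) if d > threshold]


if __name__ == "__main__":
    graph_s = [1.1, 1.2, 3.0, 1.1, 2.6, 1.3]
    low, high = learn_interval(graph_s, known_events=[3, 5])
    t = 2.0  # chosen by hand; the text restricts T to [L, H]
    print((low, high), event_d(graph_s, t))
```

The comparison in formula (11) is strict, so a threshold set exactly to L would not flag the known event with the lowest volatility. The usage example picks a threshold by hand to show the traversal; in the method described above, T is chosen from within [L, H].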

The invention also proposes a simple criterion for evaluating event detection methods.

Assume that $k_1$ and $k_2$ are two consecutive event points in network $G$ ($k_1 + 1 < k_2$), that $G$ is in a relatively stable state in the time period $[k_1 + 1, k_2]$, and that it is in the event state in $[k_2, k_2 + 1]$. The event sensitivity performance Per is defined as:

$Per = \dfrac{\hat{D}(g^{k_2+1} \| g^{k_2}) - \dfrac{\sum_{i=k_1+1}^{k_2-1} \hat{D}(g^{i+1} \| g^{i})}{k_2 - k_1 - 2}}{\dfrac{\sum_{i=k_1+1}^{k_2-1} \hat{D}(g^{i+1} \| g^{i})}{k_2 - k_1 - 2}} \qquad (12)$

where $Per > 0$, because the volatility of an event segment is necessarily greater than that of a stable segment.

The event sensitivity performance Per compares the volatility of the event segment with the average volatility of the stable segment: it is the excess of the former over the latter, expressed as a ratio to the latter. The larger the value, the more easily the event is detected, so Per can be used to evaluate the performance of an event detection method. In practice, different event detection algorithms can be designed by varying parameters such as the similarity index and the weighting strategy, and the best parameter configuration can be selected according to the Per results.
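Formula (12) can be evaluated directly from GraphS. The sketch below is an illustration under stated assumptions: the function name is hypothetical and `graph_s[k - 1]` is assumed to hold the volatility of period [k, k+1]. It follows the formula as printed, including the divisor k2 - k1 - 2, and therefore rejects inputs for which that divisor is not positive.

```python
from typing import Sequence


def event_sensitivity(graph_s: Sequence[float], k1: int, k2: int) -> float:
    """Formula (12): relative excess of event volatility over the stable average.

    graph_s[k - 1] is the volatility of period [k, k+1] (assumed layout).
    The divisor k2 - k1 - 2 is taken from the formula as printed, although
    the sum runs over k2 - k1 - 1 periods; this sketch keeps the printed form.
    """
    divisor = k2 - k1 - 2
    if divisor <= 0:
        raise ValueError("k2 - k1 - 2 must be positive for formula (12)")
    stable_sum = sum(graph_s[i - 1] for i in range(k1 + 1, k2))  # i = k1+1 .. k2-1
    stable_avg = stable_sum / divisor
    event_vol = graph_s[k2 - 1]
    return (event_vol - stable_avg) / stable_avg


if __name__ == "__main__":
    graph_s = [3.0, 1.0, 1.0, 1.0, 4.0]  # event points at k = 1 and k = 5
    print(event_sensitivity(graph_s, k1=1, k2=5))
```

Computing Per for several configurations (similarity index, weighting strategy) on the same data gives a simple way to rank them, as the paragraph above suggests.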

It should be understood that the parts not described in detail in this specification belong to the prior art.

It should be understood that the above description of the preferred embodiments is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the invention. Guided by the invention, a person of ordinary skill in the art may make substitutions or variations without departing from the scope protected by the claims, and all such substitutions and variations fall within the scope of protection of the invention. The scope of protection claimed shall be determined by the appended claims.

Claims

1. A social network event detection method based on link prediction, characterized by comprising the following steps:

Step 1: Use the algorithm SimC to calculate the similarity of the input network G for each time period, and obtain the network evolution sequence GraphS from the results;

Step 2: Apply the event detection algorithm EventD to GraphS in combination with the threshold T, and output the event sequence EventS.

2. The social network event detection method based on link prediction according to claim 1, characterized in that step 1 is implemented as follows:

For a given network $G = \{g^1, g^2, g^3, \ldots, g^n\}$, the network snapshot at time $t$ is represented by the graph $g^t$; for the snapshot graphs $g^t$ and $g^{t+1}$ of $G$, the similarity of node $i$ is defined as the degree to which node $i$ remains stable between $g^t$ and $g^{t+1}$, denoted $s(v_i^t, v_i^{t+1})$; the similarity of graphs $g^t$ and $g^{t+1}$ is the macroscopic result of superimposing the similarities of all nodes in the graphs, denoted $S(g^t, g^{t+1})$ and defined as:

$S(g^t, g^{t+1}) = \sum_{i \in U_{t,t+1}} s(v_i^t, v_i^{t+1}) \times \dfrac{1}{|U_{t,t+1}|};$

where $U_{t,t+1} = g^t \cup g^{t+1}$; $v_i$ at times $t$ and $t+1$ is regarded as two different nodes, denoted $v_i^t$ and $v_i^{t+1}$;

The volatility of the network in the time period $[t, t+1]$ is denoted $\hat{D}(g^{t+1} \| g^t)$ and defined as:

$\hat{D}(g^{t+1} \| g^t) = \dfrac{1}{S(g^t, g^{t+1})};$

The network evolution sequence GraphS is defined as the collection of the volatilities of all time periods, as shown in the following formula:

$GraphS = \{\hat{D}(g^2 \| g^1),\ \hat{D}(g^3 \| g^2),\ \ldots,\ \hat{D}(g^n \| g^{n-1})\}.$

3. The social network event detection method based on link prediction according to claim 2, characterized in that the similarity $s(v_i^t, v_i^{t+1})$ of node $i$ is calculated as follows: the 8 indices used in link prediction are applied to the similarity calculation, and a virtual node $V_{virtual}$, called the "observer", is introduced; $V_{virtual}$ has a virtual edge to every node in the network, which yields 8 node similarity indices, as shown in Table 1;

Table 1 Computation of node similarity by introducing virtual points

Link prediction is carried out on the network G, and the index with the best AUC is selected as the optimal index O, which reflects the evolution mechanism of the network;

where AUC is the main measure of the accuracy of a link prediction algorithm and is defined as:

$AUC = \dfrac{n' + 0.5\,n''}{n};$

where $n$ is the number of comparisons, $n'$ is the number of comparisons in which the score of an edge drawn at random from the test set is greater than the score of an edge drawn at random from the set of non-existent edges, and $n''$ is the number of comparisons in which the two scores are equal.

4. The social network event detection method based on link prediction according to claim 2, characterized in that the calculation of the similarity $s(v_i^t, v_i^{t+1})$ of node $i$ further requires calculating the evolution weight $w(v_i^t, v_i^{t+1})$ of the node and finally calculating the improved node similarity, and comprises the following sub-steps:

Step 1: Apply the 8 indices used in link prediction to the similarity calculation, and introduce a virtual node $V_{virtual}$, also called the "observer"; $V_{virtual}$ has a virtual edge to every node in the network, which yields 8 node similarity indices, as shown in Table 1;

Table 1 Computation of node similarity by introducing virtual points

$AUC = \dfrac{n' + 0.5\,n''}{n};$

where $n$ is the number of comparisons, $n'$ is the number of comparisons in which the score of an edge drawn at random from the test set is greater than the score of an edge drawn at random from the set of non-existent edges, and $n''$ is the number of comparisons in which the two scores are equal;

Step 2: Calculate the evolution weight $w(v_i^t, v_i^{t+1})$ of the node;

Step 3: Calculate the improved node similarity.

5. The social network event detection method based on link prediction according to claim 4, characterized in that the evolution weight $w(v_i^t, v_i^{t+1})$ of the node is determined by the ratio of $VAUC_i^{t,t+1}$ to the random evolution value 0.5, namely:

$w(v_i^t, v_i^{t+1}) = \dfrac{VAUC_i^{t,t+1}}{0.5};$

where $VAUC_i^{t,t+1}$ denotes the link prediction accuracy of node $v_i$, defined as the average link prediction accuracy of the edges newly added to $v_i$ in $g^{t+1}$:

$VAUC_i^{t,t+1} = \begin{cases} \dfrac{\sum_{e \in NE_i^{t,t+1}} EAUC_e^{t,t+1}}{|NE_i^{t,t+1}|}, & |NE_i^{t,t+1}| > 0 \\ 0.5, & |NE_i^{t,t+1}| = 0 \end{cases};$

where $NE_i^{t,t+1}$ denotes the set of edges newly added to $v_i$ in $g^{t+1}$; $VAUC_i^{t,t+1}$ reflects how well the node's change fits the evolution law, and the larger the value, the better the node's change conforms to the evolution law.

6. The social network event detection method based on link prediction according to claim 4, characterized in that the evolution weight $w(v_i^t, v_i^{t+1})$ of the node is calculated as:

$w(v_i^t, v_i^{t+1}) = \begin{cases} \dfrac{VAUC_i^{t,t+1}}{\alpha}, & VAUC_i^{t,t+1} < 0.5 \\ VAUC_i^{t,t+1}, & 0.5 \le VAUC_i^{t,t+1} < 0.8 \\ \alpha \times VAUC_i^{t,t+1}, & 0.8 \le VAUC_i^{t,t+1} \le 1.0 \end{cases};$

where $\alpha$ is a scaling factor intended to separate the three levels more clearly; $VAUC_i^{t,t+1}$ denotes the link prediction accuracy of node $v_i$, defined as the average link prediction accuracy of the edges newly added to $v_i$ in $g^{t+1}$:

$VAUC_i^{t,t+1} = \begin{cases} \dfrac{\sum_{e \in NE_i^{t,t+1}} EAUC_e^{t,t+1}}{|NE_i^{t,t+1}|}, & |NE_i^{t,t+1}| > 0 \\ 0.5, & |NE_i^{t,t+1}| = 0 \end{cases};$

7. The social network event detection method based on link prediction according to claim 2, characterized in that step 2 comprises the following sub-steps:

Step 2.1: Calculate the interval of event occurrence values from the existing events;

Step 2.2: Determine the event occurrence threshold T;

Step 2.3: Traverse GraphS and output the event sequence EventS.

8. The social network event detection method based on link prediction according to claim 2, characterized in that step 2 is implemented as follows: from the event sequence $EventO = \{k \mid \text{an event occurs at } t = k,\ k \in [1, m],\ m \le n\}$ formed by the event points that have already occurred, the evolution sequence GraphS is analyzed to learn the interval $[L, H]$ of event occurrence values, where $L$ is the lower bound of event occurrence and $H$ the upper bound; a value $T \in [L, H]$ is selected as the occurrence threshold, and $T$ is determined manually; at $t = k$, if $\hat{D}(g^{k+1} \| g^{k}) > T$, the network is in the event state in the time period $[k, k+1]$, and otherwise it is in a relatively stable state; when the analysis is complete, the event sequence EventS is output:

$EventS = \{k \mid \hat{D}(g^{k+1} \| g^{k}) > T,\ k \in [1, n-1]\}.$

9. The social network event detection method based on link prediction according to any one of claims 2-8, wherein the evaluation criteria of the event detection method are:

Assume that $k_1$ and $k_2$ are two consecutive event points in the network $G$, with $k_1 + 1 < k_2$; $G$ is in a relatively stable state in the time period $[k_1 + 1, k_2]$ and in the event state in $[k_2, k_2 + 1]$; the event sensitivity performance is defined as:

$Per = \dfrac{\hat{D}(g^{k_2+1} \| g^{k_2}) - \dfrac{\sum_{i=k_1+1}^{k_2-1} \hat{D}(g^{i+1} \| g^{i})}{k_2 - k_1 - 2}}{\dfrac{\sum_{i=k_1+1}^{k_2-1} \hat{D}(g^{i+1} \| g^{i})}{k_2 - k_1 - 2}};$

where $Per > 0$;

The event sensitivity performance Per compares the volatility of the event segment with the average volatility of the stable segment, as the excess of the former over the latter expressed as a ratio to the latter. The larger the value, the more easily the event is detected.