CN111126437A - Abnormal group detection method based on weighted dynamic network representation learning - Google Patents

Abnormal group detection method based on weighted dynamic network representation learning

Info

Publication number
CN111126437A
Authority
CN
China
Prior art keywords
abnormal
network
node
link
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911155412.1A
Other languages
Chinese (zh)
Other versions
CN111126437B (en)
Inventor
冯昊
刘琰
周资乔
钟凤喆
王博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN201911155412.1A
Publication of CN111126437A
Application granted
Publication of CN111126437B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of dynamic network anomaly detection and discloses an abnormal group detection method based on weighted dynamic network representation learning, comprising the following steps: step 1: constructing a weighted dynamic network representation learning model based on a deep autoencoder (self-coding) neural network; step 2: performing abnormal link identification based on the constructed model to obtain an abnormal link set; and step 3: constructing a fully connected neural network model based on the abnormal link set, and detecting abnormal groups through that model. The invention combines abnormal links with a fully connected neural network anomaly detection model, which broadens the method's range of application, and it is experimentally verified on the Enron email data set and an AS-level Internet data set; the experimental results show that the invention achieves a good abnormal group detection effect.

Description

Abnormal group detection method based on weighted dynamic network representation learning
Technical Field
The invention belongs to the technical field of dynamic network anomaly detection, and particularly relates to an anomaly group detection method based on weighted dynamic network representation learning.
Background
With the rapid development of network technology and the wide popularization of computers and mobile intelligent terminals, networks greatly change the work and life of people, and meanwhile, the network scale becomes larger and larger, and the structure becomes more and more complex. Therefore, anomaly detection on a dynamic network becomes more and more difficult, structural features in the graph are difficult to comprehensively capture based on the existing graph structural feature statistical method, and how to effectively identify an anomaly group in a changing network is a current research hotspot.
The basic idea of network representation learning is to convert the nodes of a network into multi-dimensional vector representations through a series of transformations while retaining as much of the structural information of the original network as possible, so that tasks such as link prediction, multi-label node classification and community division can be carried out more conveniently with existing methods. Among currently known dynamic network representation learning methods, random-walk-based methods handle a weighted network by increasing or decreasing the probability of selecting a node as the next hop according to the edge weights. This effectively reduces the distance, after representation learning, between the nodes joined by high-weight edges. However, an abnormal link detection task must judge whether the links between nodes in the next time slice network are normal using the node representations learned from the historical network, and these methods learn only the structural information between nodes, not the weight information of the edges. Therefore, if a link exists between the nodes under test but its weight is markedly larger or smaller than in the past, these methods cannot identify the weight anomaly of the link. Meanwhile, the duration of an abnormal event in a dynamic network varies and is often difficult to capture with a single time slice network. An anomaly detection model based on a fully connected neural network is proposed in a paper (Miz V, Riccaud B, Benzi K, et al.). However, in that paper a node anomaly is defined as a sudden increase in the traffic of a node within a certain time, and changes in the communication structure between nodes are not considered.
Therefore, the invention constructs a weighted dynamic network representation learning model, performs abnormal link detection over the whole network on the basis of that model, and finally constructs a fully connected neural network based on the abnormal links to detect and determine the abnormal node set.
Disclosure of Invention
The invention provides an abnormal group detection method based on weighted dynamic network representation learning, aiming at the problems that the existing network representation learning method cannot well learn the corresponding relation between edges and weights when facing a weighted dynamic network and cannot effectively identify weight abnormality when abnormal link detection is carried out.
In order to achieve the purpose, the invention adopts the following technical scheme:
an abnormal group detection method based on weighted dynamic network representation learning comprises the following steps:
step 1: constructing a weighted dynamic network representation learning model based on a deep autoencoder neural network;
step 2: performing abnormal link identification based on the constructed weighted dynamic network representation learning model to obtain an abnormal link set;
step 3: constructing a fully connected neural network model based on the abnormal link set, and detecting abnormal groups through the fully connected neural network model.
Further, the step 1 comprises:
step 1.1: for each edge $e_i \in E$ in the dynamic network $G = \{G_1, G_2, \dots, G_t, G_{t+1}, \dots, G_n\}$, collecting its weight values in the different time slice networks, denoting the weight sequence of edge $e_i$ as $w_{e_i} = \{w_1, w_2, \dots, w_m\}$, and discretizing the sequence $w_{e_i}$;
step 1.2: in each time slice network, constructing a random walk path set starting from each node; given a network $G = (V, E, W)$, for any $v_1 \in V$, constructing the random walk path set $\Omega_{v_1} = \{(v_1, v_2, \dots, v_l, w_{12}, w_{23}, \dots, w_{(l-1)l}), \dots \mid (v_i, v_{i+1}) \in E \cap w_{i(i+1)} \in W\}$, where $l$ is the length of the constructed random walk path and $w_{i(i+1)}$ is the weight of edge $(v_i, v_{i+1})$;
step 1.3: treating the weight of each edge as a special node, encoding each node in the random walk path by one-hot encoding as the input layer and output layer of the deep autoencoder neural network, learning the network structure and the edge weight information in the intermediate layers by minimizing a loss function, and at the same time compressing the vector representation of each node to a preset dimension $d$.
Further, the step 1.3 includes:
step 1.3.1: minimizing the difference between the input layer and the output layer by optimizing a first objective equation:
Figure RE-GDA0002402836200000021
wherein, | Ω | is the number of random walk paths, and l is the length of the random walk paths;
Figure RE-GDA0002402836200000031
is the output of the nl-th layer, i.e. the output layer,
Figure RE-GDA0002402836200000032
$W^{(nl-1)}$ is the weight of layer $nl-1$, and $b^{(nl-1)}$ is the bias of layer $nl-1$;
Figure RE-GDA0002402836200000033
for the ith random walk path
Figure RE-GDA0002402836200000034
Any node of
Figure RE-GDA0002402836200000035
The one-hot code of (1), which is the input of the 0 th layer, i.e., the input layer,
Figure RE-GDA0002402836200000036
for the ith random walk path edge (v)l-1,vl) The weight of (c);
step 1.3.2: in the intermediate layers, for the random walk path $(v_1, v_2, \dots, v_l, w_{12}, w_{23}, \dots, w_{(l-1)l})$, minimizing the distance between the nodes in the first half of the path, $(v_1, v_2, \dots, v_l)$, by optimizing a second objective equation, the second objective equation being:
Figure RE-GDA0002402836200000037
wherein ,
Figure RE-GDA0002402836200000038
and
Figure RE-GDA0002402836200000039
are the one-hot codes of two adjacent nodes in the random walk path;
step 1.3.3: minimizing the distance between the edge and the weight node by optimizing a third objective equation, the third objective equation being:
Figure RE-GDA00024028362000000310
wherein
Figure RE-GDA00024028362000000311
is the vector representation of any edge $e_{(j-1)j}$ among the edges $(e_{12}, e_{23}, \dots, e_{(r-1)r})$ between the nodes $(v_1, v_2, \dots, v_r)$;
step 1.3.4: sparsity of input-output vectors is limited by KL divergence:
Figure RE-GDA00024028362000000312
wherein d is the dimension represented by the vector, p is the sparsity parameter,
Figure RE-GDA00024028362000000313
is the mean degree of activation of the layer τ neurons,
Figure RE-GDA00024028362000000314
is the degree of activation of the i-dimensional neuron,
Figure RE-GDA00024028362000000315
is the average activation degree of the i-th dimension neuron, and $\tau \in [1, nl]$;
step 1.3.5: combining formula 1, formula 2, formula 3 and formula 4 to construct the loss function, which completes the construction of the weighted dynamic network representation learning model:
Figure RE-GDA00024028362000000316
wherein
Figure RE-GDA00024028362000000317
represents the weight decay term, $W^{(\tau)}$ is the weight of layer $\tau$, and $F$ denotes the Frobenius norm.
Further, the step 2 comprises:
step 2.1: dynamically updating the vector representations of the nodes, and setting a sampling probability $s_i$ for the random walk paths of the 1st to t-th time slice networks:
Figure RE-GDA0002402836200000041
Wherein i is a time value;
step 2.2: acquiring a random walk path set by integrating a current time slice network and a historical time slice network, sequentially sending the random walk paths into a constructed weighted dynamic network representation learning model, and obtaining low-dimensional vector representation of nodes by minimizing a loss function;
step 2.3: and after the vector representation of each node of the t-th time slice network is obtained, abnormal link detection is carried out on each link of the t + 1-th time slice network based on the vector representation of the current node, and an abnormal link set is obtained.
Further, the step 2.3 comprises:
step 2.3.1: and link exception identification:
taking the average distance between all edge-connected node pairs in the 1st to t-th time slice networks,
Figure RE-GDA0002402836200000043
as a reference, the degree of closeness between nodes $v_i$ and $v_j$ is defined as:
Figure RE-GDA0002402836200000042
where $d_{ij}$ is the Euclidean distance between nodes $v_i$ and $v_j$;
setting an abnormal link judgment threshold $k$: when two nodes whose degree of closeness is smaller than $k$ become linked in time slice $t+1$, the node pair is determined to have a link anomaly in time slice $t+1$, and the link is an abnormal link;
step 2.3.2: weight anomaly identification:
obtaining the vector representation of edge $e_{ij}$ by taking the Hadamard product of the vector representations of nodes $v_i$ and $v_j$; predicting the weight of the edge in time slice $t+1$ by computing the Euclidean distance in d-dimensional space between edge $e_{ij}$ and each weight node; if the predicted weight does not match the actual weight, determining that edge $e_{ij}$ has a weight anomaly in the (t+1)-th time slice, the link being an abnormal link;
step 2.3.3: the abnormal link set is obtained through the step 2.3.1 and the step 2.3.2.
Further, the step 3 comprises:
step 3.1: treating all abnormal links in the abnormal link set as edges between nodes, and constructing a fully connected neural network model based on the abnormal link set, thereby outputting a plurality of abnormal subgraphs and obtaining an abnormal subgraph set;
step 3.2: taking the largest connected subgraph in the abnormal subgraph set as the final abnormal group.
Compared with the prior art, the invention has the following beneficial effects:
1. according to the invention, the vector representation of the nodes and the edges is obtained by learning the structural information and the weight information of the edges in the dynamic network, and the abnormal node set is obtained by using the full-connection neural network model on the basis of abnormal link detection.
2. The invention designs a weighted dynamic network representation learning model, which learns the dynamic network structure information more comprehensively, considers the weight as a special node, synthesizes the node representation to obtain the vector representation of the edge, and minimizes the distance between the edge and the 'weight node' thereof, thereby learning the weight information in the network. After the node vector representation is obtained, the real dynamic network data set is used for carrying out abnormal link detection, and the effectiveness of the method is verified through experiments.
3. The invention combines abnormal links with a fully connected neural network anomaly detection model, which broadens the method's range of application, and it is experimentally verified on the Enron email data set and an AS-level Internet data set; the experimental results show that the invention achieves a good abnormal group detection effect.
Drawings
FIG. 1 is a basic flowchart of an abnormal group detection method based on weighted dynamic network representation learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a weighted dynamic network representation learning model architecture of an abnormal group detection method based on weighted dynamic network representation learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the positions of the edges and the weighted nodes of a t-time slice network of an abnormal group detection method based on weighted dynamic network representation learning according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a dynamic network link structure change of an abnormal group detection method based on weighted dynamic network representation learning according to an embodiment of the present invention;
FIG. 5 is a second basic flowchart of an abnormal group detection method based on weighted dynamic network representation learning according to an embodiment of the present invention;
FIG. 6 is a diagram of the anomaly detection results on the Enron email data set for the abnormal group detection method based on weighted dynamic network representation learning according to an embodiment of the present invention;
FIG. 7 is a diagram of the abnormal group detection results on the AS-level Internet of Ribayone and Venezuela for the abnormal group detection method based on weighted dynamic network representation learning according to an embodiment of the present invention;
FIG. 8 is a statistical result diagram of the abnormal link counts of the libaran abnormal node set for the abnormal group detection method based on weighted dynamic network representation learning according to an embodiment of the present invention;
FIG. 9 is a graph of the evolution of the abnormal link count of the libaran abnormal node set for the abnormal group detection method based on weighted dynamic network representation learning according to an embodiment of the present invention;
FIG. 10 is a statistical result diagram of the abnormal link counts of the Venezuela abnormal node set for the abnormal group detection method based on weighted dynamic network representation learning according to an embodiment of the present invention;
FIG. 11 is a graph of the evolution of the abnormal link count of the Venezuela abnormal node set for the abnormal group detection method based on weighted dynamic network representation learning according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present invention, the meanings of some of the nouns appearing in the present invention are explained:
Weighted dynamic network: a weighted dynamic network is a time-varying weighted network. A dynamic network comprising n time slices is denoted $G = \{G_1, G_2, \dots, G_t, G_{t+1}, \dots, G_n\}$; the t-th time slice network is $G_t = (V_t, E_t, W_t)$, where $V_t$ is the set of vertices in the network, $E_t$ is the set of edges representing relationships between vertices, and $W_t$ is the set of edge weights.
Weight anomaly: given a dynamic network $G = \{G_1, G_2, \dots, G_t, G_{t+1}, \dots, G_n\}$, for any of its time slice networks $G_t = (V_t, E_t, W_t)$ and any edge $e_i \in E_t$, $e_i = \{frm, to, w_i\}$, where $frm$ and $to$ are the two endpoints of the edge, $w_i$ is the weight of the current edge, and $[w_l, w_h]$ is the normal range, over the n time slices, of the weight of the edge whose endpoints are $frm$ and $to$. If $w_i < w_l$ or $w_i > w_h$, then $e_i$ is considered to have a weight anomaly at time slice $t$.
Link anomaly: link anomalies comprise abnormal link connection and abnormal link disconnection. Given a dynamic network $G = \{G_1, G_2, \dots, G_t, G_{t+1}, \dots, G_n\}$, after the vector representation of each node of the (t-1)-th time slice network has been obtained, if two nodes $v_i$, $v_j$ with a low link probability become linked at some time $t$, the behavior is called an abnormal link connection; similarly, if two nodes $v_i$, $v_j$ with a high link probability become disconnected at some time $t$, the behavior is called an abnormal link disconnection.
Synchronous abnormal links: given a dynamic network $G = \{G_1, G_2, \dots, G_t, G_{t+1}, \dots, G_n\}$, if the abnormal links of a set of nodes in the dynamic network appear in a consistent, unified manner from the t-th time slice network to the s-th time slice network, the node set is said to exhibit synchronous abnormal link behavior over time slices $t$ to $s$.
The invention is further illustrated by the following examples in conjunction with the accompanying drawings:
A network anomaly is defined as a group of nodes that synchronously exhibit abnormal link behavior over a period of time. Given a weighted dynamic network $G = \{G_1, G_2, \dots, G_t, G_{t+1}, \dots, G_n\}$, the goal is to obtain the set of nodes with synchronous abnormal link behavior over a specified time period; to this end, the abnormal link set of the dynamic network is first identified through network representation learning. For any time slice $t$ in the dynamic network, the structural information of the 1st to t-th time slice networks is learned in order to detect abnormal links in the (t+1)-th time slice network, where abnormal links include both link weight anomalies and link anomalies.
After the abnormal link set of the whole weighted dynamic network has been acquired, the aim is to obtain the set of nodes that synchronously exhibit abnormal behavior within a given time period. Accordingly, nodes that are connected by edges and synchronously exhibit abnormal behavior are searched for on the basis of the abnormal link set: the weights between nodes are obtained by comparing the similarity of the nodes' abnormal behavior over the period, low-weight edges are pruned using a weight threshold, and finally the largest connected subgraph (the largest sub-network output) is taken as the abnormal node set for the current time period.
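The pruning and largest-connected-subgraph steps described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the text does not say how the similarity of abnormal behavior is measured, so cosine similarity over per-time-slice abnormal link counts is an assumption, and the names `abnormal_group`, `activity` and `threshold` are hypothetical.

```python
import math
from collections import deque

def cosine(a, b):
    """Cosine similarity of two equal-length count vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def abnormal_group(abnormal_links, activity, threshold):
    """Return the largest connected node set after pruning low-weight edges.

    abnormal_links: iterable of (u, v) node pairs flagged as abnormal links.
    activity: node -> list of abnormal link counts, one entry per time slice.
    threshold: edges whose similarity weight is below this value are pruned.
    """
    adj = {}
    for u, v in abnormal_links:
        if cosine(activity[u], activity[v]) >= threshold:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    best, seen = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search over the pruned graph
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

links = [("a", "b"), ("b", "c"), ("d", "e"), ("a", "x")]
activity = {
    "a": [0, 3, 4, 0], "b": [0, 2, 5, 0], "c": [0, 4, 3, 0],
    "d": [1, 0, 0, 0], "e": [1, 0, 0, 0], "x": [0, 0, 0, 7],
}
group = abnormal_group(links, activity, threshold=0.8)
```

In this toy input, nodes a, b and c are abnormal in the same time slices and survive pruning as one component, while the link to x is pruned because its abnormal activity occurs in a different slice.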
In order to effectively detect abnormal groups in a weighted dynamic network, the invention discloses an abnormal group detection method based on weighted dynamic network representation learning. As shown in FIG. 1, a weighted dynamic network representation learning model is first built on a deep autoencoder neural network, abnormal link identification is then performed on the current dynamic network, the abnormal link set is combined with a fully connected neural network, and finally the abnormal group (abnormal node set) is obtained. The three parts are described in detail below.
Step S11: constructing the weighted dynamic network representation learning model WeightWalk based on a deep autoencoder neural network. The WeightWalk model can effectively learn both network structure information and edge weight information; it is described below in three parts: weight discretization, weighted random walk path generation, and representation learning.
Step S11.1: weight discretization:
In the weighted dynamic network, the weights between nodes are continuous values; however, continuous values are not conducive to representation learning of the nodes, so they need to be discretized. For each edge $e_i \in E$ in the dynamic network $G = \{G_1, G_2, \dots, G_t, G_{t+1}, \dots, G_n\}$, its weight values in the different time slice networks are collected, and the weight sequence of edge $e_i$ is denoted $w_{e_i} = \{w_1, w_2, \dots, w_m\}$. The sequence $w_{e_i}$ can be discretized by various methods, such as equal-frequency partitioning, equal-width partitioning or clustering-based partitioning; here the sequence is assumed to follow a normal distribution, and its mean $\mu$ and variance $\sigma^2$ are calculated:
Figure RE-GDA0002402836200000071
Given a threshold $\alpha$, for any $w_i \in w_{e_i}$, if the value of $w_i$ falls outside the interval $[\mu - \alpha, \mu + \alpha]$, its weight is set to 1; if it falls within $[\mu - \alpha, \mu + \alpha]$, its weight is set to 0. $\alpha$ is usually taken to be $3\sigma$, because if the sequence $w_{e_i}$ follows a normal distribution, the probability that a value falls outside this interval is only about 0.3%, which is a small-probability event; the value of $\alpha$ can also be chosen according to the actual situation.
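The thresholding rule above can be illustrated with a short sketch. It assumes the population standard deviation is used for $\sigma$ (the original formula is not recoverable from this text), and the function name `discretize_weights` is hypothetical.

```python
import statistics

def discretize_weights(weights, alpha=None):
    """Map each weight to 0 if it lies in [mu - alpha, mu + alpha], else 1.

    alpha defaults to 3 * sigma, where sigma is the population standard
    deviation of the sequence (an assumption about the variance formula).
    """
    mu = statistics.fmean(weights)
    sigma = statistics.pstdev(weights)
    if alpha is None:
        alpha = 3 * sigma
    return [0 if mu - alpha <= w <= mu + alpha else 1 for w in weights]

# Weight history of one edge across 20 time slices, with one outlier.
history = [10.0] * 19 + [100.0]
labels = discretize_weights(history)
```

With this history only the final value lies more than $3\sigma$ from the mean, so it is the only entry mapped to 1. A fixed `alpha` can be passed instead when the normality assumption does not hold.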
Step S11.2: generating a weighted random walk path:
In each time slice network, a random walk path set is constructed starting from each node. Given a network $G = (V, E, W)$, for any $v_1 \in V$, the random walk path set $\Omega_{v_1} = \{(v_1, v_2, \dots, v_l, w_{12}, w_{23}, \dots, w_{(l-1)l}), \dots \mid (v_i, v_{i+1}) \in E \cap w_{i(i+1)} \in W\}$ is constructed, where $l$ is the length of the constructed random walk path and $w_{i(i+1)}$ is the weight of edge $(v_i, v_{i+1})$. In order to learn the correspondence between edges and weights in the network, the edge weights must be passed into the model together with the nodes, and the nodes and the edge weights are separated again in the model learning stage.
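A weighted random walk path of the form $(v_1, \dots, v_l, w_{12}, \dots, w_{(l-1)l})$ might be generated as in the sketch below. The text does not state the transition rule, so a uniform choice among neighbors is assumed; the adjacency format, the `("w", level)` token used for weight nodes and the function name are all hypothetical.

```python
import random

def weighted_random_walk(adj, start, length, rng=random):
    """Walk up to `length` nodes from `start`.

    adj maps node -> {neighbor: discretized edge weight}.
    Returns one flat tuple: the visited nodes followed by the weights of
    the traversed edges, each wrapped as a ("w", level) token.
    """
    nodes, weights = [start], []
    while len(nodes) < length:
        neighbors = adj.get(nodes[-1], {})
        if not neighbors:  # dead end: stop the walk early
            break
        nxt = rng.choice(sorted(neighbors))  # uniform next hop (assumption)
        weights.append(("w", neighbors[nxt]))
        nodes.append(nxt)
    return tuple(nodes) + tuple(weights)

adj = {"a": {"b": 0}, "b": {"a": 0, "c": 1}, "c": {"b": 1}}
path = weighted_random_walk(adj, "a", 3, random.Random(0))
```

Keeping the nodes in the first half of the tuple and the weights in the second half mirrors the path layout in the text and makes it easy to split the two again in the learning stage.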
Step S11.3: deep self-coding neural network representation learning:
The purpose of network representation learning is to learn a mapping function $f: V \to \mathbb{R}^d$ that maps each node in the network to a low-dimensional vector, where $d$ is the dimension of the vector representation. The existing NetWalk dynamic network representation learning algorithm uses an autoencoder neural network to perform representation learning on each time slice network, but it has two problems. First, the decay in importance of historical paths is not considered during dynamic network representation learning; for example, when the node representation of the n-th time slice is learned, a node link from the (n-100)-th time slice is clearly far less important than one from the n-th time slice. Second, edge weights are not learned when a weighted network is processed, so in an abnormal link detection task the method cannot identify a weight anomaly when a link exists between the nodes under test but its weight is markedly larger or smaller than usual. To solve these two problems, the invention proposes WeightWalk, a representation learning model for weighted dynamic networks, whose framework is shown in FIG. 2. The model input is a weighted random walk path, and the weights of the edges are treated as special nodes. Each node in the random walk path is one-hot encoded to serve as the input layer and output layer of the autoencoder neural network; the network structure and the edge weight information are learned in the intermediate layers by minimizing a loss function, and at the same time the vector representation of each node is compressed to a preset dimension $d$.
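Treating weights as special nodes means that ordinary nodes and weight levels share one vocabulary when the path is one-hot encoded. The sketch below shows one way this could look; the `("w", level)` token format and the helper names are hypothetical, not taken from the patent.

```python
def build_vocab(nodes, weight_levels):
    """Assign one index to every node and to every discretized weight level."""
    tokens = list(nodes) + [("w", level) for level in weight_levels]
    return {token: index for index, token in enumerate(tokens)}

def one_hot(token, vocab):
    """Return the one-hot vector of a node or weight token."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec

def encode_path(path, vocab):
    """Encode every element of a weighted random walk path."""
    return [one_hot(token, vocab) for token in path]

vocab = build_vocab(["a", "b", "c"], [0, 1])
encoded = encode_path(("a", "b", "c", ("w", 0), ("w", 1)), vocab)
```

Each encoded vector has length $|V|$ plus the number of weight levels, and these vectors would serve as both the input and the reconstruction target of the autoencoder.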
In this model, assume the model has $nl$ layers in total; the input layer is denoted $layer_0$, the output layer $layer_{nl}$, and the intermediate layers are collectively denoted $layer_{ml}$. Given the i-th random walk path
Figure RE-GDA0002402836200000081
For any node
Figure RE-GDA0002402836200000082
Its one-hot code is described as
Figure RE-GDA0002402836200000083
The whole random walk path is recorded as
Figure RE-GDA0002402836200000084
Given the τ-th layer weight matrix $W^{(\tau)}$ and the τ-th layer bias matrix $b^{(\tau)}$, $\tau \in [1, nl]$, let $f^{(\tau)}(\cdot)$ denote the output of layer $\tau$ of the model; the input of layer 0 is
Figure RE-GDA0002402836200000091
The output of layer $nl$ is
Figure RE-GDA0002402836200000092
For an autoencoder neural network, the aim is to minimize the difference between the input and the output of the model. Using the $\ell_2$ norm to measure this difference, the objective equation is written as:
Figure RE-GDA0002402836200000093
wherein, | Ω | is the number of random walk paths, l is the length of the random walk paths,
Figure RE-GDA0002402836200000094
$W^{(nl-1)}$ is the weight of layer $nl-1$, and $b^{(nl-1)}$ is the bias of layer $nl-1$.
In the intermediate layers $layer_{ml}$, for a random walk path $(v_1, v_2, \dots, v_l, w_{12}, w_{23}, \dots, w_{(l-1)l})$, the distance between the nodes in the first half of the path, $(v_1, v_2, \dots, v_l)$, should be minimized; the objective equation is:
Figure RE-GDA0002402836200000095
wherein ,
Figure RE-GDA0002402836200000096
and
Figure RE-GDA0002402836200000097
are the one-hot codes of two adjacent nodes in the i-th random walk path.
The vector representation of an edge is obtained by merging the vector representations of its nodes. The edges between the nodes $(v_1, v_2, \dots, v_r)$ can be written as $(e_{12}, e_{23}, \dots, e_{(r-1)r})$, where for any edge $e_{(j-1)j}$ the vector representation of the edge is obtained by taking the Hadamard product of the vector representations of its two end nodes,
Figure RE-GDA0002402836200000098
in order to learn the weights of the edges, it is necessary to minimize the distance between the edges and the weight nodes, and the objective equation is expressed as:
Figure RE-GDA0002402836200000099
to guarantee sparsity of the input-output vectors, KL divergence is used for limiting:
Figure RE-GDA00024028362000000910
wherein d is the dimension represented by the vector, p is the sparsity parameter,
Figure RE-GDA00024028362000000911
is the mean degree of activation of the layer τ neurons,
Figure RE-GDA00024028362000000912
is the degree of activation of the i-dimensional neuron,
Figure RE-GDA00024028362000000913
is the average activation degree of the i-dimension neuron.
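Since the sparsity formula itself is only an image placeholder here, the following sketch uses the standard KL-divergence sparsity penalty of sparse autoencoders, $\sum_i \rho \log(\rho / \hat{\rho}_i) + (1 - \rho) \log((1 - \rho) / (1 - \hat{\rho}_i))$. That this matches the patent's exact formula is an assumption, and the function name `kl_sparsity` is hypothetical.

```python
import math

def kl_sparsity(avg_activations, rho):
    """Standard sparse-autoencoder KL penalty for one layer.

    avg_activations: average activation of each of the d neurons, each in (0, 1).
    rho: target sparsity parameter in (0, 1).
    """
    return sum(
        rho * math.log(rho / a) + (1 - rho) * math.log((1 - rho) / (1 - a))
        for a in avg_activations
    )

matched = kl_sparsity([0.1, 0.1, 0.1], rho=0.1)   # activations already at the target
too_dense = kl_sparsity([0.5, 0.6, 0.4], rho=0.1)  # activations far above the target
```

The penalty is zero when every average activation equals the sparsity parameter and grows as the activations drift away from it, which is what pushes the learned representation toward sparsity.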
To prevent overfitting, weight attenuation is added, and in summary, the final loss function is defined as:
Figure RE-GDA00024028362000000914
wherein
Figure RE-GDA0002402836200000101
represents the weight decay term, $W^{(\tau)}$ is the weight of layer $\tau$, and $F$ denotes the Frobenius norm.
The weighted dynamic network representation learning model construction is completed by steps S11.1 to S11.3.
Step S12: performing abnormal link identification based on the constructed weighted dynamic network representation learning model to obtain an abnormal link set. After the vector representations of the nodes of the t-th time slice network are obtained, abnormal link detection is performed on every link of the (t+1)-th time slice network based on the current node vector representations; this comprises the identification of link anomalies and of weight anomalies.
Step S12.1: the vector representations of the nodes are dynamically updated using the reservoir sampling strategy of the NetWalk model. To account for the decay in importance of historical paths, where paths farther from the current time $t$ have less influence on the current time slice network, a sampling probability $s_i$ is set for the random walk paths of the 1st to t-th time slice networks:
Figure RE-GDA0002402836200000102
Where i is the time value.
Step S12.2: acquiring a random walk path set by integrating a current time slice network and a historical time slice network, sequentially sending the random walk paths into a constructed weighted dynamic network representation learning model, and obtaining low-dimensional vector representation of nodes by minimizing a loss function;
step S12.3: and after the vector representation of each node of the t-th time slice network is obtained, abnormal link detection is carried out on each link of the t + 1-th time slice network based on the vector representation of the current node, and an abnormal link set is obtained.
Further, said step S12.3 comprises:
Step S12.3.1: Link anomaly identification:
The Euclidean distance between nodes v_i and v_j in the d-dimensional space is taken as the distance between the two nodes. The representation of the t-th time slice network is in fact the learned vector representation of all nodes that appear in the 1st to t-th time slice networks; therefore, the average distance between all connected node pairs appearing in the 1st to t-th time slice networks,
Figure RE-GDA0002402836200000104
is taken as a reference, and the closeness between nodes v_i and v_j is defined as:
Figure RE-GDA0002402836200000103
where d_ij is the Euclidean distance between nodes v_i and v_j.
After the closeness of each node pair in the t-th time slice network is obtained, an abnormal link judgment threshold k is set for the links of the (t+1)-th time slice network: when two nodes whose closeness is smaller than k are linked in the (t+1)-th time slice, the node pair is considered to have a link anomaly in that time slice, and the link is considered an abnormal link. Optionally, an abnormal disconnection threshold h is set at the same time: when two nodes whose closeness is greater than h have no link in the (t+1)-th time slice, the node pair is considered to have an abnormal link disconnection in that time slice. In general, abnormal disconnection of links does not need to be considered; it is only applicable to dynamic networks whose nodes and links are highly consistent across time slices, such as routing networks and road traffic networks.
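As a rough illustration of the link anomaly check in step S12.3.1, the following Python sketch flags links of the (t+1)-th time slice whose endpoints were far apart in the embedding learned up to time t. The closeness formula of the patent is available only as a figure, so the ratio of the average connected-pair distance to d_ij used here is an assumed stand-in, and the names `closeness`, `abnormal_links` and the threshold value are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_edge_distance(emb, history_edges):
    """Average distance over all connected node pairs seen in slices 1..t."""
    dists = [euclidean(emb[u], emb[v]) for u, v in history_edges]
    return sum(dists) / len(dists)

def closeness(emb, u, v, mean_dist):
    """ASSUMED closeness: reference distance divided by the pair distance."""
    d = euclidean(emb[u], emb[v])
    return float("inf") if d == 0 else mean_dist / d

def abnormal_links(emb, history_edges, next_edges, k):
    """Links of slice t+1 whose endpoints have closeness below threshold k."""
    ref = mean_edge_distance(emb, history_edges)
    return [(u, v) for u, v in next_edges if closeness(emb, u, v, ref) < k]

# Toy embedding: a, b, c are close together, d is far away.
emb = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0], "d": [10.0, 10.0]}
history = [("a", "b"), ("a", "c")]
flagged = abnormal_links(emb, history, [("a", "b"), ("a", "d")], k=0.5)
print(flagged)
```

In this toy case only the link between `a` and `d` falls below the threshold. Nodes that first appear in the (t+1)-th time slice have no embedding yet and would need separate handling, which the sketch does not attempt.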
Step S12.3.2: weight anomaly identification:
The vector representation of edge e_ij is obtained by applying the Hadamard product to the vector representations of the node pair v_i, v_j, and the weight of the edge in the (t+1)-th time slice is predicted by computing the Euclidean distance between edge e_ij and each weight node in the d-dimensional space. If the predicted weight value does not match the actual weight value, edge e_ij is determined to have a weight anomaly in the (t+1)-th time slice, and the link is an abnormal link.
For example, assume the weights are simply divided into two classes, 0 and 1. After the dynamic network representation has been learned over the time slices [0, t], the edges form clusters around the weight nodes, and edge e_ij actually lies somewhere between the two cluster centers of weight 0 and weight 1. The distances from edge e_ij to the two weight centers are computed to predict the weight of e_ij in the (t+1)-th time slice; if the predicted weight value does not match the actual weight value, edge e_ij is determined to have a weight anomaly in the (t+1)-th time slice. As shown in Fig. 3, each point in Fig. 3 gives the positional relationship between an edge and the weight nodes in the t-th time slice network, and the weight of edge e_ij is determined by the closest weight node.
Step S12.3.3: the set of abnormal links is obtained through steps S12.3.1 and S12.3.2.
Step S13: A fully-connected neural network is constructed based on the abnormal link set, and abnormal groups are detected through the fully-connected neural network.
When an abnormal event occurs in a dynamic network, abnormal behaviors often occur among a series of node sets: the volume of communication between nodes drops suddenly, and abnormal links appear or disappear. Traditional anomaly detection methods usually focus on the anomalous time point and search for anomalous nodes only after that time point has been determined; if the abnormal event lasts a long time, such methods cannot detect the anomaly completely. With a fully-connected neural network, the dynamic network can be converted into a static network that contains both the structure information and the time information of the dynamic network, and anomaly detection on the dynamic network is converted into a search for connected subgraphs on the static graph, where the connected subgraphs carry the structure information and the time information. The method works by maximizing the weights of edges between interconnected nodes whose anomalies are synchronized, which strengthens the connections between nodes with similar activity; low-weight edges are then cut off, and the fully-connected neural network is converted into one or more sets of sub-networks with similar behavior. The sub-networks may be isolated from each other or connected into a whole, and the nodes of the sub-networks retained after detection are output as the final abnormal node set.
However, this method defines an anomaly in a dynamic network as a sudden increase in the communication volume of nodes and does not consider a sudden decrease in the communication volume between nodes. Moreover, because it considers only the communication volume of nodes, it ignores anomalies in the link structure between nodes. As shown in Fig. 4, the communication volume of each node in the graph is 2 at both time T0 and time T1, so nothing changes in terms of communication volume, yet the link structure between the nodes changes greatly.
Building on abnormal link detection, the method of the invention can effectively detect the link structure anomaly shown in Fig. 4: edges v1-v4 and v2-v3 at time T1 in Fig. 4 are regarded as abnormal links, and on the basis of the abnormal links, the node sets with link (structure) anomalies and link weight anomalies, that is, the abnormal link set, can be effectively detected. The invention is the first to fuse abnormal links with the fully-connected neural network (anomaly detection) model; the flow chart of the method is shown in Fig. 5. The abnormal link set of the dynamic network is denoted ο, with ο = {(t_1, v_1, v_2), (t_1, v_1, v_4), ..., (t_n, v_x, v_y)}, where for any (t_i, v_x, v_y) ∈ ο, t_i is the time at which the abnormal link occurs, v_x ∈ V, v_y ∈ V, and V is the set of nodes in the dynamic network. The abnormal links in the dynamic network are regarded as edges between nodes, and the fully-connected neural network is constructed from the abnormal link set. The constructed model has N nodes in total, corresponding to all nodes involved in abnormal links in the dynamic network (nodes in V that do not appear in any abnormal link are discarded); if an abnormal link exists between any two of the N nodes, an edge connects them. After learning by the fully-connected neural network (measuring node similarity, increasing the weights of edges between nodes whose anomalies are synchronized, and pruning low-weight edges), the set of abnormal subgraphs (the abnormal sub-network set) is obtained.
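The graph-side bookkeeping of this step can be sketched in Python under strong simplifying assumptions. The sketch does not implement the fully-connected neural network learning itself: as a labeled stand-in, the weight of an edge is simply the number of abnormal links observed between its two nodes, low-weight edges are pruned with a hypothetical threshold `min_weight`, and the remaining connected subgraphs are extracted by breadth-first search.

```python
from collections import defaultdict, deque

def build_anomaly_graph(abnormal_links):
    """Count abnormal-link occurrences per unordered node pair.

    abnormal_links is a list of (t, v_x, v_y) triples; the count is an
    ASSUMED stand-in for the edge weight learned by the network.
    """
    weights = defaultdict(int)
    for _t, u, v in abnormal_links:
        weights[frozenset((u, v))] += 1
    return weights

def connected_components(weights, min_weight):
    """Prune edges below min_weight, then return the connected subgraphs."""
    adj = defaultdict(set)
    for pair, w in weights.items():
        if w >= min_weight and len(pair) == 2:  # skip self-loops
            u, v = tuple(pair)
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            n = queue.popleft()
            comp.add(n)
            for m in adj[n] - seen:
                seen.add(m)
                queue.append(m)
        comps.append(comp)
    return comps

links = [(1, "v1", "v2"), (1, "v1", "v4"), (2, "v1", "v2"),
         (2, "v1", "v4"), (2, "v5", "v6")]
subgraphs = connected_components(build_anomaly_graph(links), min_weight=2)
largest = max(subgraphs, key=len)  # maximum connected subgraph
print(sorted(largest))
```

In this toy input the pair v5-v6 is abnormal only once and is pruned, leaving a single abnormal subgraph over v1, v2 and v4. Taking the largest component mirrors the final output step of the method, in which the maximum connected subgraph is reported as the abnormal group.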
To verify the effect of the invention, the following experiments were set up:
To verify the effectiveness of the weighted dynamic network representation learning model WeightWalk in weight learning, an abnormal link detection experiment is carried out.
(a) Baseline method:
To verify the validity of the model, five recent baseline methods are used:
DeepWalk: this method generates node sequences through a random walk strategy and then uses the skip-gram model to learn vector representations of the nodes.
node2vec: this method balances depth-first and breadth-first traversal in the random walk, so the network structure can be learned more flexibly.
LINE: this method optimizes the node representations by considering the first-order and second-order similarities of the nodes; the second-order similarity is used for representation learning in the comparison experiments.
SDNE: a deep-learning-based network representation model that uses an autoencoder and locality-preserving constraints to learn node representations.
NetWalk: this method uses random walks and a reservoir sampling algorithm to dynamically update the random walk paths, and is a dynamic network representation learning model based on a deep autoencoder neural network.
Experimental data:
UCI (UC Irvine messages): a network in which the users of an online student community communicate with each other. Nodes represent users and edges represent messages sent.
DNC: the DNC data set is a leaked email network. Nodes correspond to users and edges are emails sent between users.
Subreddit: the data contains the discussions of 25000 Reddit users on different topics. Nodes correspond to Reddit users or topics, and an edge represents one post by a user on a topic.
(b) The experimental steps are as follows:
The WeightWalk model sets the length of a random walk path to 3, the number of paths starting from each node to 20, the number of layers of the autoencoder neural network to 5, and the dimensions of the intermediate-layer vector representations to 100 and 20, respectively. In the experiment, each data set is sliced by day and converted into a weighted dynamic network. On each data set, 10000 edges are randomly selected as positive samples, 5000 link negative sample edges are taken (that is, the two nodes have no edge between them in the data set), and 5000 weight negative sample edges are taken (that is, the two nodes are linked in the data set, but with a different weight). After the vector representation of each node is obtained by each method, the 20000 samples are classified, with a logistic regression model used for training and prediction, which yields the Macro F1-score results listed in Table 1. The F1-score takes both the precision and the recall of the classification model into account and can be regarded as their harmonic mean. It is calculated as follows:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where Precision is the precision and Recall is the recall.
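For reference, the Macro F1-score reported in the tables can be computed as sketched below, using the standard definition F1 = 2 × Precision × Recall / (Precision + Recall) per class and then averaging over classes without weighting. The function names and the toy labels are hypothetical; this is not the evaluation script used in the experiments.

```python
def f1_per_class(y_true, y_pred, label):
    """F1-score of one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, c) for c in labels) / len(labels)

# 1 = abnormal link, 0 = normal link (toy labels).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Because the macro average weights every class equally, it is not dominated by the larger class when the positive and negative samples are imbalanced.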
TABLE 1 Abnormal link detection results (Macro F1-score)
Method       UCI     DNC     Subreddit
LINE         0.581   0.516   0.597
DeepWalk     0.567   0.52    0.495
node2vec     0.57    0.523   0.582
SDNE         0.691   0.776   0.604
NetWalk      0.609   0.665   0.576
WeightWalk   0.776   0.8     0.789
As can be seen from Table 1, the WeightWalk model performs best on all three data sets, and abnormal links can be effectively detected from the node vector representations it learns. By contrast, the other methods cannot effectively detect weight anomalies, which shows that the WeightWalk model is better suited to abnormal link detection on weighted dynamic networks.
To verify the abnormal group detection effect of the invention, the accuracy of anomaly detection is evaluated by injecting anomalies into real data sets, and the method is then applied to an AS-level Internet data set for experimental verification.
The invention is compared with the source method (see Miz V, Ricaud B, Benzi K, et al. Anomaly detection in the dynamics of web and social networks using associative memory [C]// The World Wide Web Conference. ACM, 2019: 1290-).
The experiments use the UCI and DNC data sets, which are sliced by day and converted into weighted dynamic networks. On each data set, one time slice network is drawn at random; 25% of the nodes are selected and their traffic in that time slice network is increased, and 25% of the nodes are then selected and their communication structure is changed without changing the traffic of that time slice network. These nodes are taken as the abnormal node set to be detected. The WeightWalk model sets the length of a random walk path to 3, the number of paths starting from each node to 20, the number of layers of the autoencoder neural network to 5, and the dimensions of the intermediate-layer vector representations to 100 and 20, respectively. After the vector representations of the nodes are obtained, the abnormal node set is detected based on abnormal links, which yields the Macro F1-score results listed in Table 2.
TABLE 2 Comparison of abnormal group detection experiments (Macro F1-score)
Method               DNC     UCI
WeightWalk_Anomaly   0.652   0.550
Source method        0.584   0.316
As can be seen from Table 2, the method of the invention performs better on both data sets and can effectively identify both sudden-increase anomalies in node traffic and communication structure anomalies. At the same time, given the variability and complexity of dynamic networks, the method performs somewhat worse on the UCI data set, which is related to the loose structure and sparse connections of that network.
In order to further verify the detection effect of the abnormal group, experimental comparison is carried out on the real data set.
We use the Enron mail data set and the AS-level Internet dynamic network data set to evaluate the detection effect of the invention. In the AS-level Internet dynamic network data set, the abnormal events (submarine optical cable disconnection and power failure) reduce the traffic of the Internet operators in the affected countries and change the communication structure. The abnormal group detection method based on traffic "sudden increase" in the paper (see Miz V, Ricaud B, Benzi K, et al. Anomaly detection in the dynamics of web and social networks using associative memory [C]// The World Wide Web Conference. ACM, 2019: 1290-) is therefore not well suited to such events, whereas the weighted dynamic network abnormal group detection model based on abnormal links can detect them better. This also shows that our dynamic network abnormal group detection model based on the abnormal link set has better applicability.
(c) Enron mail data set experiment
The Enron mail data set consists of the publicly released incoming and outgoing mail of hundreds of senior managers of a company over several years. The data set contains not only communication between members of the company but also a large amount of communication with people outside the company. Therefore, in the experiment, users who sent fewer than 3 mails in the last 5 years of the Enron mail network are first filtered out, the data is cleaned, and only the communication data among the remaining users is retained. The mailbox addresses of the sender and the receiver and the sending time of each mail record are extracted from the Enron mail data set and used to construct a mail network. Nodes in the network represent communicating members; if member a sends a mail to member b, an edge is added between a and b. The mail communication records from 1999/1/4 to 2001/12/31, three years in total, are divided into 1092 time slices with one day as the unit, and the number of mails sent between a and b in one day is used as the weight of edge ab.
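The construction of the daily weighted mail network described above can be sketched in Python as follows. The sketch assumes the mail records have already been parsed into hypothetical `(sender, receiver, timestamp)` tuples, treats an edge as undirected (an assumption, since the text speaks of mails "between" a and b), and counts sent mails over the whole record list rather than over a separate 5-year window.

```python
from collections import Counter, defaultdict
from datetime import datetime

def build_daily_network(records, min_sent=3):
    """Return {date: Counter({(a, b): mails that day})} for active users.

    Users who sent fewer than min_sent mails overall are filtered out
    together with every record that involves them.
    """
    records = list(records)
    sent = Counter(sender for sender, _receiver, _ts in records)
    active = {user for user, n in sent.items() if n >= min_sent}
    slices = defaultdict(Counter)
    for sender, receiver, ts in records:
        if sender in active and receiver in active:
            edge = tuple(sorted((sender, receiver)))  # undirected edge key
            slices[ts.date()][edge] += 1
    return slices

d1 = datetime(1999, 1, 4, 9, 0)
d2 = datetime(1999, 1, 5, 10, 0)
records = [
    ("a", "b", d1), ("a", "b", d1), ("b", "a", d1),
    ("a", "b", d2), ("b", "a", d2), ("b", "a", d2),
    ("c", "a", d2),  # c sent only one mail and is filtered out
]
network = build_daily_network(records)
for day in sorted(network):
    print(day, dict(network[day]))
```

Each key of the result is one time slice, and the counter value of an edge is its weight in that slice, which is the form of input a weighted dynamic network model would consume.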
In the experiment, anomaly detection is performed on the Enron data set with December 1999 and April, May, and August 2001 as the detection intervals; the resulting maximum connected subgraphs involve 23, 50, 92, and 12 nodes, respectively. These node sets are used as detection groups, and the change in their number of abnormal links is compared over the three years from 1999 to 2001. As shown in Fig. 6, the 4 months under detection are marked with wide black lines, and the number of abnormal links of each node set is normalized to the range 0 to 100. As can be seen from Fig. 6, each node set reaches its maximum in its event month; in April, May, and August 2001 in particular, the number of abnormal links is 50% to 300% higher than in other months, which demonstrates the effectiveness of the method to a certain extent. The anomaly detection for May 2001 involves 92 nodes in total, and the number of abnormal links of this node set in May is increased by 2 to 3 times compared with the remaining months, which shows that the events occurring in May had a large impact on Enron.
(d) AS-level Internet data set experiment
At a specific time t, the AS-level Internet of a country refers to the network snapshot composed of the ASes belonging to the country and all ASes directly connected to them. The dynamic network is denoted as G = {G_1, G_2, …, G_t, G_t+1, …, G_n}, where the t-th time slice network is G_t = (V_t, E_t, W_t), V_t is the set of AS autonomous domains of the country together with the AS autonomous domains of other countries directly connected to them, E_t is the set of edges between AS autonomous domains, and W_t is the set of edge weights. Over a period of time, the dynamic network G formed by the snapshots G_t = (V_t, E_t, W_t) reflects how the connectivity of the country's network evolves. In general, normal changes of G reflect the gradual evolution of the scale and topology of the AS-level Internet, whereas drastic large-scale changes are usually caused by abnormal network events, such as router misconfiguration, physical link failure, and network attacks, which change the topology of the country's AS-level Internet.
In this embodiment, the AS-level Internet of Lebanon and of Venezuela is selected for experimental verification. The AS-level Internet dynamic networks of Lebanon and Venezuela are obtained by analyzing the public routing table data of the Route Views project, with the number of times an AS pair appears in the routing table of the relevant country used as the edge weight. The sampling interval of the Route Views routing tables is 2 h, so the time interval between adjacent network snapshots in the dynamic network is also 2 h, and the time resolution of anomaly detection is likewise 2 h. The AS-level Internet data sets of Lebanon and Venezuela are shown in Table 3.
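The edge weighting of one snapshot can be sketched in Python as follows. The sketch assumes that the AS paths of one routing table dump have already been parsed into lists of AS numbers and that the set of ASes belonging to the country is supplied by the caller; parsing of the Route Views data itself is not shown. Apart from AS42020, which the text names, the AS numbers are placeholders from the private-use range.

```python
from collections import Counter

def snapshot_edges(as_paths, country_ases):
    """Count adjacent AS pairs that touch at least one AS of the country.

    Returns Counter({(as_low, as_high): occurrences}), i.e. the weighted
    edge set of one network snapshot G_t.
    """
    weights = Counter()
    for path in as_paths:
        # Collapse consecutive repeats caused by AS-path prepending.
        hops = [a for i, a in enumerate(path) if i == 0 or a != path[i - 1]]
        for a, b in zip(hops, hops[1:]):
            if a in country_ases or b in country_ases:
                weights[tuple(sorted((a, b)))] += 1
    return weights

country = {42020}
paths = [
    [64512, 42020, 42020, 64513],  # prepended path through AS42020
    [64514, 64512, 42020],
    [64514, 64515],                # does not touch the country
]
edges = snapshot_edges(paths, country)
print(dict(edges))
```

Running the same function on every 2 h dump gives the sequence of weighted snapshots that makes up the dynamic network.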
TABLE 3 Statistics of the AS-level Internet data sets of Lebanon and Venezuela
Country     Start time        End time          Number of snapshots
Lebanon     2012/6/1 00:00    2012/7/31 22:00   727
Venezuela   2019/2/1 00:00    2019/3/31 22:00   706
According to a BGPMon report (see BGPMon [EB/OL]. https:// www.bgpmon.net/internet output-in-lebanon-continuees-for-days /), starting at 16:16 on July 4, 2012, Lebanon's submarine optical fiber cable was cut and Internet service was interrupted for several days; the networks of operators such as Liban Telecom (AS42020), Lebanon's largest Internet operator, were the most seriously affected. Since the routing table sampling interval of the Route Views project is 2 h, the corresponding time point in the routing table data is 18:00 on July 4, 2012.
The detection interval is July 1 to July 9, 2012, and the abnormal subgraph obtained by abnormal group detection is shown as graph G1 in Fig. 7, where the nodes are the detected abnormal node set and the weight of an edge is the anomaly value of that edge; the larger the weight, the more abnormal the edge. Fig. 8 shows the evolution of the abnormal node set over the entire dynamic network. The abscissa of Fig. 8 is the time slice of the dynamic network, 727 time slices in total, and the ordinate is the abnormal node set. The shade of the color indicates the number of abnormal links of a node in a time slice (the darker the color, the larger the number of abnormal links; pure white means no abnormal link), and the time of the abnormal event, 18:00 on July 4, 2012, is marked with a black straight line segment. As can be seen from Fig. 8, after the time marked by the black line segment, the number of abnormal links in the node set increases sharply, and some abnormal links in the node set last until July 30, 2012, which indicates that these ASes did not return to normal until July 30, 2012.
To further understand how the behavior of the abnormal node set changes when the abnormal event occurs, 7 nodes are selected from the node set and their behavior during the event is analyzed. As shown in Fig. 9, part (a) of Fig. 9 gives the total number of abnormal links of the node set between July 1 and July 9, and part (b) of Fig. 9 gives the change in the number of abnormal links of the node set between July 1 and July 9. As can be seen from Fig. 9, the abnormal link counts of the nodes all change significantly at 18:00 on July 4, 2012, and do not ease until after July 7. To a certain extent, this also provides a basis for analyzing the time at which an abnormal event occurs and the impact it causes.
According to a CNN report of March 8 (see CNN [EB/OL]. https:// edition. CNN. com/2019/03/08/americas/venezuelalackout-power-intl/index. htm), most of Venezuela was hit by a power outage crisis on the evening of the 7th, and many places were still in the dark on the morning of the 8th. Although no official figure for the number of cities without power was published, local media counted that 22 of the country's 23 states lost power.
The detection interval is March 3 to March 11, 2019, and the abnormal subgraph obtained by abnormal group detection is shown as graph G2 in Fig. 7. Fig. 10 likewise shows the evolution of this abnormal node set over the entire dynamic network; the black straight line segment marks the time of the abnormal event, 22:00 on March 7, 2019 (UTC). As can be seen from Fig. 10, the number of abnormal links of the node set increases sharply after the black line segment, stays high for several days, and then gradually decreases. To further understand how the behavior of the abnormal node set changes when the abnormal event occurs, 7 nodes are again selected from the node set for analysis. As shown in Fig. 11, part (a) of Fig. 11 gives the total number of abnormal links of the node set between March 3 and March 11, and part (b) of Fig. 11 gives the change in the number of abnormal links of the node set between March 3 and March 11. As can be seen from Fig. 11, the number of abnormal links of the nodes changes significantly at 22:00 on March 7, 2019, and does not ease until March 11. This indicates that the AS-level Internet routing of Venezuela fluctuated significantly and did not fully recover until March 11.
The experimental result proves the effectiveness of the method in detecting the abnormal group, the method can reveal the occurrence time of the abnormal event to a certain extent, and meanwhile, the influence degree of the current event on the individual can be evaluated by analyzing the evolution of the abnormal link number of a single node, so that a certain reference is provided for the influence analysis of the abnormal event.
In the invention, the vector representations of nodes and edges are obtained by learning the structure information of the dynamic network and the weight information of its edges, and the abnormal node set is obtained with the fully-connected neural network model on the basis of abnormal link detection. The invention designs a weighted dynamic network representation learning model that learns the dynamic network structure information more comprehensively: a weight is treated as a special node, the vector representation of an edge is synthesized from the node representations, and the distance between the edge and its "weight node" is minimized, so that the weight information in the network is learned. After the node vector representations are obtained, abnormal link detection is carried out on real dynamic network data sets, and the effectiveness of the method is verified through experiments. The invention combines abnormal links with the fully-connected neural network anomaly detection model, which extends the application range of the invention on the basis of abnormal links, and experimental verification is performed on the Enron mail data set and the AS-level Internet data set.
The above shows only the preferred embodiments of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.

Claims (6)

1. An abnormal group detection method based on weighted dynamic network representation learning is characterized by comprising the following steps:
step 1: constructing a weighted dynamic network representation learning model based on a deep self-coding neural network;
step 2: performing abnormal link identification based on the constructed weighted dynamic network representation learning model to obtain an abnormal link set;
step 3: constructing a fully-connected neural network model based on the abnormal link set, and detecting abnormal groups through the fully-connected neural network model.
2. The abnormal group detection method based on weighted dynamic network representation learning according to claim 1, wherein the step 1 comprises:
step 1.1: for each edge e_i ∈ E in the dynamic network G = {G_1, G_2, …, G_t, G_t+1, …, G_n}, collecting the weight values of the edge in the different time slice networks, denoting the weight value sequence of edge e_i as w_ei = {w_1, w_2, ..., w_m}, and discretizing the sequence w_ei;
step 1.2: in each time slice network, constructing a set of random walk paths starting from each node of that time slice network: given a network G = (V, E, W), for any v_1 ∈ V, constructing the random walk path set Ω_v1 = {(v_1, v_2, ..., v_l, w_12, w_23, ..., w_(l-1)l), ... | (v_i, v_i+1) ∈ E ∩ w_i(i+1) ∈ W}, wherein l is the length of the constructed random walk path and w_i(i+1) is the weight of edge (v_i, v_i+1);
step 1.3: regarding the weight of an edge as a special node, encoding each node in the random walk path by one-hot encoding as the input layer and the output layer of the deep self-coding neural network, learning the network structure and the weight information of the edges in the intermediate layers by minimizing the loss function, and at the same time compressing the dimension of each node vector representation to a preset vector representation dimension d.
3. The abnormal group detection method based on weighted dynamic network representation learning according to claim 2, wherein the step 1.3 comprises:
step 1.3.1: minimizing the difference between the input layer and the output layer by optimizing a first objective equation:
Figure FDA0002284670440000011
wherein, | Ω | is the number of random walk paths, and l is the length of the random walk paths;
Figure FDA0002284670440000012
is the output of the nl-th layer, i.e. the output layer,
Figure FDA0002284670440000021
W^(nl-1) is the weight of the (nl-1)-th layer, and b^(nl-1) is the bias of the (nl-1)-th layer;
Figure FDA0002284670440000022
is, for the i-th random walk path
Figure FDA0002284670440000023
the one-hot encoding of any of its nodes
Figure FDA0002284670440000024
which is the input of layer 0, i.e., the input layer,
Figure FDA0002284670440000025
is the weight of edge (v_l-1, v_l) of the i-th random walk path;
step 1.3.2: in the intermediate layers, for a random walk path (v_1, v_2, ..., v_l, w_12, w_23, ..., w_(l-1)l), minimizing the distance between the nodes of the first half of the path (v_1, v_2, ..., v_l) by optimizing a second objective equation, the second objective equation being:
Figure FDA0002284670440000026
wherein ,
Figure FDA0002284670440000027
and
Figure FDA0002284670440000028
are the one-hot encodings of two adjacent nodes in the random walk path;
step 1.3.3: minimizing the distance between the edge and the weight node by optimizing a third objective equation, the third objective equation being:
Figure FDA0002284670440000029
wherein
Figure FDA00022846704400000210
is the vector representation of any edge e_(j-1)j among the edges (e_12, e_23, ..., e_(r-1)r) between the nodes (v_1, v_2, ..., v_r);
step 1.3.4: sparsity of input-output vectors is limited by KL divergence:
Figure FDA00022846704400000211
wherein d is the dimension represented by the vector, p is the sparsity parameter,
Figure FDA00022846704400000212
is the mean degree of activation of the layer τ neurons,
Figure FDA00022846704400000213
is the degree of activation of the i-dimensional neuron,
Figure FDA00022846704400000214
is the average activation degree of the i-th dimension neuron, and τ ∈ [1, nl];
step 1.3.5: combining formula 1, formula 2, formula 3 and formula 4 to construct the loss function, thereby completing the construction of the weighted dynamic network representation learning model:
Figure FDA00022846704400000215
wherein
Figure FDA00022846704400000216
represents the weight decay term, W^(τ) is the weight of the τ-th layer, and F denotes the Frobenius norm.
4. The abnormal group detection method based on weighted dynamic network representation learning according to claim 3, wherein the step 2 comprises:
step 2.1: dynamically updating the vector representations of the nodes, and setting a sampling probability s_i for the random walk paths of the 1st to t-th time slice networks:
Figure FDA0002284670440000031
Wherein i is a time value;
step 2.2: acquiring a random walk path set by integrating a current time slice network and a historical time slice network, sequentially sending the random walk paths into a constructed weighted dynamic network representation learning model, and obtaining low-dimensional vector representation of nodes by minimizing a loss function;
step 2.3: after the vector representation of each node of the t-th time slice network is obtained, performing abnormal link detection on each link of the (t+1)-th time slice network based on the current node vector representations to obtain the abnormal link set.
5. The abnormal group detection method based on weighted dynamic network representation learning according to claim 1, wherein the step 2.3 comprises:
step 2.3.1: link anomaly identification:
the average distance between all connected node pairs in the 1st to t-th time slice networks,
Figure FDA0002284670440000033
is taken as a reference, and the closeness between nodes v_i and v_j is defined as:
Figure FDA0002284670440000032
wherein d_ij is the Euclidean distance between nodes v_i and v_j;
setting an abnormal link judgment threshold k, and when two nodes in the network whose closeness is smaller than k are linked in the (t+1)-th time slice, determining that the node pair has a link anomaly in the (t+1)-th time slice and that the link is an abnormal link;
step 2.3.2: weight anomaly identification:
obtaining the vector representation of edge e_ij by applying the Hadamard product to the vector representations of the node pair v_i, v_j, and predicting the weight of the edge in the (t+1)-th time slice by computing the Euclidean distance between edge e_ij and each weight node in the d-dimensional space; if the predicted weight value does not match the actual weight value, determining that edge e_ij has a weight anomaly in the (t+1)-th time slice and that the link is an abnormal link;
step 2.3.3: the abnormal link set is obtained through the step 2.3.1 and the step 2.3.2.
6. The abnormal group detection method based on weighted dynamic network representation learning according to claim 1, wherein the step 3 comprises:
step 3.1: regarding all abnormal links in the abnormal link set as edges between nodes, and constructing a fully-connected neural network model based on the abnormal link set, thereby outputting a plurality of abnormal subgraphs and obtaining an abnormal subgraph set;
step 3.2: taking the maximum connected subgraph in the abnormal subgraph set and outputting it as the final abnormal group.
CN201911155412.1A 2019-11-22 2019-11-22 Abnormal group detection method based on weighted dynamic network representation learning Active CN111126437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911155412.1A CN111126437B (en) 2019-11-22 2019-11-22 Abnormal group detection method based on weighted dynamic network representation learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911155412.1A CN111126437B (en) 2019-11-22 2019-11-22 Abnormal group detection method based on weighted dynamic network representation learning

Publications (2)

Publication Number Publication Date
CN111126437A (en) 2020-05-08
CN111126437B (en) 2023-05-02

Family

ID=70496400

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201911155412.1A Active CN111126437B (en) 2019-11-22 2019-11-22 Abnormal group detection method based on weighted dynamic network representation learning

Country Status (1)

Country Link
CN (1) CN111126437B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018126984A2 (en) * 2017-01-06 2018-07-12 江南大学 MEA-BP neural network-based WSN abnormality detection method
CN108540327A (en) * 2018-04-19 2018-09-14 中国人民解放军战略支援部队信息工程大学 Dynamic network abnormal link behavior detection method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
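王文涛; 吴淋涛; 黄烨; 朱容波: "基于密集连接卷积神经网络的链路预测模型" (Link prediction model based on densely connected convolutional neural networks) *
赵文清; 沈哲吉; 李刚: "基于深度学习的用户异常用电模式检测" (Detection of abnormal user electricity consumption patterns based on deep learning) *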

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112445957A (en) * 2020-11-05 2021-03-05 西安电子科技大学 Social network abnormal user detection method, system, medium, equipment and terminal
CN112650968A (en) * 2020-11-18 2021-04-13 天津大学 Abnormal subgraph detection method based on abnormal alignment model for multiple networks
CN114880314A (en) * 2022-05-23 2022-08-09 烟台聚禄信息科技有限公司 Big data cleaning decision-making method applying artificial intelligence strategy and AI processing system
CN115114488A (en) * 2022-07-15 2022-09-27 中国西安卫星测控中心 Dynamic information network abnormal evolution node detection method based on role discovery
CN115114488B (en) * 2022-07-15 2024-03-26 中国西安卫星测控中心 Dynamic information network abnormal evolution node detection method based on role discovery

Also Published As

Publication number Publication date
CN111126437B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN111126437B (en) Abnormal group detection method based on weighted dynamic network representation learning
CN110147911B (en) Social influence prediction model and prediction method based on content perception
CN112217674B (en) Alarm root cause identification method based on causal network mining and graph attention network
CN112580902B (en) Object data processing method and device, computer equipment and storage medium
Ankaiah et al. A novel soft computing hybrid for data imputation
CN117591944B (en) Learning early warning method and system for big data analysis
CN111967011B (en) Interpretable internal threat assessment method
CN114266455A (en) Knowledge graph-based visual enterprise risk assessment method
CN114296975A (en) Distributed system call chain and log fusion anomaly detection method
CN110286668A (en) A kind of rail friendship signal system VIM board faults prediction technique based on big data
Du et al. Detection of key figures in social networks by combining harmonic modularity with community structure-regulated network embedding
Zhang et al. Anomaly detection of periodic multivariate time series under high acquisition frequency scene in IoT
AU2021102006A4 (en) A system and method for identifying online rumors based on propagation influence
Haroon et al. Application of machine learning in forensic science
CN116545679A (en) Industrial situation security basic framework and network attack behavior feature analysis method
Alyousifi et al. New application of fuzzy Markov chain modeling for air pollution index estimation
CN114265954B (en) Graph representation learning method based on position and structure information
Maeno et al. Stable deterministic crystallization for discovering hidden hubs
CN112580992B (en) Illegal fund collecting risk monitoring system for financial-like enterprises
Xu et al. Power distribution systems fault cause identification using logistic regression and artificial neural network
Khomonenko et al. Approach to processing of data from social networks for detecting public opinion on quality of educational services
Costa et al. Vote-and-comment: Modeling the coevolution of user interactions in social voting web sites
Liu et al. Federated Graph Learning with Cross-subgraph Missing Links Recovery
Afkhamiaghda et al. The application of using supervised classification techniques in selecting the most optimized temporary house type in post-disaster situations
CN114553497B (en) Internal threat detection method based on feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant