CN114443909A - Dynamic graph anomaly detection method based on community structure - Google Patents

Publication number
CN114443909A
Authority
CN
China
Prior art keywords
community
block
vertex
embedding
dynamic graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210019006.8A
Other languages
Chinese (zh)
Inventor
方凯彬
李俊杰
包先雨
蔡伊娜
林伟钦
王歆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Shenzhen Academy of Inspection and Quarantine
Shenzhen Customs Information Center
Original Assignee
Shenzhen University
Shenzhen Academy of Inspection and Quarantine
Shenzhen Customs Information Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University, Shenzhen Academy of Inspection and Quarantine, and Shenzhen Customs Information Center
Priority to CN202210019006.8A
Publication of CN114443909A
Priority to PCT/CN2022/112656 (WO2023130728A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Abstract

The invention discloses a dynamic graph anomaly detection method based on community structure, comprising the following steps. S1: formally define abnormal edge detection on a dynamic graph. S2: CmaGraph consists of a C-Block, an M-Block, and an A-Block; the C-Block detects the evolving communities of the dynamic graph, and the M-Block reconstructs the distances between vertices within and across communities, so that vertex embeddings in the same community are close to each other and vertex embeddings in different communities are far apart. S3: finally, the vertex embeddings are input to the A-Block for anomaly detection. The method detects anomalous data more effectively and demonstrates the effectiveness of community structure for anomaly detection. It fills the research gap in using community structure for anomaly detection within graph-embedding-based methods.

Description

Dynamic graph anomaly detection method based on community structure
Technical Field
The invention belongs to the technical field of dynamic graph anomaly detection, and particularly relates to a dynamic graph anomaly detection method based on community structure.
Background
Dynamic graph anomaly detection is an important research direction in the graph field. Anomalies in a dynamic graph include vertex anomalies, edge anomalies, and subgraph anomalies. Many applications of dynamic graphs use edges to represent complex topology and temporal characteristics, so detecting abnormal edges is a key part of dynamic graph anomaly detection. Abnormal edge detection in dynamic graphs has wide application, for example in intrusion detection systems and social network fraud detection. Mining the anomalies in a dynamic graph can help prevent some security incidents and avoid or reduce economic losses.
A graph embedding model maps the vertices, edges, or subgraphs of a graph to a new vector space. In that space, the embeddings can express different attributes depending on the method, and they are learned without manual intervention. On large, complex graphs, graph embedding models perform better than traditional heuristic methods. Because of this, many studies extract graph features with a graph embedding method and use the extracted features for anomaly detection. The present method is likewise based on graph embedding: it uses community structure to extract features of the dynamic graph and performs anomaly detection on those features.
NetWalk is one of the classic and commonly used graph-embedding-based dynamic graph anomaly detection algorithms. NetWalk dynamically updates the network representation as the dynamic graph is updated and uses the updated representation for anomaly detection. It first encodes the vertices of the dynamic graph as vectors with an autoencoder using clique embedding, minimizing the distances between the embeddings of vertices that appear in the same random walk, with the reconstruction error of the autoencoder serving as a global regularization term. After the vertex embeddings are learned, a clustering-based technique detects network anomalies incrementally and dynamically.
NetWalk is an important study in dynamic graph anomaly detection, but it does not address how to use community structure for anomaly detection when the community structure of the dynamic graph is clearly partitioned.
Existing graph-embedding methods do not consider anomaly detection based on community structure. In general, a community is a set of vertices with similar relationships that differ from the other relationships in the network. In real life, many anomalies occur between communities. For example, in a computer network, a terminal often exchanges data with a fixed set of terminals, forming a data-exchange relationship with them. The terminal and the set of terminals it frequently exchanges data with can be regarded as belonging to the same community. When a hacker compromises the terminal and uses it as a springboard, it sends a large amount of abnormal data into the network to attack other terminals. The node suddenly sends large amounts of data to nodes both inside and outside its community, whereas previously it sent data only to nodes inside the community. Such behavior can be regarded as anomalous. We therefore propose a dynamic graph anomaly detection method based on community structure to solve the problems described above.
Disclosure of Invention
The invention aims to provide a dynamic graph anomaly detection method based on community structure, so as to solve the problems raised in the background above.
To achieve this purpose, the invention provides the following technical solution: a dynamic graph anomaly detection method based on community structure, comprising the following steps:
S1: First, formally define abnormal edge detection on a dynamic graph:
A dynamic graph [formula image BDA0003461692850000021] is a sequence of graphs, where G_t = (V_t, E_t) denotes the graph at timestamp t. As the graph is updated, the updated edge set is denoted E_t and the corresponding set of all vertices is denoted V_t; [formula image BDA0003461692850000022], n = |V_t|, and m_t = |E_t|. At timestamp t, A_t denotes the adjacency matrix of G_t. Given G_t, abnormal edge detection on the dynamic graph aims to find the abnormal edges in E_t;
S2: CmaGraph consists of a C-Block, an M-Block, and an A-Block. The C-Block detects the evolving communities of the dynamic graph; the M-Block reconstructs the distances between vertices within and across communities, so that vertex embeddings in the same community are close to each other and vertex embeddings in different communities are far apart;
S3: Finally, the vertex embeddings are input to the A-Block for anomaly detection.
The C-Block aims to detect evolving communities, that is, communities on the dynamic graph that change over time;
the adjacency matrix is used as the input of an autoencoder to obtain initial vertex embeddings, and k-means is applied to the vertex embeddings for community detection;
the autoencoder is a sparse evolutionary autoencoder (SeAutoencoder), which yields stable vertex embeddings so that k-means yields stable community labels;
formally, at timestamp t, the adjacency matrix A_t of G_t is obtained and the number of communities k, a hyperparameter, is set;
the vertex embeddings are constructed with an l_s-layer fully connected network, the SeAutoencoder, whose forward propagation formula is:
[formula image BDA0003461692850000031]
where l = 1, …, l_s - 1, and [formula image BDA0003461692850000032] and [formula image BDA0003461692850000033] are the weight matrix and bias vector of the l-th layer of the SeAutoencoder, respectively, and σ is the sigmoid function;
let [formula image BDA0003461692850000034]; applying k-means to H_t yields the community label vector c_t, which contains the label of each vertex; here H_t ∈ R^{n×d}, d is the dimension of the vertex embeddings, and c_t ∈ R^n.
The reconstruction loss function of the SeAutoencoder is:
[formula image BDA0003461692850000035]
where F denotes the Frobenius norm. A sparsity constraint is introduced, and the sparsity penalty of the SeAutoencoder neurons is defined by the Kullback-Leibler divergence:
[formula image BDA0003461692850000036]
where ρ is the sparsity parameter, [formula image BDA0003461692850000037] is the average activation of the j-th neuron in layer l, with [formula image BDA0003461692850000038].
A temporal loss J_t is introduced between H_t and H_{t-1}:
[formula image BDA0003461692850000041]
When t = 1, J_t = 0;
for the l_s-layer SeAutoencoder, the loss function is:
[formula image BDA0003461692850000042]
where β and λ control the weights of the sparsity constraint and the temporal loss, respectively.
The M-Block aims to reconstruct the distances between vertices within and across communities, so that vertices in the same community are close in Euclidean distance and vertices in different communities are far apart;
the vertex embeddings and the community label vector are the inputs of the M-Block, and its output is the community-metric-enhanced vertex embeddings; the M-Block uses a community metric enhancement network (CenNet), a Siamese network, to enhance the vertex embeddings; this is a deep metric learning method that reconstructs the distances between vertices in the evolving communities.
Formally, at timestamp t, H_t and c_t are obtained from the C-Block; an l_c-layer fully connected network, CenNet, is used, in which each layer has d neurons and the forward propagation formula is:
[formula image BDA0003461692850000043]
where l = 1, …, l_c - 1, and [formula image BDA0003461692850000044] and [formula image BDA0003461692850000045] are the weight matrix and bias vector of the l-th layer of CenNet, respectively; the loss function of CenNet is the contrastive loss:
[formula image BDA0003461692850000046]
where [formula image BDA0003461692850000047] denotes the Euclidean distance between samples i and j, [formula image BDA0003461692850000048] is taken from the matrix O_t, y_ij = 1 if samples i and j are in the same community and y_ij = 0 otherwise, and b is the margin. When the data set is too large, the complexity of computing J_CenNet is high; for a given sample i, the index j is therefore obtained by negative sampling, which reduces the complexity; that is, some random samples j are drawn from the data set to approximate J_CenNet.
The A-Block aims to obtain the anomaly scores of all edges in E_t; in CmaGraph, OC-NN is used as the anomaly detector;
the A-Block applies an edge encoder to O_t to obtain edge embeddings; for vertex u and vertex v, it uses [formula image BDA0003461692850000051], which makes better use of the embedding distance information; the edge embeddings are then input to a One-Class Neural Network (OCNN), which is an anomaly detection model;
formally, at timestamp t, O_t from the M-Block and E_t are obtained; the edge encoder φ converts O_t and E_t into the edge embeddings P_t; an l_a-layer fully connected network, OCNN, is used, in which each hidden layer has d neurons and the output layer has only one neuron, which represents the anomaly score; the forward propagation formula of the neural network is:
[formula image BDA0003461692850000052]
where l = 1, …, l_a - 2; the last layer applies no activation function, i.e.
[formula image BDA0003461692850000053]
[formula image BDA0003461692850000054]
where [formula image BDA0003461692850000055] and [formula image BDA0003461692850000056] are the weight matrix and bias vector of the l-th layer of OCNN, respectively, and the anomaly score vector is [formula image BDA0003461692850000057].
The loss function of OCNN is:
[formula image BDA0003461692850000058]
where r is the bias of the hyperplane and ν controls the number of data points allowed to cross the hyperplane, so that ν is equivalent to the percentage of anomalies; s_t is finally obtained, and abnormal edges can be classified by setting a threshold;
formally, at timestamp t, the updated edge set E_t is obtained; the adjacency matrix A_t is updated according to E_t and used as the input of CmaGraph; the SeAutoencoder, CenNet, and OCNN are then trained using the learning rate α and the previous weights.
Compared with the prior art, the beneficial effects of the invention are as follows: the invention provides a dynamic graph anomaly detection method based on community structure that uses an evolutionary sparse autoencoder and k-means to detect the evolving community structure of a dynamic graph, and uses a Siamese network to reconstruct the distances between vertices within and across communities, so that vertices in the same community are close in Euclidean distance and vertices in different communities are far apart. Vertex embeddings are learned from the community structure and used for anomaly detection. The method detects anomalous data more effectively and demonstrates the effectiveness of community structure for anomaly detection. It fills the research gap in using community structure for anomaly detection within graph-embedding-based methods.
Drawings
FIG. 1 is a schematic flow chart of the dynamic graph anomaly detection method based on community structure according to the present invention;
FIG. 2 is a schematic diagram of (a) the input graph G_{t-1};
FIG. 3 is a schematic diagram of (b) the visualized output of G_{t-1} at the C-Block;
FIG. 4 is a schematic diagram of (c) the input graph G_t;
FIG. 5 is a schematic diagram of (d) the visualized output of G_t at the C-Block;
FIG. 6 is a schematic diagram of (a) the visualized input of G_{t-1} at the M-Block;
FIG. 7 is a schematic diagram of (b) the visualized output of G_{t-1} at the M-Block.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
CmaGraph detects abnormal edges on a dynamic graph based on community structure. It detects the evolving communities of the dynamic graph and reconstructs the distances between vertices within and across communities through a Siamese network, so that vertices in the same community are close in Euclidean distance and vertices in different communities are far apart. The distances between vertices implicitly preserve the intra-community and inter-community distance relationships. These distances are used to encode edges, and the resulting edge embeddings are input to an existing abnormal edge detection algorithm.
Anomaly detection is the identification of events or observations in the data that do not match the expected pattern. A dynamic graph is a sequence of graphs that change over time. Dynamic graph anomaly detection is the identification of anomalous data in a dynamic graph. The invention provides a dynamic graph anomaly detection method based on community structure, and aims to solve the technical problem of how to make full use of the community structure in a dynamic graph to mine anomalous data when that structure is clearly partitioned. The ultimate goal is, given a dynamic graph, to mine the anomalous data in it.
The purpose of the invention is as follows:
To address the research gap in using community structure for anomaly detection within graph-embedding-based methods, the invention provides CmaGraph, a dynamic graph anomaly detection method based on community structure.
As shown in FIGS. 1-7, the invention provides a dynamic graph anomaly detection method based on community structure, which comprises the following steps:
First, abnormal edge detection on a dynamic graph is formally defined:
A dynamic graph [formula image BDA0003461692850000071] is a sequence of graphs. G_t = (V_t, E_t) denotes the graph at timestamp t. As the graph is updated, the updated edge set is denoted E_t and the corresponding set of all vertices is denoted V_t; [formula image BDA0003461692850000072], n = |V_t|, and m_t = |E_t|. At timestamp t, A_t denotes the adjacency matrix of G_t. Given G_t, abnormal edge detection on the dynamic graph aims to find the abnormal edges in E_t.
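The notation above can be made concrete with a small sketch. It assumes an undirected, unweighted graph whose vertices are numbered 0 to n - 1 and whose vertex count is fixed; the patent text does not state these details, and the helper name `update_adjacency` is hypothetical.

```python
import numpy as np

def update_adjacency(adj, edges):
    """Return a copy of adj with each (u, v) in edges added as an undirected edge."""
    out = adj.copy()
    for u, v in edges:
        out[u, v] = 1.0
        out[v, u] = 1.0
    return out

n = 4                                  # n = |V_t|, assumed fixed in this sketch
A_prev = np.zeros((n, n))              # adjacency matrix before the update
E_t = [(0, 1), (1, 2), (2, 3)]         # updated edge set at timestamp t
A_t = update_adjacency(A_prev, E_t)    # adjacency matrix A_t that is fed to CmaGraph
m_t = len(E_t)                         # m_t = |E_t|
```

Each row of A_t then serves as the raw feature vector of one vertex, and the edges in E_t are the candidates to be scored as normal or abnormal.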
CmaGraph consists of three blocks, as shown in FIG. 1: the C-Block, the M-Block, and the A-Block.
FIG. 1 is the CmaGraph flow diagram, showing (a) the dynamic graph, (b) the adjacency matrix, (c) the C-Block, the evolutionary community detection block of the dynamic graph, (d) the M-Block, the embedding distance reconstruction block, and (e) the A-Block, the anomaly detection block.
The C-Block detects the evolving communities of the dynamic graph. The M-Block reconstructs the distances between vertices within and across communities, so that vertex embeddings in the same community are close to each other and vertex embeddings in different communities are far apart. Finally, the vertex embeddings are input to the A-Block for anomaly detection.
(I) The C-Block aims to detect evolving communities, that is, communities on a dynamic graph that change over time. For example, the employees of an enterprise form a community. Over time, people join and leave the enterprise, but the community still exists; only its structure has changed. Thus, in a dynamic graph, both the vertices in a community and the structure of the community change over time. In many real dynamic graphs, the community structure changes over time, but not drastically. It is therefore necessary first to detect the evolving communities on the dynamic graph, under the constraint that these communities do not change too drastically.
The method is as follows: the adjacency matrix is used as the input of an autoencoder to obtain initial vertex embeddings, and k-means is applied to the vertex embeddings for community detection. In particular, the invention uses a sparse evolutionary autoencoder (SeAutoencoder), which yields stable vertex embeddings, so that k-means yields stable community labels. The input and output of the C-Block on a synthetic dynamic graph, with the corresponding visualizations, are shown in FIGS. 2-5, which show that as the graph is updated the embeddings move only a small distance and the communities change little.
FIGS. 2-5 together show the inputs and outputs of the C-Block. FIG. 2 is (a) the input graph G_{t-1}; FIG. 3 is (b) the visualized output of G_{t-1} at the C-Block; FIG. 4 is (c) the input graph G_t; FIG. 5 is (d) the visualized output of G_t at the C-Block. In (b) and (d), c1 and c2 denote different communities, and the arrows in (d) indicate the direction in which the embeddings moved relative to (b).
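The community detection step, k-means applied to the vertex embeddings, can be sketched as follows. This is a minimal illustration of Lloyd's algorithm rather than the patent's implementation; the deterministic farthest-point initialization and the function names are assumptions made so that the example is reproducible.

```python
import numpy as np

def farthest_point_init(H, k):
    """Pick H[0] first, then repeatedly the point farthest from the chosen centers."""
    centers = [H[0]]
    for _ in range(1, k):
        diffs = H[:, None, :] - np.array(centers)[None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        centers.append(H[int(np.argmax(nearest))])
    return np.array(centers, dtype=float)

def kmeans(H, k, n_iter=50):
    """Lloyd's algorithm: returns one community label per row of H, plus the centers."""
    centers = farthest_point_init(H, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(H[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for c in range(k):
            members = H[labels == c]
            if len(members) > 0:           # keep the old center if a cluster is empty
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy embeddings H_t with two well-separated groups of vertices.
H_t = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
c_t, centers = kmeans(H_t, k=2)
```

Because k-means labels depend on the geometry of the embeddings, embeddings that move only slightly between timestamps tend to produce labels that also change little, which is the stability property the text attributes to the SeAutoencoder.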
Formally, at timestamp t, the adjacency matrix A_t of G_t is obtained and the number of communities k, a hyperparameter, is set. The invention constructs the vertex embeddings with an l_s-layer fully connected network, the SeAutoencoder, whose forward propagation formula is:
[formula image BDA0003461692850000081]
where l = 1, …, l_s - 1, and [formula image BDA0003461692850000082] and [formula image BDA0003461692850000083] are the weight matrix and bias vector of the l-th layer of the SeAutoencoder, respectively, and σ is the sigmoid function. Let [formula image BDA0003461692850000084]. Applying k-means to H_t yields the community label vector c_t, which contains the label of each vertex. Here H_t ∈ R^{n×d}, d is the dimension of the vertex embeddings, and c_t ∈ R^n. The reconstruction loss function of the SeAutoencoder is:
[formula image BDA0003461692850000091]
where F denotes the Frobenius norm. A sparsity constraint is introduced, and the sparsity penalty of the SeAutoencoder neurons is defined by the Kullback-Leibler divergence:
[formula image BDA0003461692850000092]
where ρ is the sparsity parameter, [formula image BDA0003461692850000093] is the average activation of the j-th neuron in layer l, with [formula image BDA0003461692850000094].
The vertex embeddings and community labels must not change too drastically when the graph is updated. A temporal loss J_t is therefore introduced between H_t and H_{t-1}:
[formula image BDA0003461692850000095]
When t = 1, J_t = 0. For the l_s-layer SeAutoencoder, the loss function is:
[formula image BDA0003461692850000096]
where β and λ control the weights of the sparsity constraint and the temporal loss, respectively.
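Since the formula images are not reproduced in this text, the following sketch shows one standard way the three loss terms could fit together. It assumes sigmoid layers of the form σ(HW + b), a squared Frobenius reconstruction error, a squared Frobenius temporal term, and a total loss of reconstruction + β · sparsity + λ · temporal; the exact scaling factors, and which layer supplies H_t, are assumptions and may differ from the patent's formulas. All function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(A, weights, biases):
    """Return the activations of every layer, with the input A first."""
    acts = [A]
    for W, b in zip(weights, biases):
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts

def kl_sparsity(hidden_acts, rho):
    """Sum over layers and neurons of KL(rho || average activation of the neuron)."""
    total = 0.0
    for H in hidden_acts:
        rho_hat = np.clip(H.mean(axis=0), 1e-8, 1 - 1e-8)
        total += np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return total

def se_loss(A, acts, H_prev, rho, beta, lam, embed_layer):
    """Reconstruction + beta * sparsity + lam * temporal (assumed combination)."""
    recon = np.sum((acts[-1] - A) ** 2)            # squared Frobenius norm
    sparse = kl_sparsity(acts[1:-1], rho)          # hidden layers only
    H = acts[embed_layer]                          # layer assumed to hold H_t
    temporal = 0.0 if H_prev is None else np.sum((H - H_prev) ** 2)
    return recon + beta * sparse + lam * temporal

rng = np.random.default_rng(0)
A = (rng.random((5, 5)) < 0.4).astype(float)       # toy adjacency matrix, n = 5
weights = [rng.normal(size=(5, 2)), rng.normal(size=(2, 5))]   # d = 2
biases = [np.zeros(2), np.zeros(5)]
acts = forward(A, weights, biases)
# At t = 1 there is no previous embedding, so the temporal term is zero.
loss = se_loss(A, acts, H_prev=None, rho=0.05, beta=0.1, lam=0.1, embed_layer=1)
```

The temporal term is what ties consecutive timestamps together: a larger λ penalizes movement of the embeddings more strongly, at the cost of a looser fit to the current adjacency matrix.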
(II) The M-Block aims to reconstruct the distances between vertices within and across communities, so that vertices in the same community are close in Euclidean distance and vertices in different communities are far apart. The vertex embeddings and the community label vector are the inputs of the M-Block, and its output is the community-metric-enhanced vertex embeddings. The M-Block uses a community metric enhancement network (CenNet), a Siamese network, to enhance the vertex embeddings; this is a deep metric learning method that reconstructs the distances between vertices in the evolving communities. As shown in FIGS. 6-7, the enhanced vertex embeddings are more informative than the original ones, because the Euclidean distances between vertices implicitly preserve the intra-community and inter-community distance information.
FIGS. 6-7 together show the input and output of the M-Block for G_{t-1}. FIG. 6 is (a) the visualized input of G_{t-1} at the M-Block; FIG. 7 is (b) the visualized output of G_{t-1} at the M-Block.
Formally, at timestamp t, H_t and c_t are obtained from the C-Block. The invention uses an l_c-layer fully connected network, CenNet, in which each layer has d neurons and the forward propagation formula is:
[formula image BDA0003461692850000101]
where l = 1, …, l_c - 1, and [formula image BDA0003461692850000102] and [formula image BDA0003461692850000103] are the weight matrix and bias vector of the l-th layer of CenNet, respectively. The loss function of CenNet is the contrastive loss:
[formula image BDA0003461692850000104]
where [formula image BDA0003461692850000105] denotes the Euclidean distance between samples i and j, [formula image BDA0003461692850000106] is taken from the matrix O_t, y_ij = 1 if samples i and j are in the same community and y_ij = 0 otherwise, and b is the margin. When the data set is too large, the complexity of computing J_CenNet is high; for a given sample i, the index j is therefore obtained by negative sampling, which reduces the complexity. That is, some random samples j are drawn from the data set to approximate J_CenNet.
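The contrastive loss and its sampled approximation can be sketched as below. The per-pair form y · d² + (1 - y) · max(0, b - d)² is the commonly used contrastive loss and is an assumption here, as is averaging over pairs; the patent's formula image may use different constants. The function names and the uniform sampling of j are hypothetical.

```python
import numpy as np

def contrastive_loss(O, labels, pairs, margin):
    """Mean contrastive loss over the given (i, j) index pairs."""
    i, j = pairs[:, 0], pairs[:, 1]
    d = np.linalg.norm(O[i] - O[j], axis=1)             # Euclidean distance d_ij
    y = (labels[i] == labels[j]).astype(float)           # 1 if same community
    per_pair = y * d ** 2 + (1 - y) * np.maximum(0.0, margin - d) ** 2
    return per_pair.mean()

def sample_pairs(n, n_samples, rng):
    """For every vertex i, draw n_samples random partners j instead of all n."""
    i = np.repeat(np.arange(n), n_samples)
    j = rng.integers(0, n, size=i.size)                  # j == i may occur; it adds zero loss
    return np.stack([i, j], axis=1)

rng = np.random.default_rng(0)
O_t = rng.normal(size=(6, 2))                            # toy CenNet outputs
c_t = np.array([0, 0, 0, 1, 1, 1])                       # community labels from the C-Block
pairs = sample_pairs(len(O_t), n_samples=3, rng=rng)
loss = contrastive_loss(O_t, c_t, pairs, margin=1.0)
```

Sampling a fixed number of partners per vertex reduces the number of evaluated pairs from n² to n · n_samples, which is the complexity reduction the text describes.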
(III) The A-Block aims to obtain the anomaly scores of all edges in E_t. In CmaGraph, the invention uses OC-NN as the anomaly detector.
As shown in FIGS. 2-5, the A-Block applies an edge encoder to O_t to obtain edge embeddings. For vertex u and vertex v, the invention uses
[formula image BDA0003461692850000107]
which makes better use of the embedding distance information. The edge embeddings are then input to a One-Class Neural Network (OCNN), which is an anomaly detection model.
Formally, at timestamp t, O_t from the M-Block and E_t are obtained. The edge encoder φ converts O_t and E_t into the edge embeddings P_t. The invention uses an l_a-layer fully connected network, OCNN, in which each hidden layer has d neurons and the output layer has only one neuron, which represents the anomaly score. The forward propagation formula of the neural network is:
[formula image BDA0003461692850000111]
where l = 1, …, l_a - 2; the last layer applies no activation function, i.e.
[formula image BDA0003461692850000112]
[formula image BDA0003461692850000113]
where [formula image BDA0003461692850000114] and [formula image BDA0003461692850000115] are the weight matrix and bias vector of the l-th layer of OCNN, respectively, and the anomaly score vector is [formula image BDA0003461692850000116].
The loss function of OCNN is:
[formula image BDA0003461692850000117]
where r is the bias of the hyperplane and ν controls the number of data points allowed to cross the hyperplane, so that ν is equivalent to the percentage of anomalies. Finally s_t is obtained, and abnormal edges can be classified by setting a threshold.
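The A-Block pipeline can be sketched end to end as follows. Two points are assumptions, because the formula images are not available: the edge encoder is taken to be the element-wise squared difference of the two endpoint embeddings (a stand-in that preserves distance information, not necessarily the patent's φ), and the loss follows the commonly published OC-NN objective with a single hidden layer. All function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_edges(O, edges):
    """Assumed edge encoder: element-wise squared difference of endpoint embeddings."""
    u, v = edges[:, 0], edges[:, 1]
    return (O[u] - O[v]) ** 2

def ocnn_scores(P, V, c, w):
    """One sigmoid hidden layer followed by a single linear output neuron."""
    return sigmoid(P @ V + c) @ w

def ocnn_loss(scores, V, w, r, nu):
    """0.5*||w||^2 + 0.5*||V||_F^2 + (1/nu) * mean(max(0, r - score)) - r."""
    hinge = np.maximum(0.0, r - scores).mean()
    return 0.5 * np.sum(w ** 2) + 0.5 * np.sum(V ** 2) + hinge / nu - r

rng = np.random.default_rng(0)
O_t = rng.normal(size=(5, 3))                    # toy M-Block output, d = 3
E_t = np.array([[0, 1], [1, 2], [3, 4]])         # edges to score
P_t = encode_edges(O_t, E_t)                     # edge embeddings P_t
V, c, w = rng.normal(size=(3, 4)), np.zeros(4), rng.normal(size=4)
s_t = ocnn_scores(P_t, V, c, w)                  # one score per edge
r = np.quantile(s_t, 0.1)                        # r set to the nu-quantile of the scores
loss = ocnn_loss(s_t, V, w, r, nu=0.1)
```

In the published OC-NN formulation, points whose output falls below r are treated as anomalous; whether the patent's score s_t uses the same sign convention is not stated in this text, so the thresholding direction should be checked against the original formulas.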
Formally, at timestamp t, the updated edge set E_t is obtained. The adjacency matrix A_t is updated according to E_t and used as the input of CmaGraph. The SeAutoencoder, CenNet, and OCNN are then trained using the learning rate α and the previous weights.
CmaGraph is summarized in Algorithm 1.
[Algorithm 1: images BDA0003461692850000118 and BDA0003461692850000121]
In real life, many anomalies occur between communities, yet existing dynamic graph anomaly detection algorithms based on graph embedding models do not consider the community structure of the dynamic graph. The method uses an evolutionary sparse autoencoder and k-means to detect the evolving community structure of the dynamic graph, and uses a Siamese network to reconstruct the distances between vertices within and across communities, so that vertices in the same community are close in Euclidean distance and vertices in different communities are far apart. Vertex embeddings are learned from the community structure and used for anomaly detection.
The method of the invention was tested repeatedly on three real-world dynamic graph data sets, covering a social network graph, a paper co-authorship graph, and a computer network graph. Measured by AUC, CmaGraph improves on NetWalk by 18% on average across the three data sets, which shows that CmaGraph detects anomalous data effectively and that community structure is useful for anomaly detection.
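For reference, the AUC used in this comparison can be computed directly from per-edge scores and ground-truth labels. The sketch below uses the pairwise-ranking definition and assumes that a higher score means a more anomalous edge; the function name and the toy data are illustrative only and are not results from the patent.

```python
import numpy as np

def auc(scores, is_anomaly):
    """Probability that a random anomalous edge outscores a random normal edge."""
    pos = scores[is_anomaly]
    neg = scores[~is_anomaly]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: both anomalous edges score above both normal edges.
scores = np.array([0.9, 0.8, 0.1, 0.2])
is_anomaly = np.array([True, True, False, False])
value = auc(scores, is_anomaly)
```

Because AUC depends only on the ranking of the scores, it does not require choosing the detection threshold in advance.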
In summary, compared with the prior art, the method uses an evolutionary sparse autoencoder and k-means to detect the evolving community structure of the dynamic graph, and uses a Siamese network to reconstruct the distances between vertices within and across communities, so that vertices in the same community are close in Euclidean distance and vertices in different communities are far apart. Vertex embeddings are learned from the community structure and used for anomaly detection. The method detects anomalous data more effectively and demonstrates the effectiveness of community structure for anomaly detection. It fills the research gap in using community structure for anomaly detection within graph-embedding-based methods.
Finally, it should be noted that although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the embodiments, or parts of them, may be modified without departing from the spirit and scope of the invention.

Claims (4)

1. A dynamic graph anomaly detection method based on community structure, characterized by comprising the following steps:
S1: first, define abnormal edge detection on a dynamic graph:
a dynamic graph [formula image FDA0003461692840000011] is a sequence of graphs, where $G_t = (V_t, E_t)$ denotes the graph at timestamp t; as the graph is updated, the updated edge set is denoted $E_t$ and the set of all vertices in $E_t$ is denoted $V_t$; [formula image FDA0003461692840000012], $n = |V_t|$, $m_t = |E_t|$; at timestamp t, $A_t$ denotes the adjacency matrix of $G_t$; given $G_t$, abnormal edge detection on the dynamic graph aims to find the abnormal edges in $E_t$;
S2: CmaGraph consists of a C-Block, an M-Block, and an A-Block; the C-Block detects the evolving communities of the dynamic graph, and the M-Block reconstructs the distances between vertices within and across communities, so that vertex embeddings in the same community are close to each other and vertex embeddings in different communities are far apart;
S3: the vertex embeddings are finally input to the A-Block for anomaly detection.
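As a non-limiting illustration of the definitions in S1, the sketch below builds one adjacency matrix per timestamp from a sequence of edge sets. It assumes an undirected, unweighted graph over a fixed set of n vertices indexed from 0; the function name `adjacency_matrix` and the toy snapshots are hypothetical and serve only this example.

```python
def adjacency_matrix(n, edges):
    """Build the symmetric 0/1 adjacency matrix of an undirected graph."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1  # assumption: edges are undirected
    return a

# Hypothetical dynamic graph: one edge set per timestamp.
snapshots = [
    [(0, 1), (1, 2)],          # edge set at t = 1
    [(0, 1), (1, 2), (2, 3)],  # edge set at t = 2: edge (2, 3) arrives
]
n = 4
A = [adjacency_matrix(n, e_t) for e_t in snapshots]
```

Here `A[0]` and `A[1]` play the role of the adjacency matrices at the first two timestamps; the task is then to decide which of the edges present at each timestamp are abnormal.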
2. The dynamic graph anomaly detection method based on community structure according to claim 1, characterized in that: the C-Block aims to detect evolving communities, an evolving community being a community on the dynamic graph that changes over time;
an autoencoder takes the adjacency matrix as input to obtain initial vertex embeddings, and k-means is applied to the vertex embeddings for community detection;
the autoencoder is a sparse evolving autoencoder, SeAutoencoder, which yields stable vertex embeddings so that k-means yields stable community labels;
formally, at timestamp t, the adjacency matrix $A_t$ of $G_t$ is obtained and the number of communities k, which is a hyperparameter, is set;
vertex embeddings are constructed by SeAutoencoder, a fully connected network with $l_s$ layers, whose forward propagation formula is:
[formula image FDA0003461692840000013]
where $l = 1, \dots, l_s - 1$, [formula image FDA0003461692840000014], and [formula image FDA0003461692840000015] and [formula image FDA0003461692840000016] are respectively the weight matrix and the bias vector of the l-th layer of SeAutoencoder, and $\sigma$ is the sigmoid function;
let [formula image FDA0003461692840000021]; applying k-means to $H_t$ yields a vector $c_t$ containing the community label of each vertex, where $H_t \in \mathbb{R}^{n \times d}$, d is the dimension of the vertex embedding, and $c_t \in \mathbb{R}^{n}$; the reconstruction loss function of SeAutoencoder is:
[formula image FDA0003461692840000022]
where F denotes the Frobenius norm; a sparsity constraint is introduced, and the sparsity penalty of a SeAutoencoder neuron is defined by the Kullback-Leibler divergence:
[formula image FDA0003461692840000023]
where $\rho$ is the sparsity parameter and [formula image FDA0003461692840000024] is the average activation of the j-th neuron in layer l, [formula image FDA0003461692840000025];
a temporal loss $J_t$ is introduced between $H_t$ and $H_{t-1}$:
[formula image FDA0003461692840000026]
when $t = 1$, $J_t = 0$;
for a SeAutoencoder with $l_s$ layers, the loss function is:
[formula image FDA0003461692840000027]
where $\beta$ and $\lambda$ control the weights of the sparsity constraint and the temporal loss, respectively.
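As a non-limiting illustration, the sketch below combines the three terms named in this claim: a squared Frobenius reconstruction error, a Kullback-Leibler sparsity penalty on the mean activation of each hidden neuron, and a temporal term between consecutive embeddings. The exact formulas appear only as images in this text, so the forms used here (squared norms, the textbook sparse-autoencoder KL penalty, a plain weighted sum) are assumptions, as are all function and parameter names.

```python
import math

def frobenius_sq(x, y):
    """Squared Frobenius norm of the difference of two equally shaped matrices."""
    return sum((a - b) ** 2 for rx, ry in zip(x, y) for a, b in zip(rx, ry))

def kl_sparsity(rho, activations):
    """Sum over neurons j of KL(rho || rho_hat_j), where rho_hat_j is the mean
    activation of neuron j over all rows (sigmoid outputs, strictly in (0, 1))."""
    n = len(activations)
    total = 0.0
    for j in range(len(activations[0])):
        rho_hat = sum(row[j] for row in activations) / n
        total += rho * math.log(rho / rho_hat)
        total += (1 - rho) * math.log((1 - rho) / (1 - rho_hat))
    return total

def se_autoencoder_loss(a_t, a_rec, hidden_layers, h_t, h_prev, rho, beta, lam):
    """Assumed total loss: reconstruction + beta * sparsity + lam * temporal."""
    rec = frobenius_sq(a_t, a_rec)
    sparse = sum(kl_sparsity(rho, h) for h in hidden_layers)
    # The temporal term is zero at the first timestamp (no previous embedding).
    temporal = 0.0 if h_prev is None else frobenius_sq(h_t, h_prev)
    return rec + beta * sparse + lam * temporal

# Toy usage with a 2-vertex graph and a single hidden layer of two neurons.
a = [[0, 1], [1, 0]]
hidden = [[[0.2, 0.2], [0.2, 0.2]]]
loss_first = se_autoencoder_loss(a, a, hidden, [[1.0, 0.0]], None, 0.2, 1.0, 0.5)
```

Passing `None` for the previous embedding corresponds to the case $t = 1$ in the claim, where the temporal loss is zero.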
3. The dynamic graph anomaly detection method based on community structure according to claim 1, characterized in that: the M-Block aims to reconstruct the distances between vertices within and across communities, so that vertices in the same community are close in Euclidean distance and vertices in different communities are far apart;
the vertex embeddings and the community label vector are the input of the M-Block, and the output of the M-Block is the community-metric-enhanced vertex embeddings; the M-Block strengthens the vertex embeddings with a community metric enhancement network, CenNet; CenNet is a Siamese network, which is a deep metric learning method, and it reconstructs the distances between vertices in the evolving communities;
formally, at timestamp t, $H_t$ and $c_t$ are obtained from the C-Block; CenNet is a fully connected network with $l_c$ layers, each layer having d neurons, whose forward propagation formula is:
[formula image FDA0003461692840000028]
where $l = 1, \dots, l_c - 1$, [formula image FDA0003461692840000031], and [formula image FDA0003461692840000032] and [formula image FDA0003461692840000033] are respectively the weight matrix and the bias vector of the l-th layer of CenNet; the loss function of CenNet is the contrastive loss:
[formula image FDA0003461692840000034]
where [formula image FDA0003461692840000035] denotes the Euclidean distance between samples i and j, [formula image FDA0003461692840000036] is the matrix $O_t$, $y_{ij} = 1$ if samples i and j are in the same community and $y_{ij} = 0$ otherwise, and b is the margin; when the data set is too large, the computational complexity of $J_{CenNet}$ is high; negative sampling reduces this complexity: for a given sample i, the index j is obtained by negative sampling, that is, some random samples j are drawn from the data set to approximate $J_{CenNet}$.
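As a non-limiting illustration of a contrastive loss with sampled pairs, the sketch below pulls same-community pairs together and pushes different-community pairs at least a margin apart. The loss formula in this claim appears only as an image, so the standard contrastive form used here, the uniform sampling of partner indices, and the names `contrastive_loss`, `margin`, and `num_samples` are assumptions.

```python
import math
import random

def contrastive_loss(embeddings, labels, margin, num_samples, seed=0):
    """Approximate the contrastive loss by drawing a few random partners j
    for every sample i instead of enumerating all pairs."""
    rng = random.Random(seed)
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in rng.sample(others, min(num_samples, len(others))):
            d = math.dist(embeddings[i], embeddings[j])  # Euclidean distance
            if labels[i] == labels[j]:
                total += d ** 2                       # same community: pull together
            else:
                total += max(margin - d, 0.0) ** 2    # different: push beyond margin
            count += 1
    return total / (2 * count) if count else 0.0

# Toy usage: two vertices in community 0 and one in community 1.
emb = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
lab = [0, 0, 1]
loss_small_margin = contrastive_loss(emb, lab, margin=1.0, num_samples=2)
```

With `num_samples` much smaller than the number of vertices, the cost per pass grows linearly rather than quadratically with the size of the data set, which is the purpose of the sampling step described in the claim.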
4. The dynamic graph anomaly detection method based on community structure according to claim 1, characterized in that: the goal of the A-Block is to obtain anomaly scores for all edges in $E_t$; CmaGraph uses OC-NN as the anomaly detector;
the A-Block applies an edge encoder to $O_t$ to obtain edge embeddings; for vertex u and vertex v, the encoder
[formula image FDA0003461692840000037]
embeds the edge using the distance information of the vertex embeddings; the edge embeddings are fed to a One-Class neural network (OCNN), which is the anomaly detection model;
formally, at timestamp t, $O_t$ from the M-Block and $E_t$ are obtained, and the encoder $\phi$ converts $O_t$ and $E_t$ into the edge embeddings $P_t$; an OCNN with $l_a$ layers is used, in which each hidden layer has d neurons and the output layer has only 1 neuron, representing the anomaly score; the forward propagation formula of the neural network is:
[formula image FDA0003461692840000038]
where $l = 1, \dots, l_a - 2$; the last layer applies no activation function, i.e.
[formula image FDA0003461692840000039]
[formula image FDA00034616928400000310]
where [formula image FDA00034616928400000311] and [formula image FDA00034616928400000312] are respectively the weight matrix and the bias vector of the l-th layer of the OCNN, and the anomaly score vector is
[formula image FDA00034616928400000313]
the loss function of the OCNN is:
[formula image FDA0003461692840000041]
where r is the bias of the hyperplane, $\nu$ controls the number of data points allowed to cross the hyperplane and is the percentage of anomalies, and $s_t$ is the final result, from which abnormal edges can be classified by setting a threshold;
formally, at timestamp t, the updated edge set $E_t$ is obtained, the adjacency matrix $A_t$ is updated according to $E_t$, and $A_t$ is used as the input of CmaGraph; the autoencoder, CenNet, and the OCNN are then trained with learning rate $\alpha$ and the previous weights.
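As a non-limiting illustration of the scoring and thresholding step, the sketch below evaluates the hinge part of a one-class objective and flags edges whose score falls below the hyperplane bias r. The loss in this claim appears only as an image, so the form used here (the hinge term commonly given for OC-NN, with weight regularisation omitted), the choice of r as a low quantile of the scores, the convention that low scores are abnormal, and all function names are assumptions.

```python
def ocnn_hinge_term(scores, r, nu):
    """Hinge part of a one-class objective: (1 / (nu * N)) * sum(max(0, r - s)) - r.
    Weight regularisation terms are deliberately left out of this sketch."""
    n = len(scores)
    return sum(max(0.0, r - s) for s in scores) / (nu * n) - r

def nu_quantile(scores, nu):
    """Pick r so that roughly a fraction nu of the scores lie below it."""
    ordered = sorted(scores)
    idx = min(len(ordered) - 1, int(nu * len(ordered)))
    return ordered[idx]

def flag_edges(scores, r):
    """Assumption: an edge is abnormal when its score is below the bias r."""
    return [s < r for s in scores]

# Toy usage: five edge scores, with about 20% assumed to be anomalous.
scores = [0.9, 0.8, 0.1, 0.7, 0.85]
nu = 0.2
r = nu_quantile(scores, nu)
flags = flag_edges(scores, r)
hinge = ocnn_hinge_term(scores, r, nu)
```

In a full training loop r would typically be re-estimated from the current scores after each update of the network weights, and the flagged edges would be compared against ground truth, for example with AUC.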
CN202210019006.8A 2022-01-10 2022-01-10 Dynamic graph anomaly detection method based on community structure Pending CN114443909A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210019006.8A CN114443909A (en) 2022-01-10 2022-01-10 Dynamic graph anomaly detection method based on community structure
PCT/CN2022/112656 WO2023130728A1 (en) 2022-01-10 2022-10-11 Dynamic graph anomaly detection method based on block structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210019006.8A CN114443909A (en) 2022-01-10 2022-01-10 Dynamic graph anomaly detection method based on community structure

Publications (1)

Publication Number Publication Date
CN114443909A true CN114443909A (en) 2022-05-06

Family

ID=81366799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210019006.8A Pending CN114443909A (en) 2022-01-10 2022-01-10 Dynamic graph anomaly detection method based on community structure

Country Status (2)

Country Link
CN (1) CN114443909A (en)
WO (1) WO2023130728A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023130728A1 (en) * 2022-01-10 2023-07-13 深圳市检验检疫科学研究院 Dynamic graph anomaly detection method based on block structure

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8090665B2 (en) * 2008-09-24 2012-01-03 Nec Laboratories America, Inc. Finding communities and their evolutions in dynamic social network
US9202052B1 (en) * 2013-06-21 2015-12-01 Emc Corporation Dynamic graph anomaly detection framework and scalable system architecture
CN106909948A (en) * 2017-03-10 2017-06-30 深圳大学 A kind of increment documents structured Cluster method and system on Dynamic Graph
CN111768618B (en) * 2020-06-04 2021-07-20 北京航空航天大学 Traffic jam state propagation prediction and early warning system and method based on city portrait
CN111709518A (en) * 2020-06-16 2020-09-25 重庆大学 Method for enhancing network representation learning based on community perception and relationship attention
CN114443909A (en) * 2022-01-10 2022-05-06 深圳市检验检疫科学研究院 Dynamic graph anomaly detection method based on community structure

Also Published As

Publication number Publication date
WO2023130728A1 (en) 2023-07-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination