CN114443909A - Dynamic graph anomaly detection method based on community structure - Google Patents
Dynamic graph anomaly detection method based on community structure
- Publication number
- CN114443909A (application CN202210019006.8A)
- Authority
- CN
- China
- Prior art keywords
- community
- block
- vertex
- embedding
- dynamic graph
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Abstract
The invention discloses a dynamic graph anomaly detection method based on community structure, comprising the following steps. S1: define the problem of anomalous edge detection on a dynamic graph. S2: CmaGraph consists of a C-Block, an M-Block and an A-Block; the C-Block detects the evolving communities of the dynamic graph, and the M-Block reconstructs the distances between vertices within and between communities, so that the embeddings of vertices in the same community are close to each other and the embeddings of vertices in different communities are far apart. S3: the vertex embeddings are finally input to the A-Block for anomaly detection. The method detects anomalous data more effectively and demonstrates the usefulness of community structure for anomaly detection. It addresses the lack of research on using community structure for anomaly detection within graph embedding methods.
Description
Technical Field
The invention belongs to the technical field of dynamic graph anomaly detection, and in particular relates to a dynamic graph anomaly detection method based on community structure.
Background
Dynamic graph anomaly detection is an important research direction in the graph field. Anomalies in a dynamic graph include vertex anomalies, edge anomalies, and subgraph anomalies. Many applications of dynamic graphs use edges to represent complex topologies and temporal characteristics, so the detection of anomalous edges is a key part of dynamic graph anomaly detection. Anomalous edge detection in dynamic graphs is widely applied, for example in intrusion detection systems and social network fraud detection. Mining the anomalies in a dynamic graph can help prevent security incidents and avoid or reduce economic losses.
A graph embedding model maps the vertices, edges, or subgraphs of a graph to a new vector space. In the new vector space, the embeddings can express different attributes depending on the method, and they are learned without manual intervention. On large, complex graphs, graph embedding models perform better than traditional heuristic methods. Because of this, many studies extract features of a graph with a graph embedding method and use the extracted features for anomaly detection. The present method is likewise based on graph embedding: it uses the community structure to extract features of the dynamic graph and performs anomaly detection on the extracted features.
NetWalk is one of the classic and commonly used graph-embedding-based dynamic graph anomaly detection algorithms. NetWalk dynamically updates the network representation as the dynamic graph is updated and uses the updated representation for anomaly detection. NetWalk first encodes the vertices of the dynamic graph as vectors with an autoencoder using clique embedding, which minimizes the distances between the embeddings of vertices in each random walk and uses the autoencoder's reconstruction error as a global regularization term. After the vertex embeddings are learned, a clustering-based technique is used to detect network anomalies incrementally and dynamically.
NetWalk is an important study in dynamic graph anomaly detection, but it does not address anomaly detection using the community structure when the dynamic graph is clearly divided into communities.
Existing graph embedding methods do not consider anomaly detection based on community structure. In general, a community is defined as a set of vertices with similar relationships, and these relationships differ from the other relationships in the network. In real life, many anomalies occur between communities. For example, in a computer network, a terminal often exchanges data with a fixed set of terminals, forming a data-exchange relationship with them. The terminal and the set of terminals with which it frequently exchanges data can be considered to belong to the same community. However, when a hacker compromises the terminal and uses it as a springboard, a large amount of abnormal data is sent into the network to attack other terminals. This appears as the node suddenly sending a large amount of data to nodes both inside and outside the community, whereas previously it would normally send data only to nodes inside the community. Such behavior may be considered anomalous. Therefore, a dynamic graph anomaly detection method based on community structure is proposed to solve the problems mentioned above.
Disclosure of Invention
The invention aims to provide a dynamic graph anomaly detection method based on community structure, so as to solve the problems described in the background.
To achieve this purpose, the invention provides the following technical solution: a dynamic graph anomaly detection method based on community structure, comprising the following steps:
S1: first, define the problem of anomalous edge detection on a dynamic graph:
a dynamic graph is a sequence of graphs; $G_t$ denotes the graph at timestamp $t$, $G_t = (V_t, E_t)$; as the graph is updated, the updated edge set is denoted $E_t$, and the set of all vertices in $E_t$ is denoted $V_t$; $n = |V_t|$, $m_t = |E_t|$; at timestamp $t$, $A_t$ denotes the adjacency matrix of $G_t$; given $G_t$, dynamic graph anomalous edge detection aims to find the anomalous edges in $E_t$;
S2: CmaGraph consists of a C-Block, an M-Block and an A-Block; the C-Block detects the evolving communities of the dynamic graph, and the M-Block reconstructs the distances between vertices within and between communities, so that the embeddings of vertices in the same community are close to each other and the embeddings of vertices in different communities are far apart;
S3: the vertex embeddings are finally input to the A-Block for anomaly detection.
The goal of the C-Block is to detect evolving communities, that is, communities on the dynamic graph that change over time;
the adjacency matrix is used as the input of an autoencoder to obtain initial vertex embeddings, and k-means is applied to the vertex embeddings for community detection;
the autoencoder is a sparse evolution autoencoder (SeAutoencoder), which yields stable vertex embeddings, so that k-means yields stable community labels;
formally, at timestamp $t$, the adjacency matrix $A_t$ of $G_t$ is obtained and the number of communities $k$, a hyperparameter, is set;
an $l_s$-layer fully connected network, the SeAutoencoder, is used to construct the vertex embeddings, and the forward propagation formula of the SeAutoencoder is as follows:
where $l = 1, \dots, l_s - 1$, the weight matrix and bias vector are those of the $l$-th layer of the SeAutoencoder, and $\sigma$ is the sigmoid function;
let $H_t$ denote the resulting vertex embeddings; applying k-means to $H_t$ yields a community label vector $c_t$ that contains the label of each vertex; here $H_t \in \mathbb{R}^{n \times d}$, where $d$ is the dimension of the vertex embeddings, and $c_t \in \mathbb{R}^{n}$;
The reconstruction loss function of the SeAutoencoder is:
where $F$ denotes the Frobenius norm; a sparsity constraint is introduced, and the sparsity penalty term of the SeAutoencoder neurons is defined by the Kullback-Leibler divergence:
where $\rho$ is the sparsity parameter and the average activation is that of the $j$-th neuron in layer $l$; a temporal loss $J_T$ is introduced between $H_t$ and $H_{t-1}$,
Using an $l_s$-layer SeAutoencoder, the loss function is:
The goal of the M-Block is to reconstruct the distances between vertices within and between communities, so that vertices in the same community are close to each other in Euclidean distance and vertices in different communities are far apart;
the vertex embeddings and the community label vector are the inputs of the M-Block, and its output is the community-metric-enhanced vertex embeddings; the M-Block uses the community metric enhancement network (CenNet), a Siamese network, to enhance the vertex embeddings; this is a deep metric learning method that reconstructs the distances between vertices in the evolving communities.
Formally, at timestamp $t$, $H_t$ and $c_t$ are obtained from the C-Block; an $l_c$-layer fully connected network, CenNet, is used, each layer having $d$ neurons, and its forward propagation formula is:
where $l = 1, \dots, l_c - 1$, and the weight matrix and bias vector are those of the $l$-th layer of CenNet; the loss function of CenNet is the contrastive loss function:
where the distance term is the Euclidean distance between samples $i$ and $j$, each sample being a row of the matrix $O_t$; if samples $i$ and $j$ are in the same community, then $y_{ij} = 1$, otherwise $y_{ij} = 0$; $b$ is the margin. When the data set is too large, the complexity of computing $J_{CenNet}$ is high; for a given sample $i$, the index $j$ is obtained by negative sampling, which reduces the complexity, that is, a few random samples $j$ are drawn from the data set to approximate $J_{CenNet}$.
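As a minimal sketch of the sampled contrastive objective described above, assume the common form in which a same-community pair contributes its squared distance and a different-community pair contributes the squared shortfall below the margin $b$. The function name, the `num_samples` parameter, uniform sampling of partners and the absence of any scaling constant are illustrative assumptions, not details given in the text.

```python
import numpy as np

def contrastive_loss(o, labels, margin=1.0, num_samples=2, seed=0):
    """Approximate a contrastive loss by sampling a few partners j per sample i.

    o       : (n, d) array of vertex embeddings (rows of O_t)
    labels  : length-n community labels (c_t)
    margin  : the margin b (different-community pairs closer than b are penalised)
    """
    rng = np.random.default_rng(seed)
    n = len(o)
    total = 0.0
    for i in range(n):
        # Draw random partners j != i instead of enumerating all n - 1 of them.
        candidates = np.delete(np.arange(n), i)
        size = min(num_samples, n - 1)
        for j in rng.choice(candidates, size=size, replace=False):
            d = np.linalg.norm(o[i] - o[j])
            if labels[i] == labels[j]:
                total += d ** 2                      # pull same-community pairs together
            else:
                total += max(margin - d, 0.0) ** 2   # push other pairs beyond the margin
    return total

# Toy embeddings (made-up data): two tight, well-separated communities.
good = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0]])
community = np.array([0, 0, 1, 1])
loss_good = contrastive_loss(good, community, margin=1.0, num_samples=3)
```

With `num_samples` equal to $n - 1$ the sum covers every ordered pair, so the sampled value coincides with the full sum under these assumptions; smaller values trade accuracy for speed.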
The goal of the A-Block is to obtain the anomaly scores of all edges in $E_t$; in CmaGraph, OC-NN is used as the anomaly detector;
the A-Block applies an edge encoder to $O_t$ to obtain edge embeddings; for a vertex $u$ and a vertex $v$, an edge encoding is used that makes better use of the embedded distance information; the edge embeddings are then input into a One-Class neural network (OCNN), which is an anomaly detection model;
formally, at timestamp $t$, $O_t$ is obtained from the M-Block together with $E_t$; the edge encoder $\phi$ converts $O_t$ and $E_t$ into the edge embeddings $P_t$; an $l_a$-layer fully connected network, the OCNN, is used, in which each hidden layer has $d$ neurons and the output layer has only 1 neuron, representing the anomaly score; the forward propagation formula of the neural network is as follows:
where $l = 1, \dots, l_a - 2$; the last layer applies no activation function; the weight matrix and bias vector are those of the $l$-th layer of the OCNN, and the output is the anomaly score vector.
The loss function of the OCNN is:
where $r$ is the bias of the hyperplane and $\nu$ controls the number of data points allowed to cross the hyperplane, $\nu$ being equivalent to the percentage of anomalies; finally $s_t$ is obtained, and anomalous edges can be classified by setting a threshold.
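The role of $r$ and $\nu$ can be illustrated with a small sketch. It assumes the convention of the original OC-NN formulation, in which $r$ is set to the $\nu$-quantile of the network outputs and points scoring below $r$ fall on the anomalous side of the hyperplane; whether the patent's score uses the same sign convention is not stated, and the function names and toy scores are hypothetical.

```python
import numpy as np

def ocnn_data_term(scores, r, nu):
    """Data-dependent part of an OC-NN style objective: (1/nu) * mean(max(0, r - s)) - r."""
    return np.mean(np.maximum(0.0, r - scores)) / nu - r

def flag_edges(scores, nu):
    """Set r to the nu-quantile of the scores and flag the edges scoring below it."""
    r = np.quantile(scores, nu)
    return r, scores < r

# Toy edge scores (made-up data): one edge scores far lower than the rest.
s_t = np.array([0.9, 0.8, 0.85, 0.1, 0.95])
r, is_anomalous = flag_edges(s_t, nu=0.2)
```

Choosing $r$ as a quantile makes $\nu$ behave like an upper bound on the fraction of flagged edges, which matches the description of $\nu$ as the percentage of anomalies.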
Formally, at timestamp $t$, an updated edge set $E_t$ is obtained; the adjacency matrix $A_t$ is updated according to $E_t$ and used as the input of CmaGraph; the SeAutoencoder, CenNet, and OCNN are then trained with the learning rate $\alpha$, starting from the previous weights.
Compared with the prior art, the invention has the following beneficial effects: it provides a dynamic graph anomaly detection method based on community structure, which uses an evolutionary sparse autoencoder and k-means to detect the evolving community structure on a dynamic graph, and uses a Siamese network to reconstruct the distances between vertices within and between communities, so that vertices in the same community are close to each other in Euclidean distance and vertices in different communities are far apart. Vertex embeddings are learned based on the community structure and used for anomaly detection. The method detects anomalous data more effectively and demonstrates the usefulness of community structure for anomaly detection. It addresses the lack of research on using community structure for anomaly detection within graph embedding methods.
Drawings
FIG. 1 is a schematic flow chart of a dynamic graph anomaly detection method based on a community structure according to the present invention;
FIG. 2 is a schematic diagram of (a) the input graph $G_{t-1}$ of the present invention;
FIG. 3 is a schematic diagram of (b) the visualized output of $G_{t-1}$ at the C-Block;
FIG. 4 is a schematic diagram of (c) the input graph $G_t$ of the present invention;
FIG. 5 is a schematic diagram of (d) the visualized output of $G_t$ at the C-Block;
FIG. 6 is a schematic diagram of (a) the visualized input of $G_{t-1}$ at the M-Block;
FIG. 7 is a schematic diagram of (b) the visualized output of $G_{t-1}$ at the M-Block.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
CmaGraph detects anomalous edges on a dynamic graph based on community structure. It detects the evolving communities of the dynamic graph and uses a Siamese network to reconstruct the distances between vertices within and between communities, so that vertices in the same community are close to each other in Euclidean distance and vertices in different communities are far apart. The distances between vertices thus implicitly preserve the intra-community and inter-community distance relationships. These distances are used to encode the edges, and the resulting edge embeddings are input into an existing anomalous edge detection algorithm.
Anomaly detection is the identification of events or observations in data that do not match the expected pattern. A dynamic graph is a sequence of graphs that change over time. Dynamic graph anomaly detection is the identification of anomalous data in a dynamic graph. The invention provides a dynamic graph anomaly detection method based on community structure; the technical problem it solves is how to make full use of the community structure to mine anomalous data when the dynamic graph is clearly divided into communities. The ultimate goal of the invention is, given a dynamic graph, to mine the anomalous data in it.
The purpose of the invention is as follows:
to address the lack of research on using community structure for anomaly detection within graph embedding methods, the invention provides CmaGraph, a dynamic graph anomaly detection method based on community structure.
The invention provides a dynamic graph anomaly detection method based on community structure, as shown in FIGS. 1-7, comprising the following steps:
First, the problem of anomalous edge detection on a dynamic graph is defined:
A dynamic graph is a sequence of graphs. $G_t$ denotes the graph at timestamp $t$, $G_t = (V_t, E_t)$. As the graph is updated, the updated edge set is denoted $E_t$, and the set of all vertices in $E_t$ is denoted $V_t$. $n = |V_t|$, $m_t = |E_t|$. At timestamp $t$, $A_t$ denotes the adjacency matrix of $G_t$. Given $G_t$, dynamic graph anomalous edge detection aims to find the anomalous edges in $E_t$.
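The notation can be made concrete with a small sketch that stores a dynamic graph as one edge list per timestamp and builds $A_t$ for each snapshot. It assumes an undirected, unweighted graph with a fixed vertex count $n$; the function name and the toy edge lists are illustrative only.

```python
import numpy as np

def adjacency_matrix(edges, n):
    """Build the symmetric 0/1 adjacency matrix A_t of an undirected snapshot G_t."""
    a = np.zeros((n, n), dtype=np.float64)
    for u, v in edges:
        a[u, v] = 1.0
        a[v, u] = 1.0
    return a

# Toy dynamic graph (made-up data): one edge set E_t per timestamp.
snapshots = [
    [(0, 1), (1, 2)],            # E_1
    [(0, 1), (1, 2), (2, 3)],    # E_2: edge (2, 3) appears at the second timestamp
]
n = 4
adjacency = [adjacency_matrix(e_t, n) for e_t in snapshots]
m = [len(e_t) for e_t in snapshots]   # m_t = |E_t|
```

The detection task is then to assign a score to every pair in each `snapshots[t]`.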
CmaGraph is composed of three blocks, as shown in FIG. 1: the C-Block, the M-Block, and the A-Block.
FIG. 1 is the CmaGraph flow diagram, in which (a) is the dynamic graph, (b) the adjacency matrix, (c) the C-Block, an evolving-community detection block for the dynamic graph, (d) the M-Block, an embedding distance reconstruction block, and (e) the A-Block, an anomaly detection block.
The C-Block detects the evolving communities of the dynamic graph. The M-Block reconstructs the distances between vertices within and between communities, so that the embeddings of vertices in the same community are close to each other and the embeddings of vertices in different communities are far apart. The vertex embeddings are finally input to the A-Block for anomaly detection.
(I) The goal of the C-Block is to detect evolving communities, that is, communities on a dynamic graph that change over time. For example, the workers within an enterprise may form a community. Over time, people move in and out of the enterprise, as new employees join and others leave, but the community still exists; only its structure has changed. In a dynamic graph, therefore, both the vertices in a community and the structure of the community change with time. In many real dynamic graphs the community structure changes over time, but not very drastically. It is therefore first necessary to detect the evolving communities on the dynamic graph, under the condition that these communities do not change too drastically.
The method is as follows: the adjacency matrix is used as the input of an autoencoder to obtain initial vertex embeddings, and k-means is applied to the vertex embeddings for community detection. In particular, the invention uses a sparse evolution autoencoder (SeAutoencoder), which yields stable vertex embeddings, so that k-means yields stable community labels. The input and output of the C-Block on a synthetic dynamic graph, with the corresponding visualizations, are shown in FIGS. 2-5, which show that the embeddings move only a small distance and the communities change little as the graph is updated.
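The two steps above, encoding and then clustering, can be sketched as follows. The encoder here is a plain sigmoid feed-forward pass whose weights are taken as given, and the clustering is ordinary Lloyd's k-means; the `init_centroids` parameter, which lets a caller reuse the previous timestamp's centroids, is a hypothetical convenience and not something the text specifies.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(a_t, weights, biases):
    """Encoder forward pass: apply h <- sigmoid(h @ W + b) layer by layer, starting from A_t."""
    h = a_t
    for w, b in zip(weights, biases):
        h = sigmoid(h @ w + b)
    return h

def kmeans(h, init_centroids, iters=20):
    """Plain Lloyd's k-means; returns one community label per vertex and the centroids."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    labels = np.zeros(len(h), dtype=int)
    for _ in range(iters):
        # Distance from every embedding to every centroid, shape (n, k).
        dist = np.linalg.norm(h[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(len(centroids)):
            if np.any(labels == c):          # leave empty clusters where they are
                centroids[c] = h[labels == c].mean(axis=0)
    return labels, centroids

# Toy embeddings H_t (made-up data) with two obvious groups, k = 2.
h_t = np.array([[0.1, 0.0], [0.0, 0.1], [0.9, 1.0], [1.0, 0.9]])
labels, centroids = kmeans(h_t, init_centroids=h_t[[0, 2]])
```

In a full pipeline `h_t` would come from `encode(a_t, weights, biases)` with trained weights rather than being written by hand.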
FIGS. 2-5 show the inputs and outputs of the C-Block. FIG. 2 is (a) the input graph $G_{t-1}$; FIG. 3 is (b) the visualized output of $G_{t-1}$ at the C-Block; FIG. 4 is (c) the input graph $G_t$; FIG. 5 is (d) the visualized output of $G_t$ at the C-Block. In (b) and (d), c1 and c2 denote different communities, and the arrows in (d) indicate the direction in which the embeddings were updated relative to (b).
Formally, at timestamp $t$, the adjacency matrix $A_t$ of $G_t$ is obtained and the number of communities $k$, a hyperparameter, is set. The invention uses an $l_s$-layer fully connected network, the SeAutoencoder, to construct the vertex embeddings. The forward propagation formula of the SeAutoencoder is as follows:
where $l = 1, \dots, l_s - 1$, the weight matrix and bias vector are those of the $l$-th layer of the SeAutoencoder, and $\sigma$ is the sigmoid function. Let $H_t$ denote the resulting vertex embeddings. Applying k-means to $H_t$ yields a community label vector $c_t$ that contains the label of each vertex. Here $H_t \in \mathbb{R}^{n \times d}$, where $d$ is the dimension of the vertex embeddings, and $c_t \in \mathbb{R}^{n}$. The reconstruction loss function of the SeAutoencoder is:
where $F$ denotes the Frobenius norm. A sparsity constraint is introduced, and the sparsity penalty term of the SeAutoencoder neurons is defined by the Kullback-Leibler divergence:
where $\rho$ is the sparsity parameter and the average activation is that of the $j$-th neuron in layer $l$. The vertex embeddings and community labels must not change too drastically when the graph is updated. Thus, a temporal loss $J_T$ is introduced between $H_t$ and $H_{t-1}$:
When $t = 1$, $J_T = 0$. Using an $l_s$-layer SeAutoencoder, the loss function is:
where $\beta$ and $\lambda$ control the weights of the sparsity constraint and the temporal loss, respectively.
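The displayed formulas of this passage are absent from the text. One form consistent with the definitions above is given here as a sketch under assumptions, not as the patent's exact expressions: the symbols $W^{(l)}$, $b^{(l)}$, $H_t^{(l)}$, $\hat{A}_t$ (the autoencoder's reconstruction of $A_t$), $\hat{\rho}_j^{(l)}$, $J_R$ and $J_S$ are introduced for illustration, the squared Frobenius norm in $J_T$ is assumed, and any scaling constants are omitted.

```latex
\begin{align}
H_t^{(0)} &= A_t, \qquad
H_t^{(l)} = \sigma\!\left(H_t^{(l-1)} W^{(l)} + b^{(l)}\right),
  \quad l = 1, \dots, l_s - 1 \\
J_R &= \left\lVert A_t - \hat{A}_t \right\rVert_F^2 \\
J_S &= \sum_{l} \sum_{j} \left[
  \rho \log \frac{\rho}{\hat{\rho}_j^{(l)}}
  + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j^{(l)}} \right] \\
J_T &= \left\lVert H_t - H_{t-1} \right\rVert_F^2
  \qquad (J_T = 0 \text{ when } t = 1) \\
J_{\mathrm{SeAutoencoder}} &= J_R + \beta J_S + \lambda J_T
\end{align}
```

The last line follows directly from the statement that $\beta$ weights the sparsity constraint and $\lambda$ weights the temporal loss; the individual terms are the standard sparse-autoencoder forms and may differ from the source in normalization.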
(II) The goal of the M-Block is to reconstruct the distances between vertices within and between communities, so that vertices in the same community are close to each other in Euclidean distance and vertices in different communities are far apart. The vertex embeddings and the community label vector are the inputs of the M-Block, and its output is the community-metric-enhanced vertex embeddings. The M-Block uses the community metric enhancement network (CenNet), a Siamese network, to enhance the vertex embeddings. This is a deep metric learning method that reconstructs the distances between vertices in the evolving communities. As shown in FIGS. 6-7, the enhanced vertex embeddings are more expressive than the original vertex embeddings, because the Euclidean distances between vertices implicitly preserve the intra-community and inter-community distance information.
FIGS. 6-7 show the input and output of the M-Block for $G_{t-1}$. FIG. 6 is (a) the visualized input of $G_{t-1}$ at the M-Block; FIG. 7 is (b) the visualized output of $G_{t-1}$ at the M-Block.
Formally, at timestamp $t$, $H_t$ and $c_t$ are obtained from the C-Block. The invention uses an $l_c$-layer fully connected network, CenNet, each layer having $d$ neurons, whose forward propagation formula is:
where $l = 1, \dots, l_c - 1$, and the weight matrix and bias vector are those of the $l$-th layer of CenNet. The loss function of CenNet is the contrastive loss function:
where the distance term is the Euclidean distance between samples $i$ and $j$, each sample being a row of the matrix $O_t$; if samples $i$ and $j$ are in the same community, then $y_{ij} = 1$, otherwise $y_{ij} = 0$; $b$ is the margin. When the data set is too large, the complexity of computing $J_{CenNet}$ is high; for a given sample $i$, the index $j$ is obtained by negative sampling, which reduces the complexity. That is, a few random samples $j$ are drawn from the data set to approximate $J_{CenNet}$.
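The contrastive loss formula itself is absent from the text. The standard contrastive loss matching the quantities defined above would read as follows; the symbols $o_i$ (row $i$ of $O_t$) and $d_{ij}$ are introduced here for illustration, and the source may include a scaling constant that is not reproduced.

```latex
\begin{align}
d_{ij} &= \left\lVert o_i - o_j \right\rVert_2 \\
J_{CenNet} &= \sum_{i, j} \left[
  y_{ij} \, d_{ij}^{2}
  + (1 - y_{ij}) \, \max\!\left(0,\; b - d_{ij}\right)^{2} \right]
\end{align}
```

The first term shrinks distances inside a community ($y_{ij} = 1$); the second term is zero once two vertices from different communities are at least $b$ apart, so only pairs that are too close are pushed away. With negative sampling, the inner sum over $j$ runs over a few randomly drawn indices instead of all vertices.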
(III) The goal of the A-Block is to obtain the anomaly scores of all edges in $E_t$. In CmaGraph, the invention uses OC-NN as the anomaly detector.
As shown in FIGS. 2-5, the A-Block applies an edge encoder to $O_t$ to obtain edge embeddings. For a vertex $u$ and a vertex $v$, the invention uses an edge encoding that makes better use of the embedded distance information. The edge embeddings are then input into a One-Class neural network (OCNN), which is an anomaly detection model.
Formally, at timestamp $t$, $O_t$ is obtained from the M-Block together with $E_t$. The edge encoder $\phi$ converts $O_t$ and $E_t$ into the edge embeddings $P_t$. The invention uses an $l_a$-layer fully connected network, the OCNN, in which each hidden layer has $d$ neurons and the output layer has only 1 neuron, representing the anomaly score. The forward propagation formula of the neural network is as follows:
where $l = 1, \dots, l_a - 2$; the last layer applies no activation function. The weight matrix and bias vector are those of the $l$-th layer of the OCNN, and the output is the anomaly score vector. The loss function of the OCNN is:
where $r$ is the bias of the hyperplane and $\nu$ controls the number of data points allowed to cross the hyperplane, $\nu$ being equivalent to the percentage of anomalies. Finally $s_t$ is obtained, and anomalous edges can be classified by setting a threshold.
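The OCNN loss formula is absent from the text. Assuming it follows the original one-class neural network objective, generalized to an $l_a$-layer network, it would take a form such as the one below; the symbols $W_a^{(l)}$ and $s_t(e)$ (the score of edge $e$), the weight regularizer, and the normalization by $m_t$ are assumptions made for illustration.

```latex
\begin{equation}
J_{OCNN} = \frac{1}{2} \sum_{l = 1}^{l_a} \left\lVert W_a^{(l)} \right\rVert_F^{2}
  + \frac{1}{\nu} \cdot \frac{1}{m_t} \sum_{e \in E_t}
    \max\!\left(0,\; r - s_t(e)\right)
  - r
\end{equation}
```

Under this form, minimizing over $r$ places it at the $\nu$-quantile of the scores, which is why $\nu$ acts as the expected fraction of anomalies.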
Formally, at timestamp $t$, an updated edge set $E_t$ is obtained. The adjacency matrix $A_t$ is updated according to $E_t$ and used as the input of CmaGraph. The SeAutoencoder, CenNet, and OCNN are then trained with the learning rate $\alpha$, starting from the previous weights.
CmaGraph is summarized in Algorithm 1.
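Algorithm 1 itself does not appear in this text. The per-timestamp flow described above can be sketched as a simple loop in which the three blocks are passed in as callables that keep their own weights between calls, so each timestamp starts from the previous weights. All names here are hypothetical, and rebuilding $A_t$ from scratch at each step is a simplification of "updating" it.

```python
import numpy as np

def build_adjacency(edges, n):
    """Symmetric 0/1 adjacency matrix of one snapshot (undirected graph assumed)."""
    a = np.zeros((n, n))
    for u, v in edges:
        a[u, v] = a[v, u] = 1.0
    return a

def run_cmagraph(edge_stream, n, c_block, m_block, a_block):
    """Process snapshots in order and return one score vector s_t per timestamp.

    c_block(a_t)       -> (h_t, c_t)  vertex embeddings and community labels
    m_block(h_t, c_t)  -> o_t         community-metric-enhanced embeddings
    a_block(o_t, e_t)  -> s_t         one anomaly score per edge in e_t
    """
    scores = []
    for e_t in edge_stream:
        a_t = build_adjacency(e_t, n)
        h_t, c_t = c_block(a_t)
        o_t = m_block(h_t, c_t)
        scores.append(a_block(o_t, e_t))
    return scores
```

Each callable would wrap a trainable model (SeAutoencoder plus k-means, CenNet, OCNN) and run a few training steps with learning rate $\alpha$ before producing its output.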
In real life, many anomalies occur between communities; however, existing dynamic graph anomaly detection algorithms based on graph embedding models do not consider the community structure of the dynamic graph. The method uses an evolutionary sparse autoencoder and k-means to detect the evolving community structure on the dynamic graph, and uses a Siamese network to reconstruct the distances between vertices within and between communities, so that vertices in the same community are close to each other in Euclidean distance and vertices in different communities are far apart. Vertex embeddings are learned based on the community structure and used for anomaly detection.
The method was tested repeatedly on three real-world dynamic graph data sets, covering a social network graph, a paper co-authorship graph, and a computer network graph. Evaluated quantitatively by AUC, CmaGraph improves on NetWalk by 18% on average across the three data sets, showing that CmaGraph detects anomalous data effectively and that community structure is useful for anomaly detection.
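For readers unfamiliar with the metric, AUC can be computed as the probability that a randomly chosen anomalous edge receives a higher score than a randomly chosen normal edge, with ties counted as one half. The sketch below assumes that a higher score means more anomalous; the scores and labels are made-up toy values, not results from the patent.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy example: label 1 marks an anomalous edge.
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Three of the four anomalous/normal pairs are ranked correctly here, so the value is 0.75; a detector that ranks every anomalous edge above every normal edge scores 1.0.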
In summary, compared with the prior art, the method uses an evolutionary sparse autoencoder and k-means to detect the evolving community structure on the dynamic graph, and uses a Siamese network to reconstruct the distances between vertices within and between communities, so that vertices in the same community are close to each other in Euclidean distance and vertices in different communities are far apart. Vertex embeddings are learned based on the community structure and used for anomaly detection. The method detects anomalous data more effectively and demonstrates the usefulness of community structure for anomaly detection. It addresses the lack of research on using community structure for anomaly detection within graph embedding methods.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments or portions thereof without departing from the spirit and scope of the invention.
Claims (4)
1. A dynamic graph anomaly detection method based on community structure, characterized by comprising the following steps:
S1: first, define the problem of anomalous edge detection on a dynamic graph:
a dynamic graph is a sequence of graphs; $G_t$ denotes the graph at timestamp $t$, $G_t = (V_t, E_t)$; as the graph is updated, the updated edge set is denoted $E_t$, and the set of all vertices in $E_t$ is denoted $V_t$; $n = |V_t|$, $m_t = |E_t|$; at timestamp $t$, $A_t$ denotes the adjacency matrix of $G_t$; given $G_t$, dynamic graph anomalous edge detection aims to find the anomalous edges in $E_t$;
S2: CmaGraph consists of a C-Block, an M-Block and an A-Block; the C-Block detects the evolving communities of the dynamic graph, and the M-Block reconstructs the distances between vertices within and between communities, so that the embeddings of vertices in the same community are close to each other and the embeddings of vertices in different communities are far apart;
S3: the vertex embeddings are finally input to the A-Block for anomaly detection.
2. The dynamic graph anomaly detection method based on community structure according to claim 1, characterized in that: the goal of the C-Block is to detect evolving communities, that is, communities on the dynamic graph that change over time;
the adjacency matrix is used as the input of an autoencoder to obtain initial vertex embeddings, and k-means is applied to the vertex embeddings for community detection;
the autoencoder is a sparse evolution autoencoder, namely the SeAutoencoder, which yields stable vertex embeddings, so that k-means yields stable community labels;
formally, at timestamp $t$, the adjacency matrix $A_t$ of $G_t$ is obtained and the number of communities $k$, a hyperparameter, is set;
an $l_s$-layer fully connected network, the SeAutoencoder, is used to construct the vertex embeddings, and the forward propagation formula of the SeAutoencoder is as follows:
where $l = 1, \dots, l_s - 1$, the weight matrix and bias vector are those of the $l$-th layer of the SeAutoencoder, and $\sigma$ is the sigmoid function;
let $H_t$ denote the resulting vertex embeddings; applying k-means to $H_t$ yields a community label vector $c_t$ that contains the label of each vertex; here $H_t \in \mathbb{R}^{n \times d}$, where $d$ is the dimension of the vertex embeddings, and $c_t \in \mathbb{R}^{n}$; the reconstruction loss function of the SeAutoencoder is:
where $F$ denotes the Frobenius norm; a sparsity constraint is introduced, and the sparsity penalty term of the SeAutoencoder neurons is defined by the Kullback-Leibler divergence:
where $\rho$ is the sparsity parameter and the average activation is that of the $j$-th neuron in layer $l$; a temporal loss $J_T$ is introduced between $H_t$ and $H_{t-1}$,
Using an $l_s$-layer SeAutoencoder, the loss function is:
3. The method for detecting the anomaly of the dynamic graph based on the community structure as claimed in claim 1, wherein: the M-Block aims to reconstruct the distances between vertices within and across communities, so that vertices in the same community are close to each other in Euclidean distance and vertices in different communities are far from each other;
the vertex embeddings and the community label vector are the input of the M-Block, and the output of the M-Block is the community-metric-enhanced vertex embeddings; the M-Block uses a community metric enhancement network, namely CenNet, to strengthen the vertex embeddings; CenNet is a Siamese network, a deep metric learning method, which reconstructs the distances between vertices in the evolving communities;
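formally, at timestamp $t$, $H_t$ and $c_t$ are obtained from the C-Block; CenNet is a fully connected network of $l_c$ layers, each layer having $d$ neurons, whose forward propagation formula is:
$$O_t^{(l)} = \sigma\left(O_t^{(l-1)} W_c^{(l)} + b_c^{(l)}\right), \quad O_t^{(0)} = H_t,$$
where $l = 1, \dots, l_c - 1$, $\sigma$ is the activation function, and $W_c^{(l)}$ and $b_c^{(l)}$ are respectively the weight matrix and the bias vector of the $l$-th layer of CenNet; the loss function of CenNet is the contrastive loss function:
$$J_{CenNet} = \frac{1}{2N}\sum_{(i,j)}\left[ y_{ij}\, d_{ij}^{2} + (1 - y_{ij})\max\left(b - d_{ij},\, 0\right)^{2} \right],$$
where $d_{ij} = \|o_i - o_j\|_2$ is the Euclidean distance between samples $i$ and $j$, $o_i$ is the $i$-th row of the matrix $O_t$, $N$ is the number of sample pairs, $y_{ij} = 1$ if samples $i$ and $j$ are in the same community and $y_{ij} = 0$ otherwise, and $b$ is the margin; when the data set is too large, the computational complexity of $J_{CenNet}$ is high, so for a given sample $i$ the index $j$ is obtained by negative sampling, which reduces the complexity, i.e. a few random samples $j$ are drawn from the data set to approximate $J_{CenNet}$.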
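The sampled contrastive loss described in claim 3 can be sketched as follows. This is an illustrative approximation only: the function name, the `num_samples` and `seed` parameters, uniform sampling of the partners $j$, and the averaging over sampled pairs are assumptions and are not specified in the claim.

```python
import numpy as np

def contrastive_loss(O_t, c_t, margin=1.0, num_samples=5, seed=0):
    """Approximate the contrastive loss by sampling partners j for each vertex i."""
    n = O_t.shape[0]
    if n < 2:
        return 0.0
    rng = np.random.default_rng(seed)
    total, count = 0.0, 0
    for i in range(n):
        candidates = np.delete(np.arange(n), i)   # every vertex except i
        js = rng.choice(candidates, size=min(num_samples, n - 1), replace=False)
        for j in js:
            d_ij = np.linalg.norm(O_t[i] - O_t[j])
            y_ij = 1.0 if c_t[i] == c_t[j] else 0.0
            # Same community: pull together. Different: push beyond the margin.
            total += y_ij * d_ij ** 2 + (1.0 - y_ij) * max(margin - d_ij, 0.0) ** 2
            count += 1
    return total / (2.0 * count)
```

The loss is small when vertices of the same community lie close together and vertices of different communities are separated by more than the margin; each vertex costs `num_samples` distance evaluations instead of `n - 1`.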
4. The method for detecting the anomaly of the dynamic graph based on the community structure as claimed in claim 1, wherein: the goal of the A-Block is to obtain the anomaly scores of all edges in $E_t$, and CmaGraph uses OC-NN as the anomaly detector;
the A-Block applies an edge encoder to $O_t$ to obtain edge embeddings; for a vertex $u$ and a vertex $v$, the edge embedding is computed from the embeddings of the two vertices; the edge embeddings, which carry the embedded distance information, are fed into a one-class neural network (OCNN), which is the anomaly detection model;
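formally, at timestamp $t$, $O_t$ is obtained from the M-Block together with $E_t$; the encoder $\phi$ converts $O_t$ and $E_t$ into the edge embeddings $P_t$; an OCNN of $l_a$ layers is used, with $d$ neurons in each hidden layer and only 1 neuron in the output layer, which represents the anomaly score; the forward propagation formula of the neural network is:
$$P_t^{(l)} = \sigma\left(P_t^{(l-1)} W_a^{(l)} + b_a^{(l)}\right), \quad P_t^{(0)} = P_t,$$
where $l = 1, \dots, l_a - 2$ and $\sigma$ is the activation function; the last layer does not apply an activation function, i.e.
$$s_t = P_t^{(l_a-2)} W_a^{(l_a-1)} + b_a^{(l_a-1)},$$
where $W_a^{(l)}$ and $b_a^{(l)}$ are respectively the weight matrix and the bias vector of the $l$-th layer of the OCNN, and $s_t$ is the anomaly score vector;
the loss function of the OCNN is:
$$J_{OCNN} = \frac{1}{2}\sum_{l=1}^{l_a-1}\left\|W_a^{(l)}\right\|_F^{2} + \frac{1}{\nu}\cdot\frac{1}{|E_t|}\sum_{i=1}^{|E_t|}\max\left(0,\, r - s_{t,i}\right) - r,$$
where $r$ is the bias of the hyperplane, $\nu$ controls the number of data points allowed to cross the hyperplane and corresponds to the percentage of anomalies, and $s_{t,i}$ is the $i$-th entry of the final result $s_t$; abnormal edges can be classified by setting a threshold on $s_t$;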
formally, at timestamp $t$, the updated edge set $E_t$ is obtained, the adjacency matrix $A_t$ is updated according to $E_t$, and $A_t$ is used as the input of CmaGraph; the autoencoder, CenNet, and OCNN are then trained with the learning rate $\alpha$, starting from the previous weights.
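The edge-scoring pipeline of claim 4 can be sketched in NumPy under stated assumptions. The edge encoder used here (element-wise absolute difference of the two endpoint embeddings) is hypothetical, since the claim does not preserve the encoder formula; the sigmoid hidden activation, the standard OC-NN hinge objective, and the quantile-based threshold are likewise assumptions, and all function names are illustrative.

```python
import numpy as np

def edge_embeddings(O_t, edges):
    """Hypothetical edge encoder: |o_u - o_v| for every edge (u, v)."""
    u, v = edges[:, 0], edges[:, 1]
    return np.abs(O_t[u] - O_t[v])

def ocnn_scores(P_t, weights, biases):
    """Hidden layers use a sigmoid (assumed); the last layer is linear, one score per edge."""
    h = P_t
    for W, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return (h @ weights[-1] + biases[-1]).ravel()

def ocnn_loss(scores, weights, r, nu):
    """Standard OC-NN objective: weight decay + hinge on (r - score) / nu - r."""
    reg = 0.5 * sum(np.sum(W ** 2) for W in weights)
    hinge = np.mean(np.maximum(0.0, r - scores)) / nu
    return float(reg + hinge - r)

def flag_anomalies(scores, nu):
    """Assumed thresholding: flag edges scoring below the nu-quantile of the scores."""
    r = np.quantile(scores, nu)
    return scores < r
```

With `nu = 0.2`, roughly the lowest-scoring fifth of the edges is flagged; a fixed threshold chosen by the user would serve the same purpose.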
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210019006.8A CN114443909A (en) | 2022-01-10 | 2022-01-10 | Dynamic graph anomaly detection method based on community structure |
PCT/CN2022/112656 WO2023130728A1 (en) | 2022-01-10 | 2022-10-11 | Dynamic graph anomaly detection method based on block structure |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210019006.8A CN114443909A (en) | 2022-01-10 | 2022-01-10 | Dynamic graph anomaly detection method based on community structure |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114443909A (en) | 2022-05-06 |
Family
ID=81366799
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210019006.8A Pending CN114443909A (en) | 2022-01-10 | 2022-01-10 | Dynamic graph anomaly detection method based on community structure |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114443909A (en) |
WO (1) | WO2023130728A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023130728A1 (en) * | 2022-01-10 | 2023-07-13 | 深圳市检验检疫科学研究院 | Dynamic graph anomaly detection method based on block structure |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8090665B2 (en) * | 2008-09-24 | 2012-01-03 | Nec Laboratories America, Inc. | Finding communities and their evolutions in dynamic social network |
US9202052B1 (en) * | 2013-06-21 | 2015-12-01 | Emc Corporation | Dynamic graph anomaly detection framework and scalable system architecture |
CN106909948A (en) * | 2017-03-10 | 2017-06-30 | 深圳大学 | A kind of increment documents structured Cluster method and system on Dynamic Graph |
CN111768618B (en) * | 2020-06-04 | 2021-07-20 | 北京航空航天大学 | Traffic jam state propagation prediction and early warning system and method based on city portrait |
CN111709518A (en) * | 2020-06-16 | 2020-09-25 | 重庆大学 | Method for enhancing network representation learning based on community perception and relationship attention |
CN114443909A (en) * | 2022-01-10 | 2022-05-06 | 深圳市检验检疫科学研究院 | Dynamic graph anomaly detection method based on community structure |
2022
- 2022-01-10 CN CN202210019006.8A patent/CN114443909A/en active Pending
- 2022-10-11 WO PCT/CN2022/112656 patent/WO2023130728A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
WO2023130728A1 (en) | 2023-07-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lv et al. | Deep-learning-enabled security issues in the internet of things | |
Chen et al. | Fast gradient attack on network embedding | |
Nicolau et al. | Learning neural representations for network anomaly detection | |
CN109639710B (en) | Network attack defense method based on countermeasure training | |
Zhao et al. | A malware detection method of code texture visualization based on an improved faster RCNN combining transfer learning | |
US11699160B2 (en) | Method, use thereof, computer program product and system for fraud detection | |
Chen et al. | Link prediction adversarial attack | |
Chen et al. | An efficient network behavior anomaly detection using a hybrid DBN-LSTM network | |
Irfan et al. | A novel lifelong learning model based on cross domain knowledge extraction and transfer to classify underwater images | |
Jia et al. | Predict land covers with transition modeling and incremental learning | |
CN113269228B (en) | Method, device and system for training graph network classification model and electronic equipment | |
CN111861756A (en) | Group partner detection method based on financial transaction network and implementation device thereof | |
CN111259264B (en) | Time sequence scoring prediction method based on generation countermeasure network | |
CN114443909A (en) | Dynamic graph anomaly detection method based on community structure | |
Nichol et al. | Machine learning feature analysis illuminates disparity between E3SM climate models and observed climate change | |
Zhang et al. | An intrusion detection method based on stacked sparse autoencoder and improved gaussian mixture model | |
CN115310589A (en) | Group identification method and system based on depth map self-supervision learning | |
CN114445639A (en) | Dual self-attention-based dynamic graph anomaly detection method | |
CN110688537A (en) | Calculation graph node low-dimensional representation and related application method | |
CN107391443B (en) | Sparse data anomaly detection method and device | |
CN116245645A (en) | Financial crime partner detection method based on graph neural network | |
Kim et al. | Network Intrusion Detection System using 2D Anomaly Detection | |
CN115982373A (en) | Knowledge graph recommendation method combining multi-level interactive contrast learning | |
CN115965795A (en) | Deep darknet group discovery method based on network representation learning | |
Zouinina et al. | Efficient k-anonymization through constrained collaborative clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||