CN116760583A

CN116760583A - Enhanced graph node behavior characterization and abnormal graph node detection method

Info

Publication number: CN116760583A
Application number: CN202310652286.0A
Authority: CN
Inventors: 周颖杰; 刘凡兴; 纪守领; 谢禹秦; 刘凌峤; 朱策
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2023-06-02
Filing date: 2023-06-02
Publication date: 2023-09-15
Anticipated expiration: 2043-06-02
Also published as: CN116760583B

Abstract

The invention discloses a method for enhanced graph node behavior characterization and abnormal graph node detection, which relates to the technical field of network security and comprises the following steps: constructing and training an abnormal graph node detection model based on graph node behavior characterization to obtain the trained model; inputting the attribute list of all nodes of the graph and the adjacency matrix representing the graph structure into the trained model to obtain the anomaly score of each node to be detected; if the anomaly score of a node to be detected is greater than a threshold, judging the node to be an abnormal graph node; otherwise, judging it to be a normal graph node. Through dual random node behavior expression enhancement, the method enriches the feature expression of graph node behavior, achieves a robust and effective representation of that behavior, and improves the ability of the feature extraction network to characterize it; the method also makes full use of the differences between normal and abnormal graph nodes to ensure an excellent anomaly detection effect.

Description

Enhanced graph node behavior characterization and abnormal graph node detection method

Technical Field

The invention relates to the technical field of network security, and in particular to a method for enhanced graph node behavior characterization and abnormal graph node detection.

Background

Detecting abnormal graph nodes in attributed graphs is an important research topic in the field of network security. Graph-structured data is widespread in Internet of Things systems, and abnormal graph nodes correspond to hosts that behave abnormally in the Internet of Things. Abnormal graph nodes in an attributed graph can be detected directly from the attribute features of the graph nodes, or by extracting deeper features that also take into account the associations between a node and other graph nodes. Because the graph nodes of an attributed graph usually have high-dimensional attributes and complex intrinsic behavior patterns, a machine learning model is usually required in practice to detect abnormal graph nodes, so that abnormal behavior is discovered and handled in time to reduce or avoid losses. Existing systems are typically built on either supervised or unsupervised methods. Systems based on supervised methods generally require many anomaly labels. In practical scenarios where only a small number of labeled abnormal graph nodes exist, their performance drops sharply: with too few labeled graph nodes, the model easily overfits the abnormal graph nodes during learning, so the detection effect is not ideal. Systems based on unsupervised methods learn only from normal graph nodes and detect abnormal graph nodes by the difference between the features of the nodes to be detected and those of normal graph nodes. They make no use of the known labeled abnormal samples and, lacking the corresponding label information, perform poorly on real anomaly detection data sets.

Disclosure of Invention

To address the above deficiencies of the prior art, the invention provides a method for enhanced graph node behavior characterization and abnormal graph node detection, which solves the problems that the very small amount of labeled abnormal graph node data and the large amount of unlabeled graph node data are not fully utilized and that the detection effect of existing systems is not ideal.

In order to achieve the aim of the invention, the invention adopts the following technical scheme:

The method for enhanced graph node behavior characterization and abnormal graph node detection comprises the following steps:

S1. constructing and training an abnormal graph node detection model based on graph node behavior characterization to obtain the trained model;

S2. inputting the attribute list of all nodes of the graph and the adjacency matrix representing the graph structure into the trained abnormal graph node detection model to obtain the anomaly score of each node to be detected in the graph;

S3. if the anomaly score of a node to be detected is greater than a threshold, judging the node to be an abnormal graph node; otherwise, judging the node to be a normal graph node.
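The decision rule of step S3 can be sketched as follows. This is only an illustration: the function name `classify_nodes`, the example scores and the threshold value 1.0 are hypothetical and do not come from the patent, which does not state how the threshold is chosen.

```python
from typing import List


def classify_nodes(scores: List[float], threshold: float) -> List[str]:
    """Apply the S3 rule: a score strictly above the threshold marks an abnormal node."""
    return ["abnormal" if score > threshold else "normal" for score in scores]


# Hypothetical anomaly scores for three nodes and an illustrative threshold.
example_scores = [0.03, 4.7, 0.4]
example_labels = classify_nodes(example_scores, threshold=1.0)
print(example_labels)
```

A score exactly equal to the threshold falls into the "otherwise" branch and is treated as normal.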

Further, the abnormal graph node detection model based on graph node behavior characterization in step S1 comprises a feature extraction network and an anomaly score calculation network; the feature extraction network comprises a node behavior expression enhancement module, a feature information and position information precoding module, and a feature extraction module based on a graph autoencoder; the node behavior expression enhancement module comprises a random node selection operator, a random attribute selection operator and a disturbance adding operator; the feature information and position information precoding module comprises a feature information precoder and a position information precoder; the feature information precoder comprises a fully connected neural network; the position information precoder comprises a fully connected neural network; the feature extraction module based on the graph autoencoder comprises a graph-convolution-based encoder and a graph-convolution-based decoder; the graph-convolution-based encoder comprises a multi-layer graph convolution structure; the graph-convolution-based decoder comprises a multi-layer graph convolution structure; each graph convolution structure comprises a fully connected layer and a matrix multiplier; the anomaly score calculation network comprises a fully connected neural network; the fully connected neural network comprises an input layer, an output layer and a plurality of hidden layers.

Further, the specific operations of training the abnormal graph node detection model based on graph node behavior characterization in step S1 are as follows:

S1-1. taking the attribute list X of all nodes in the graph structure and the adjacency matrix A formed among the graph nodes as training data, and inputting the training data to the node behavior expression enhancement module; randomly sampling a% of the normal graph nodes in the attribute list X, where each graph node is selected with uniform probability; for each selected graph node, randomly sampling b% of its attributes, where each attribute is selected with uniform probability; computing, over all input normal unlabeled graph nodes, the statistical mean of each selected attribute; adding to the selected attribute values of the selected graph nodes a random perturbation that follows a normal distribution of the statistical mean with mean μ and standard deviation σ; and creating indication vectors whose dimension equals the attribute dimension of the graph nodes, thereby obtaining the behavior-expression-enhanced graph node attribute list X' and the corresponding indication vector list V; where, by default, a is 20, b is 20, μ is 0, and σ is 0.1;

S1-2. inputting the behavior-expression-enhanced graph node attribute list X' and the corresponding indication vector list V into the feature information and position information precoding module, and computing, by the forward propagation algorithm, the feature information precoding from X' and the position information precoding from V; then, according to the formula:

H₀ = Concat(feature information precoding, position information precoding)

obtaining the spliced precoding result H₀ output by the feature information and position information precoding module; where Concat(·) denotes the vector concatenation operator;

S1-3. inputting the spliced precoding result H₀ and the adjacency matrix A formed among the graph nodes into the feature extraction module based on the graph autoencoder for reconstruction, to obtain the reconstructed graph node attribute list, the feature vector H of the graph nodes in the hidden space, the reconstruction error vector R, and the feature code r obtained by concatenating the 1-norm and the 2-norm of the reconstruction error vector R;

S1-4. inputting the feature vector H of the graph nodes in the hidden space and the feature code r into the anomaly score calculation network, and according to the formula:

H₁^(l+1) = ReLu(Concat(H₁^l, r) × W^l + b^l)

obtaining the output feature vector H₁^(l+1) of the l-th hidden layer of the anomaly score calculation network; where H₁^l denotes the input feature vector of the l-th hidden layer of the anomaly score calculation network, W^l denotes the weight of the l-th hidden layer, b^l denotes the bias of the l-th hidden layer, Concat(·) denotes the vector concatenation operator, and ReLu(·) denotes the nonlinear activation function;

S1-5. according to the formula:

res = H₁ × W + b

obtaining the anomaly score res of the graph node; where H₁ denotes the output feature vector of the final hidden layer of the anomaly score calculation network, W denotes the weight of the output layer of the anomaly score calculation network, and b denotes the bias of the output layer of the anomaly score calculation network;

S1-6. constructing a first loss function from the graph node attribute list X and the reconstructed graph node attribute list; constructing a second loss function from the anomaly score res of the graph node and the actual label of the graph node; adding the first loss function and the second loss function according to their preset weights to obtain a third loss function, and training the abnormal graph node detection model through the third loss function to obtain the trained abnormal graph node detection model; where the actual label of a normal graph node is 0 and the actual label of an abnormal graph node is 1.
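The forward pass of the anomaly score calculation network in steps S1-4 and S1-5 can be sketched in plain Python as below. The sketch assumes that the feature code r is concatenated to the input of every hidden layer before the affine map, which is one reading of the text and of the description of FIG. 3. The helper names (`affine`, `relu`, `anomaly_score`) and all numeric weights are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def affine(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Return x × W + b for a row vector x; W has one row per input feature."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def relu(x: Vector) -> Vector:
    """Element-wise ReLU activation."""
    return [max(0.0, value) for value in x]


def anomaly_score(h: Vector, r: Vector,
                  hidden_layers: Sequence[Tuple[Matrix, Vector]],
                  w_out: Matrix, b_out: Vector) -> float:
    """Score one node from its hidden-space feature vector h and feature code r."""
    state = h
    for w, b in hidden_layers:
        # Assumed reading of S1-4: r is concatenated to the input of every hidden layer.
        state = relu(affine(state + r, w, b))
    # S1-5: res = H1 × W + b, with a single output unit.
    return affine(state, w_out, b_out)[0]


# Toy example: one hidden layer with two units, all weights chosen by hand.
h = [1.0, -1.0]
r = [0.5, 0.5]
hidden = [([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
res = anomaly_score(h, r, hidden, w_out=[[2.0], [3.0]], b_out=[0.1])
```

In this toy case the hidden layer outputs [1.5, 0.0], so the score is 1.5 × 2.0 + 0.1 = 3.1.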

Further, the specific operation of step S1-3 is as follows:

S1-3-1. mapping the spliced precoding result H₀ to a low-dimensional hidden space according to the formula:

to obtain the feature vector H of the graph nodes in the hidden space; where H₀' denotes the spliced precoding result H₀ or the output of the previous fully connected layer, f_ce(·) denotes the fully connected layer function of the encoder, which takes the encoder parameters of the fully connected layer, H₀'' denotes the output of the fully connected layer, H₂ denotes the output of the final fully connected layer, and MM(·) denotes the matrix multiplier;

S1-3-2. according to the formula:

H'' = f_cd(H'; W_fcd)

r = Concat(||R||₁, ||R||₂)

obtaining the reconstructed graph node attribute list, the reconstruction error vector R, and the feature code r obtained by concatenating the 1-norm and the 2-norm of the reconstruction error vector; where H' denotes the feature vector H of the graph nodes in the hidden space or the feature vector output by the previous fully connected layer, H'' denotes the feature vector output by the fully connected layer, W_fcd denotes the decoder parameters of the fully connected layer, f_cd(·) denotes the fully connected layer function of the decoder, H_out denotes the feature vector output by the final fully connected layer, ||R||₁ denotes the 1-norm of the reconstruction error vector R, and ||R||₂ denotes the 2-norm of the reconstruction error vector R.
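As an illustration of the feature code r, the following sketch computes the 1-norm and 2-norm of a reconstruction error for a single node. It assumes that the reconstruction error is the element-wise difference between the original attributes and the reconstructed attributes; the exact definition of R is not legible in this text. The function name `feature_code` and the sample values are hypothetical.

```python
import math
from typing import List


def feature_code(x: List[float], x_rec: List[float]) -> List[float]:
    """Return [||R||_1, ||R||_2] for one node's reconstruction error R."""
    # Assumption: R is the element-wise difference between input and reconstruction.
    error = [a - b for a, b in zip(x, x_rec)]
    l1_norm = sum(abs(e) for e in error)
    l2_norm = math.sqrt(sum(e * e for e in error))
    return [l1_norm, l2_norm]


# Hypothetical attributes and reconstruction; the error is [0, 3, -4].
code = feature_code([1.0, 2.0, 3.0], [1.0, -1.0, 7.0])
print(code)
```

Because r has only two entries per node, it stays cheap to feed into the scoring network alongside the hidden-space feature vector.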

Further, the specific operations of step S1-6 are as follows:

S1-6-1. taking minimization of the difference between the output of the feature extraction network and the input graph node attributes as the optimization target, and according to the formula:

obtaining the first loss function; where MSE(·) denotes the mean squared error;

S1-6-2. performing end-to-end joint optimization of the feature extraction network and the anomaly score calculation network by minimizing a combined loss based on the reconstruction error and the anomaly score calculation error, according to the formula:

loss_c(res, y; t) = (1 - y)|res| + y · max(0, t - res)

obtaining the third loss function; where loss_c(res, y; t) denotes the second loss function, y denotes the actual label, t denotes the preset scaling factor, α denotes a constant, |·| denotes the absolute value, and max(·) denotes the maximum;

S1-6-3. updating the parameters of the abnormal graph node detection model based on graph node behavior characterization through the third loss function.
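The loss construction of step S1-6 can be sketched as follows. The second loss follows the formula for loss_c given above. The way the two losses are combined is an assumption: the sketch uses reconstruction loss plus α times the mean score loss, whereas the text only states that the losses are added with preset weights and that α is a constant. The function names and the sample numbers are hypothetical.

```python
from typing import List


def mse(x: List[List[float]], x_rec: List[List[float]]) -> float:
    """Mean squared error over all attribute entries (first loss, assumed form)."""
    count = sum(len(row) for row in x)
    total = sum((a - b) ** 2 for row, rec in zip(x, x_rec) for a, b in zip(row, rec))
    return total / count


def loss_c(res: float, y: int, t: float) -> float:
    """Second loss: pull normal scores (y=0) to 0, push abnormal scores (y=1) above t."""
    return (1 - y) * abs(res) + y * max(0.0, t - res)


def total_loss(x, x_rec, scores: List[float], labels: List[int],
               t: float, alpha: float) -> float:
    """Third loss under the assumed combination: MSE + alpha * mean(loss_c)."""
    score_loss = sum(loss_c(s, y, t) for s, y in zip(scores, labels)) / len(scores)
    return mse(x, x_rec) + alpha * score_loss


# Hypothetical values: one node's attributes plus two scored nodes.
combined = total_loss([[1.0, 2.0]], [[1.0, 4.0]],
                      scores=[0.5, 2.0], labels=[0, 1], t=5.0, alpha=1.0)
```

With these numbers the reconstruction term is 2.0 and the mean score term is 1.75, giving 3.75. An abnormal node whose score already exceeds t contributes nothing to the second loss.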

The beneficial effects of the invention are as follows:

1. Through dual random node behavior expression enhancement, the invention enriches the feature expression of graph node behavior in the normal mode and, based on the trained feature extraction network, constructs a hidden feature space in which normal and abnormal graph nodes are easy to distinguish, thereby achieving a robust and effective representation of graph node behavior.

2. While making full use of the label information of abnormal graph nodes, the invention learns the differences between normal and abnormal graph nodes in both attribute features and connection behavior features, ensuring an excellent anomaly detection effect.

3. Through the feature information and position information precoding module, the invention obtains hint information about how the graph node behavior expression was enhanced, which helps obtain a robust and effective graph node behavior characterization and improves the ability of the feature extraction network to characterize graph node behavior.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a diagram of the self-supervised network abnormal graph node detection model based on node behavior characterization of the present invention;

FIG. 3 is a structural diagram of the anomaly score calculation network of the present invention.

Detailed Description

The following describes specific embodiments of the invention to help those skilled in the art understand it. It should be understood, however, that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, all variations that fall within the spirit and scope of the invention as defined by the appended claims are apparent, and all inventions and creations that make use of the inventive concept are protected.

As shown in FIG. 1, the method for enhanced graph node behavior characterization and abnormal graph node detection comprises the following steps:

As shown in FIG. 2, the abnormal graph node detection model based on graph node behavior characterization in step S1 comprises a feature extraction network and an anomaly score calculation network; the feature extraction network comprises a node behavior expression enhancement module, a feature information and position information precoding module, and a feature extraction module based on a graph autoencoder; the node behavior expression enhancement module comprises a random node selection operator, a random attribute selection operator and a disturbance adding operator; the feature information and position information precoding module comprises a feature information precoder and a position information precoder; the feature information precoder comprises a fully connected neural network; the position information precoder comprises a fully connected neural network; the feature extraction module based on the graph autoencoder comprises a graph-convolution-based encoder and a graph-convolution-based decoder; the graph-convolution-based encoder comprises a multi-layer graph convolution structure; the graph-convolution-based decoder comprises a multi-layer graph convolution structure; each graph convolution structure comprises a fully connected layer and a matrix multiplier.

As shown in FIG. 3, the anomaly score calculation network comprises a fully connected neural network; the fully connected neural network comprises an input layer, an output layer and a plurality of hidden layers, where the feature vector H of the graph nodes in the hidden space is input through the input layer, and the feature code r is fed directly into each hidden layer.

In step S1, the specific operations of training the abnormal graph node detection model based on graph node behavior characterization are as follows:

H₁^(l+1) = ReLu(Concat(H₁^l, r) × W^l + b^l)

S1-5. according to the formula:

res = H₁ × W + b

The specific operations of step S1-3 are as follows:

S1-3-1. mapping the spliced precoding result H₀ to a low-dimensional hidden space according to the formula:

to obtain the feature vector H of the graph nodes in the hidden space; where H₀' denotes the spliced precoding result H₀ or the output of the previous fully connected layer, f_ce(·) denotes the fully connected layer function of the encoder, which takes the encoder parameters of the fully connected layer, H₀'' denotes the output of the fully connected layer, H₂ denotes the output of the final fully connected layer, and MM(·) denotes the matrix multiplier;

S1-3-2. according to the formula:

H'' = f_cd(H'; W_fcd)

r = Concat(||R||₁, ||R||₂)

obtaining the reconstructed graph node attribute list, the reconstruction error vector R, and the feature code r obtained by concatenating the 1-norm and the 2-norm of the reconstruction error vector; where H' denotes the feature vector H of the graph nodes in the hidden space or the feature vector output by the previous fully connected layer, H'' denotes the feature vector output by the fully connected layer, W_fcd denotes the decoder parameters of the fully connected layer, f_cd(·) denotes the fully connected layer function of the decoder, H_out denotes the feature vector output by the final fully connected layer, ||R||₁ denotes the 1-norm of the reconstruction error vector R, and ||R||₂ denotes the 2-norm of the reconstruction error vector R.

The specific operations of step S1-6 are as follows:

loss_c(res, y; t) = (1 - y)|res| + y · max(0, t - res)

obtaining the third loss function; where loss_c(res, y; t) denotes the second loss function, y denotes the actual label, t denotes the preset scaling factor, α denotes a constant, |·| denotes the absolute value, and max(·) denotes the maximum;

In one embodiment of the invention, an anomaly graph node detection model for graph node behavior characterization is constructed and trained, the model comprising a feature extraction network and an anomaly score computation network. The feature extraction network comprises a node behavior expression enhancement module, a feature information and position information precoding module and a feature extraction module based on a graph self-encoder; the node behavior expression enhancement module comprises a random node selection operator, a random attribute selection operator and a disturbance adding operator; the characteristic information and position information precoding module comprises a characteristic information precoder and a position information precoder; the anomaly score computation network comprises a fully connected neural network.

The node attribute list of the graph under test and the adjacency matrix representing the graph structure are fed into the trained self-supervised network abnormal graph node detection model based on node behavior characterization. In the node behavior expression enhancement module, the random node selection operator and the random attribute selection operator randomly sample the node attribute list of the graph under test in turn; the disturbance adding operator adds a random perturbation that follows a normal distribution of the statistical mean to the values of the selected graph node attributes and creates the corresponding indication vectors, yielding the behavior-expression-enhanced graph node attribute list and the corresponding indication vector list.
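The dual random enhancement performed by this module can be sketched as below. Several points are assumptions rather than statements from the patent: the sketch adds plain Gaussian noise N(μ, σ) to the selected entries and does not model how the per-attribute statistical mean enters the perturbation; it marks perturbed positions with 1 in the indication vector; and it always selects at least one node and one attribute. The function name `enhance` and the `seed` parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def enhance(x: List[List[float]], normal_idx: Sequence[int],
            a: float = 20.0, b: float = 20.0,
            mu: float = 0.0, sigma: float = 0.1,
            seed: int = 0) -> Tuple[List[List[float]], List[List[int]]]:
    """Perturb b% of the attributes of a% of the normal nodes; return (X', V)."""
    rng = random.Random(seed)
    n_attr = len(x[0])
    x_aug = [row[:] for row in x]            # copy so the input list is not modified
    v = [[0] * n_attr for _ in x]            # indication vectors, same shape as X
    n_nodes = max(1, round(len(normal_idx) * a / 100))
    n_sel = max(1, round(n_attr * b / 100))
    for i in rng.sample(list(normal_idx), n_nodes):      # uniform node selection
        for j in rng.sample(range(n_attr), n_sel):       # uniform attribute selection
            x_aug[i][j] += rng.gauss(mu, sigma)          # assumed additive Gaussian noise
            v[i][j] = 1                                   # assumed: 1 marks a perturbed entry
    return x_aug, v


# Ten hypothetical nodes with five attributes each, all treated as normal.
attrs = [[float(i + j) for j in range(5)] for i in range(10)]
x_enh, v_list = enhance(attrs, normal_idx=list(range(10)))
```

With the default a = 20 and b = 20, two of the ten nodes are selected and one of the five attributes is perturbed on each of them.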

In the feature information and position information precoding module, the feature information precoder maps the behavior-expression-enhanced graph node attributes to a lower dimension to obtain preliminary attribute features; the position information precoder maps the indication vectors to a lower dimension to obtain preliminary indication vectors whose dimension equals that of the preliminary attribute features; the preliminary attribute features and the preliminary indication vectors are concatenated to obtain the spliced precoding result, which is the output of the feature information and position information precoding module.
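A minimal sketch of this precoding step for a single node follows. It assumes that each precoder is a single linear layer without activation; the patent only says that each precoder comprises a fully connected neural network, so depth and activation are not specified here. The names `dense` and `precode` and all weights are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, w: Matrix, b: Vector) -> Vector:
    """One fully connected layer without activation: x × W + b."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def precode(x_row: Vector, v_row: List[int],
            w_feat: Matrix, b_feat: Vector,
            w_pos: Matrix, b_pos: Vector) -> Vector:
    """Map attributes and indication vector to the same dimension, then concatenate."""
    feat = dense(x_row, w_feat, b_feat)                      # preliminary attribute features
    pos = dense([float(e) for e in v_row], w_pos, b_pos)     # preliminary indication vector
    return feat + pos                                        # spliced precoding result


# Three attributes mapped to two dimensions by hand-picked weights.
h0 = precode([1.0, 2.0, 3.0], [0, 1, 0],
             w_feat=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], b_feat=[0.0, 0.0],
             w_pos=[[1.0, 1.0], [2.0, 3.0], [4.0, 4.0]], b_pos=[0.5, 0.5])
print(h0)
```

The spliced result has twice the precoder output dimension, since both halves share the same size.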

In the feature extraction module, the encoder maps the spliced precoding result to a low-dimensional hidden space to obtain a low-dimensional representation of the spliced precoding result; the decoder then maps this low-dimensional representation back to the original graph node attribute space. The module outputs the reconstructed graph node attribute list, the feature vector of the graph nodes in the hidden space, the reconstruction error vector, and the feature code obtained by concatenating the 1-norm and the 2-norm of the reconstruction error vector. The encoder and the decoder each comprise a multi-layer graph convolution structure, and each layer of the graph convolution structure comprises a fully connected layer and a matrix multiplier.
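One graph convolution structure, described above as a fully connected layer followed by a matrix multiplier, can be sketched as follows. The order of operations (fully connected layer first, then multiplication by the adjacency matrix), the absence of adjacency normalization and the absence of an activation are all assumptions, since the corresponding formulas are not legible in this text. The names `matmul` and `graph_conv` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n×k) and b (k×m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def graph_conv(adj: Matrix, h: Matrix, w: Matrix, bias: List[float]) -> Matrix:
    """One graph convolution structure: fully connected layer, then matrix multiplier."""
    # Fully connected layer applied to every node's feature row.
    fc = [[value + bias[j] for j, value in enumerate(row)] for row in matmul(h, w)]
    # Assumed matrix multiplier step: aggregate transformed features over the adjacency matrix.
    return matmul(adj, fc)


# Two nodes; node 0 aggregates itself and node 1, node 1 aggregates only itself.
adjacency = [[1.0, 1.0], [0.0, 1.0]]
features = [[1.0, 2.0], [3.0, 4.0]]
layer_out = graph_conv(adjacency, features, w=[[1.0], [1.0]], bias=[0.5])
print(layer_out)
```

Stacking several such layers with shrinking output width would give an encoder of the kind described, and stacking them with growing width would give the decoder.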

In the anomaly score calculation network, the feature vectors of the graph nodes in the hidden space, the reconstruction error vector, and the feature code obtained by concatenating the 1-norm and the 2-norm of the reconstruction error vector are processed to obtain the anomaly score of each graph node; the anomaly score of each graph node is compared with a set threshold: when the anomaly score is greater than the threshold, the graph node is an abnormal graph node; otherwise, the graph node is a normal graph node. This completes the detection of network abnormal graph nodes.

In summary, the graph node behavior feature expression in the normal mode can be enriched in a mode of enhancing the dual random node behavior expression, and a feature expression hidden space which is convenient for distinguishing normal graph nodes from abnormal graph nodes is constructed based on the trained feature extraction network, so that the robust and effective expression of the graph node behavior is realized, and the capability of the feature extraction network for representing the graph node behavior is improved; the method can learn the differences between the attribute characteristics and the connection behavior characteristics of the normal graph nodes and the abnormal graph nodes under the condition of fully utilizing the labeling information of the abnormal graph nodes, and ensures excellent abnormal detection effect.

Claims

1. A method for enhanced graph node behavior characterization and abnormal graph node detection, characterized by comprising the following steps:

2. The method for enhanced graph node behavior characterization and abnormal graph node detection according to claim 1, wherein: the abnormal graph node detection model based on graph node behavior characterization in step S1 comprises a feature extraction network and an anomaly score calculation network; the feature extraction network comprises a node behavior expression enhancement module, a feature information and position information precoding module, and a feature extraction module based on a graph autoencoder; the node behavior expression enhancement module comprises a random node selection operator, a random attribute selection operator and a disturbance adding operator; the feature information and position information precoding module comprises a feature information precoder and a position information precoder; the feature information precoder comprises a fully connected neural network; the position information precoder comprises a fully connected neural network; the feature extraction module based on the graph autoencoder comprises a graph-convolution-based encoder and a graph-convolution-based decoder; the graph-convolution-based encoder comprises a multi-layer graph convolution structure; the graph-convolution-based decoder comprises a multi-layer graph convolution structure; each graph convolution structure comprises a fully connected layer and a matrix multiplier; the anomaly score calculation network comprises a fully connected neural network; the fully connected neural network comprises an input layer, an output layer and a plurality of hidden layers.

3. The method for enhanced graph node behavior characterization and abnormal graph node detection according to claim 2, wherein: the specific operations of training the abnormal graph node detection model based on graph node behavior characterization in step S1 are as follows:

S1-1. taking the attribute list X of all nodes in the graph structure and the adjacency matrix A formed among the nodes as training data, and inputting the training data to the node behavior expression enhancement module; randomly sampling a% of the normal graph nodes in the attribute list X, where each graph node is selected with uniform probability; for each selected graph node, randomly sampling b% of its attributes, where each attribute is selected with uniform probability; computing, over all input normal unlabeled graph nodes, the statistical mean of each selected attribute; adding to the selected attribute values of the selected graph nodes a random perturbation that follows a normal distribution of the statistical mean with mean μ and standard deviation σ; and creating indication vectors whose dimension equals the attribute dimension of the graph nodes, thereby obtaining the behavior-expression-enhanced graph node attribute list X' and the corresponding indication vector list V; where, by default, a is 20, b is 20, μ is 0, and σ is 0.1;

obtaining the spliced precoding result H₀ output by the feature information and position information precoding module; where Concat(·) denotes the vector concatenation operator;

S1-5. according to the formula:

res = H₁ × W + b

4. The method for enhanced graph node behavior characterization and abnormal graph node detection according to claim 3, wherein: the specific operations of step S1-3 are as follows:

S1-3-2. according to the formula:

H'' = f_cd(H'; W_fcd)

r = Concat(||R||₁, ||R||₂)

obtaining the reconstructed graph node attribute list, the reconstruction error vector R, and the feature code r obtained by concatenating the 1-norm and the 2-norm of the reconstruction error vector; where H' denotes the feature vector H of the graph nodes in the hidden space or the feature vector output by the previous fully connected layer, H'' denotes the feature vector output by the fully connected layer, W_fcd denotes the decoder parameters of the fully connected layer, f_cd(·) denotes the fully connected layer function of the decoder, H_out denotes the feature vector output by the final fully connected layer, ||R||₁ denotes the 1-norm of the reconstruction error vector R, and ||R||₂ denotes the 2-norm of the reconstruction error vector R.

5. The method for enhanced graph node behavior characterization and abnormal graph node detection according to claim 3, wherein: the specific operations of step S1-6 are as follows:

loss_c(res, y; t) = (1 - y)|res| + y · max(0, t - res)