CN116881916B - Malicious user detection method and device based on heterogeneous graph neural network - Google Patents

Malicious user detection method and device based on heterogeneous graph neural network

Info

Publication number
CN116881916B
Authority
CN
China
Prior art keywords
user
node
representing
heterogeneous
meta
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311146509.2A
Other languages
Chinese (zh)
Other versions
CN116881916A (en
Inventor
姜宇泽
张媛媛
庞妺
朱广红
倪俊峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Information and Communications Technology CAICT
Original Assignee
China Academy of Information and Communications Technology CAICT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Information and Communications Technology CAICT
Priority to CN202311146509.2A
Publication of CN116881916A
Application granted
Publication of CN116881916B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Virology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a malicious user detection method and device based on a heterogeneous graph neural network, wherein the method comprises the following steps: selecting a user node to be tested; judging whether the user node to be tested is a malicious user node or not by utilizing a pre-trained user-level internal threat detection model based on a heterogeneous graph neural network; the user-level internal threat detection model comprises a relationship enhancement layer, a heterogeneous user embedding layer and a fusion layer. According to the malicious user detection method and device based on the heterogeneous graph neural network, the malicious user is detected by constructing the user-level internal threat detection model, and the detection accuracy of the malicious user is improved.

Description

Malicious user detection method and device based on heterogeneous graph neural network
Technical Field
The invention relates to the technical field of data security, in particular to a malicious user detection method and device based on a heterogeneous graph neural network.
Background
In existing internal threat detection methods, in addition to taking user behavior as the object of detection, some researchers take the source of the internal threat, i.e., the internal user, as the object of detection.
User-level internal threat detection is not concerned with whether a specific action of a user is malicious; instead, it comprehensively analyzes the user by combining the user's long-term behavior characteristics, social information, job information, psychological information and the like, so as to identify malicious users. Compared with behavior-level internal threat detection methods, user-level methods generally require only a small amount of computing resources and labeled data to achieve higher accuracy and a lower false alarm rate.
However, the existing user-level internal threat detection method often only considers the information of the individual users, and ignores the relationship information between the users. In fact, such relationship information may provide support and assistance to some extent for internal threat detection, e.g., users of the same department and same role will typically have similar behavioral characteristics. Therefore, there is an urgent need for a method capable of detecting malicious users using relationship information between users.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a malicious user detection method and device based on a heterogeneous graph neural network.
The invention provides a malicious user detection method based on a heterogeneous graph neural network, which comprises the following steps:
selecting a user node to be tested;
judging whether the user node to be tested is a malicious user node or not by utilizing a pre-trained user-level internal threat detection model based on a heterogeneous graph neural network;
the user-level internal threat detection model comprises a relationship enhancement layer, a heterogeneous user embedding layer and a fusion layer; the relation enhancement layer is used for constructing a user heterogeneous graph by utilizing personal information, historical behavior data of all users in a training set and direct relations among the users, the heterogeneous user embedding layer is used for acquiring embedded representations of user nodes in the user heterogeneous graph under different views through neighbor aggregation and meta-path aggregation respectively, and the fusion layer is used for fusing the embedded representations of each user node under different views to obtain fused embedded representations and predicting whether the user node to be tested is a malicious user node or not based on the fused embedded representations.
According to the malicious user detection method based on the heterogeneous graph neural network, which is provided by the invention, the method further comprises the following steps:
and enhancing the association relation between the user nodes in the user heterogeneous graph based on the weighted feature similarity function.
According to the malicious user detection method based on the heterogeneous graph neural network, when the embedded representation of a user node in the user heterogeneous graph is acquired through neighbor aggregation, two different attention mechanisms, at the node level and the type level, are used to hierarchically aggregate the embedded representation of the user node.
According to the malicious user detection method based on the heterogeneous graph neural network, when the embedded representation of a user node in the user heterogeneous graph is acquired through meta-path aggregation, the information of the user node itself and of its meta-path-based neighbor nodes is aggregated.
According to the malicious user detection method based on the heterogeneous graph neural network, the loss function of the user-level internal threat detection model is a weighted cross-entropy function.
According to the malicious user detection method based on the heterogeneous graph neural network, the dimension of the user embedding vector in the heterogeneous user embedding layer is 256, the hyperparameter in the weighted feature similarity function is 0.6, and the weight ratio of positive samples to negative samples in the weighted cross-entropy function is 12.
A malicious user detection apparatus based on a heterogeneous graph neural network, comprising:
the selecting module is used for selecting the user node to be tested;
the judging module is used for judging whether the user node to be tested is a malicious user node or not by utilizing a pre-trained user-level internal threat detection model based on the heterogeneous graph neural network;
the user-level internal threat detection model comprises a relationship enhancement layer, a heterogeneous user embedding layer and a fusion layer; the relation enhancement layer is used for constructing a user heterogeneous graph by utilizing personal information, historical behavior data of all users in a training set and direct relations among the users, the heterogeneous user embedding layer is used for acquiring embedded representations of user nodes in the user heterogeneous graph under different views through neighbor aggregation and meta-path aggregation respectively, and the fusion layer is used for fusing the embedded representations of each user node under different views to obtain fused embedded representations and predicting whether the user node to be tested is a malicious user node or not based on the fused embedded representations.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the malicious user detection method based on the heterogeneous graph neural network when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the malicious user detection methods based on a heterogeneous graph neural network described above.
The present invention also provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the malicious user detection method based on a heterogeneous graph neural network as described in any one of the above.
The malicious user detection method based on the heterogeneous graph neural network provided by the invention detects malicious users by constructing a user-level internal threat detection model, and the specific construction process of the model is as follows: the relationship enhancement layer uses a weighted feature similarity function to balance the original connection relationships and the feature similarity between users, so that the relationships between users are enhanced and potential user associations are mined. On this basis, the heterogeneous user embedding layer acquires embedded representations of the user under different views through neighbor aggregation and meta-path aggregation, respectively, so as to capture the structural information and semantic information in the user heterogeneous graph more comprehensively. Finally, the fusion layer fuses the related and complementary embedded representations of each user under the two views and predicts whether the user is a malicious user.
Drawings
In order to illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow diagram of a malicious user detection method based on a heterogeneous graph neural network provided by the invention;
FIG. 2 is a schematic diagram of a user-level internal threat detection model based on a heterogeneous graph neural network provided by the invention;
FIG. 3 is a comparison diagram of the user heterogeneous graph before and after enhancement based on a weighted feature similarity function, wherein (a) is the user heterogeneous graph constructed from direct relationships and (b) is the user heterogeneous graph enhanced by using the weighted feature similarity function;
FIG. 4 is a schematic diagram of the accuracy and F1 score at different dimensions (d) provided by the present invention;
FIG. 5 is a schematic diagram of the accuracy and F1 score at different hyperparameter values provided by the present invention;
FIG. 6 is a schematic diagram of the accuracy and F1 score at different weight ratios provided by the present invention;
FIG. 7 is a schematic diagram of the architecture of a user-level internal threat detection model provided by the present invention;
FIG. 8 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In user-level internal threat detection, the detected object is no longer a behavior or a sequence of behaviors, but rather the source of the internal threat, i.e., the user itself. Therefore, the user-level internal threat detection task in the present invention is: given a set of users $U = \{u_1, u_2, \ldots, u_N\}$, the model predicts whether a user $u \in U$ is a malicious user according to the user's historical behavior data, position information and the like.
Fig. 1 is a flow chart of a malicious user detection method based on a heterogeneous graph neural network, provided by the invention, as shown in fig. 1, the method includes:
step S110, selecting a user node to be tested;
step S120, judging whether the user node to be tested is a malicious user node or not by utilizing a pre-trained user-level internal threat detection model based on a heterogeneous graph neural network;
the user-level internal threat detection model comprises a relationship enhancement layer, a heterogeneous user embedding layer and a fusion layer; the relation enhancement layer is used for constructing a user heterogeneous graph by utilizing personal information, historical behavior data of all users in a training set and direct relations among the users, the heterogeneous user embedding layer is used for acquiring embedded representations of user nodes in the user heterogeneous graph under different views through neighbor aggregation and meta-path aggregation respectively, and the fusion layer is used for fusing the embedded representations of each user node under different views to obtain fused embedded representations and predicting whether the user node to be tested is a malicious user node or not based on the fused embedded representations.
It should be noted that a heterogeneous graph (also referred to as a heterograph) is a graph data structure having multiple node types or multiple edge types.
Fig. 2 is a schematic structural diagram of a user-level internal threat detection model based on a heterogeneous graph neural network. As shown in Fig. 2, the model is composed of three parts: a relationship enhancement layer (Relationship Enhancement Layer), a heterogeneous user embedding layer (Heterogeneous User Embedding Layer), and a fusion layer (Fusion Layer). First, the relationship enhancement layer builds a user heterogeneous graph by using the information of all users, their historical behavior data and the direct relationships among users, and enhances the connection relationships among users through a weighted feature similarity function so as to mine potential user associations. Then, the heterogeneous user embedding layer acquires the embedded representations of the user nodes under different views through neighbor aggregation and meta-path aggregation, respectively. Finally, the fusion layer fuses the embedded representations of each user node under the two views and predicts whether the corresponding user is a malicious user.
The malicious user detection method based on the heterogeneous graph neural network provided by the invention detects malicious users by constructing a user-level internal threat detection model, and the specific construction process of the model is as follows: the relationship enhancement layer constructs a user heterogeneous graph based on user information, users' historical behavior data and the direct relationships among users, so that the relationships among users are represented in the heterogeneous graph. On this basis, the heterogeneous user embedding layer acquires embedded representations of the user under different views through neighbor aggregation and meta-path aggregation, respectively, so as to capture the structural information and semantic information in the user heterogeneous graph more comprehensively. Finally, the fusion layer fuses the related and complementary embedded representations of each user under the two views and predicts whether the user is a malicious user.
According to the malicious user detection method based on the heterogeneous graph neural network, the method further comprises the following steps:
and enhancing the association relation between the user nodes in the user heterogeneous graph based on the weighted feature similarity function.
It should be noted that, based on the different relationships between users, a user heterogeneous graph G = (V, E, T, R, X) may be constructed, where V represents the set of user nodes, E represents the set of user relationships, T represents the set of user types, R represents the set of user relationship types, and X represents the set of initial feature vectors of the user nodes. In practice, since the relationship set E typically contains only obvious and direct relationships between users, such as colleague relationships and job-level relationships, the user heterogeneous graph G tends to be sparse and contains some isolated user nodes; that is, this simple topology is not sufficient to describe the complex relationships between users.
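It should be noted that the following is only a minimal sketch of how the tuple G = (V, E, T, R, X) might be held in memory. The class name `UserHeteroGraph`, its field names, the user types and the relationship type are all hypothetical and chosen for illustration; the invention does not prescribe a storage layout.

```python
from dataclasses import dataclass


@dataclass
class UserHeteroGraph:
    """Toy container for G = (V, E, T, R, X); field names are illustrative only."""
    node_types: dict  # V and T: user id -> user type
    edges: list       # E and R: (user id, user id, relationship type)
    features: dict    # X: user id -> initial feature vector

    def isolated_nodes(self):
        """Return the users that appear in no edge, i.e. isolated nodes."""
        connected = {n for src, dst, _ in self.edges for n in (src, dst)}
        return sorted(set(self.node_types) - connected)


# Hypothetical toy data: only one direct relationship is known.
graph = UserHeteroGraph(
    node_types={"u1": "engineer", "u2": "engineer", "u3": "manager"},
    edges=[("u1", "u2", "colleague")],
    features={"u1": [0.2, 0.9], "u2": [0.3, 0.8], "u3": [0.9, 0.1]},
)
isolated = graph.isolated_nodes()
```

In this toy graph the user `u3` has no direct relationship and is therefore isolated, which is the sparsity problem the relationship enhancement layer is intended to address.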
In this embodiment, a weighted feature similarity function is used, and the connection relationships between user nodes in G are enhanced by balancing the original connection relationships and the feature similarity, so as to mine potential user associations. The formula for quantifying the relationship between node $i \in V$ and node $j \in V$ is as follows:
$$E(i, j) = \omega \cdot \cos(x_i, x_j) + (1 - \omega) \cdot C_{ij}$$
where $\omega \in (0, 1)$ is a hyperparameter for balancing the original connection relationship and the feature similarity between nodes, $\cos(x_i, x_j)$ computes the similarity between the initial feature vector $x_i \in X$ and the initial feature vector $x_j \in X$, and $C_{ij}$ is the original connection relationship between node $i$ and node $j$.
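The weighted feature similarity function can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names, the toy feature vectors and the 0/1 encoding of the original connection $C_{ij}$ are assumptions, and self-pairs are skipped as a further assumption. How the resulting scores are turned into edges is not specified here and is therefore left out.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def enhanced_relation(x, c, omega=0.6):
    """E(i, j) = omega * cos(x_i, x_j) + (1 - omega) * C_ij for all pairs i != j."""
    n = len(x)
    e = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # assumption: self-pairs are not scored
                e[i][j] = omega * cosine(x[i], x[j]) + (1 - omega) * c[i][j]
    return e


# Hypothetical toy data: user 2 is isolated in the original graph (row and column of zeros).
x = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
c = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
e = enhanced_relation(x, c, omega=0.6)
```

In this toy example the isolated user 2 receives a non-zero relationship score with users 0 and 1 purely through feature similarity, while the originally connected pair (0, 1) keeps the highest score.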
Fig. 3 is a comparison diagram of the user heterogeneous graph before and after enhancement based on the weighted feature similarity function. Fig. 3 (a) and (b) show the difference between the user heterogeneous graph constructed from direct relationships and the user heterogeneous graph enhanced by the weighted feature similarity function: in (a), only a small number of connections exist between user nodes and there are a plurality of isolated users. After enhancement with the weighted feature similarity function, (b) shows that users who originally had no direct relationship can be associated through similar behavior features, so that the relationship information in the user heterogeneous graph is enriched.
According to the malicious user detection method based on the heterogeneous graph neural network, the user heterogeneous graph constructed through the direct relationship only comprises a small number of connections and a plurality of isolated nodes exist, and the relationship enhancement layer balances the original connection relationship and the feature similarity between users by using the weighted feature similarity function, so that the relationship between users is enhanced, and potential user association is mined.
According to the malicious user detection method based on the heterogeneous graph neural network, when the embedded representation of a user node in the user heterogeneous graph is acquired through neighbor aggregation, two different attention mechanisms, at the node level and the type level, are used to hierarchically aggregate the embedded representation of the user node.
It should be noted that a given node $i \in V$ is connected to neighbor nodes of $S$ different types $\{\Phi_1, \Phi_2, \ldots, \Phi_S\}$, and the neighbors of node $i$ of type $\Phi_s \in \{\Phi_1, \Phi_2, \ldots, \Phi_S\}$ are denoted $N_i^{\Phi_s}$. For the embedded representation of node $i$, neighbor nodes of different types contribute differently, and neighbor nodes of the same type may also contribute differently. Thus, in neighbor aggregation, the model uses two different attention mechanisms, at the node level and the type level, to hierarchically aggregate the embedded representation of node $i$.
Specifically, the model first uses a node-level attention mechanism to fuse, for node $i$, the features of its neighbor nodes of type $\Phi_s$. The specific formula is as follows:
where the activation function is LeakyReLU, $x_j$ is the initial feature vector of node $j$, and the attention score of a neighbor node $j$ of type $\Phi_s$ with respect to node $i$ is calculated by the following formula:
where the formula uses a node-level attention vector specific to type $\Phi_s$ and a concatenation operation.
For the $S$ different types of neighbor nodes of node $i$, $S$ type-specific embedded representations of node $i$ can be obtained through the above calculation. The model then fuses them using a type-level attention mechanism to obtain the embedded representation of node $i$ after neighbor aggregation. The specific formula is as follows:
where the contribution degree of each type $\Phi_s$ to node $i$ is calculated by the following formula:
where $V$ is the set of all nodes in the user heterogeneous graph, and the remaining parameters are a weight matrix, a bias vector and a type-level attention vector.
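The two-level neighbor aggregation described above can be illustrated with the following sketch. It should be read as one plausible instantiation under stated assumptions, not as the exact formulas of the invention: the node-level score is assumed to be a softmax over LeakyReLU(attention vector · [x_i ∥ x_j]) as in common graph attention networks, the type-level score is assumed to be the mean over nodes of attention vector · tanh(W h + b) followed by a softmax over types, and all function names, type names and numeric values are hypothetical.

```python
import math


def leaky_relu(v, slope=0.01):
    return v if v > 0 else slope * v


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def node_level(x_i, neigh, att):
    """Attention-weighted sum of the features of same-type neighbors of node i."""
    # Assumed score: LeakyReLU(att . [x_i || x_j]), normalized over the neighbors.
    scores = [leaky_relu(sum(a * f for a, f in zip(att, x_i + x_j))) for x_j in neigh]
    alpha = softmax(scores)
    agg = [sum(al * x_j[d] for al, x_j in zip(alpha, neigh)) for d in range(len(x_i))]
    return [leaky_relu(v) for v in agg], alpha


def type_level_weights(h_by_type, w, b, a):
    """One weight per neighbor type, from the mean of a . tanh(W h + b) over all nodes."""
    types = sorted(h_by_type)
    scores = []
    for t in types:
        total = 0.0
        for h in h_by_type[t]:
            hidden = [math.tanh(sum(wc * hc for wc, hc in zip(row, h)) + br)
                      for row, br in zip(w, b)]
            total += sum(ar * hr for ar, hr in zip(a, hidden))
        scores.append(total / len(h_by_type[t]))
    return dict(zip(types, softmax(scores)))


def fuse_types(h_i_by_type, beta):
    """Weighted sum of the type-specific embeddings of one node."""
    dim = len(next(iter(h_i_by_type.values())))
    return [sum(beta[t] * h_i_by_type[t][d] for t in h_i_by_type) for d in range(dim)]


# Hypothetical toy data for a single node i with two neighbor types.
x_i = [1.0, 0.0]
att = [0.5, 0.5, 0.5, 0.5]
h_engineer, alpha = node_level(x_i, [[1.0, 0.0], [0.0, 1.0]], att)
h_manager, _ = node_level(x_i, [[0.0, 2.0]], att)
beta = type_level_weights(
    {"engineer": [h_engineer], "manager": [h_manager]},
    w=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0], a=[1.0, 1.0],
)
z_neighbor = fuse_types({"engineer": h_engineer, "manager": h_manager}, beta)
```

Here the type-level weights are computed from a single node for brevity; in the assumed form they would be averaged over all nodes in V before the softmax.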
According to the malicious user detection method based on the heterogeneous graph neural network, during neighbor aggregation the embedded representation of node i is hierarchically aggregated through two different attention mechanisms, at the node level and the type level, which ensures that the embedded representation of node i obtained through neighbor aggregation is rich and accurate.
Because the embedded representation of node $i$ obtained by neighbor aggregation contains only the information of first-order neighbors, the model introduces meta-path aggregation so that the finally learned embedded representation contains higher-order structural information and semantic information. Meta-path aggregation refers to guiding the selection and aggregation of neighbor nodes by using pre-designed meta-paths, so that the aggregation is more targeted and flexible. Here, a meta-path is a path pattern defined on a graph that describes a particular relationship between nodes.
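To make the notion of a meta-path concrete, the following sketch collects the meta-path-based neighbors of a node. It assumes, purely for illustration, that a meta-path is given as an ordered list of relationship types and that relationships are undirected; the function name `metapath_neighbors`, the relationship types and the toy edges are hypothetical.

```python
def metapath_neighbors(edges, start, metapath):
    """Nodes reached from `start` by following the relationship types in `metapath` in order."""
    adj = {}
    for src, dst, rel in edges:
        adj.setdefault((src, rel), set()).add(dst)
        adj.setdefault((dst, rel), set()).add(src)  # assumption: relationships are undirected
    frontier = {start}
    for rel in metapath:
        frontier = {n for node in frontier for n in adj.get((node, rel), set())}
    frontier.discard(start)  # the start node is not its own neighbor
    return sorted(frontier)


# Hypothetical toy graph: u1 and u2 are colleagues, u2 and u3 are in a job-level relationship.
edges = [("u1", "u2", "colleague"), ("u2", "u3", "job_level")]
neighbors = metapath_neighbors(edges, "u1", ["colleague", "job_level"])
```

In this toy graph `u3` is not a first-order neighbor of `u1`, but it becomes a neighbor under the two-step meta-path, while the intermediate node `u2` is not included; this is the higher-order information that first-order neighbor aggregation cannot capture.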
Specifically, given $M$ meta-paths $\{P_1, P_2, \ldots, P_M\}$ starting from node $i$, the neighbors of node $i$ based on meta-path $P_m$ are denoted $N_i^{P_m}$. Each meta-path represents different semantic information, while neighbor nodes on the same meta-path tend to have similar information. Thus, in meta-path aggregation, the model first uses a meta-path-specific graph convolutional neural network to compute a weighted sum over node $i$ and its meta-path-based neighbor nodes, so as to obtain the embedded representation of node $i$ under this meta-path:
where $d_i$ and $d_j$ represent the degrees of node $i$ and node $j$ in the graph, respectively, and $x_i$ and $x_j$ are the initial feature vectors of node $i$ and node $j$, respectively.
Using the $M$ meta-paths, $M$ meta-path-specific embedded representations of node $i$ can be obtained. The model then fuses them using a semantic-level attention mechanism to obtain the embedded representation of node $i$ after meta-path aggregation:
where $\beta_{P_m}$ represents the contribution degree of meta-path $P_m$ to node $i$ and is calculated by the following formula:
where the remaining parameters are a weight matrix, a bias vector and a semantic-level attention vector.
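The meta-path aggregation step can be sketched as follows. The sketch assumes the common degree-normalized graph convolution, in which node $i$ contributes $x_i / (d_i + 1)$ and each meta-path-based neighbor $j$ contributes $x_j / \sqrt{(d_i + 1)(d_j + 1)}$, and it assumes a semantic-level score of the form attention vector · tanh(W h + b) followed by a softmax over meta-paths. These forms, the function names, the meta-path labels and all numeric values are assumptions for illustration, not the formulas of the invention.

```python
import math


def gcn_aggregate(i, neighbors, degree, x):
    """Degree-normalized sum over node i and its meta-path-based neighbors (assumed form)."""
    d_i = degree[i] + 1  # assumption: +1 accounts for the node itself
    out = [v / d_i for v in x[i]]
    for j in neighbors:
        coef = 1.0 / math.sqrt(d_i * (degree[j] + 1))
        out = [o + coef * v for o, v in zip(out, x[j])]
    return out


def semantic_weights(h_by_path, w, b, a):
    """Softmax over meta-paths of a . tanh(W h + b); computed for one node for brevity."""
    paths = sorted(h_by_path)
    scores = []
    for p in paths:
        hidden = [math.tanh(sum(wc * hc for wc, hc in zip(row, h_by_path[p])) + br)
                  for row, br in zip(w, b)]
        scores.append(sum(ar * hr for ar, hr in zip(a, hidden)))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {p: e / total for p, e in zip(paths, exps)}


# Hypothetical toy data: two meta-paths P1 and P2 starting from u1.
x = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}
degree = {"u1": 1, "u2": 2, "u3": 1}
h_by_path = {
    "P1": gcn_aggregate("u1", ["u3"], degree, x),
    "P2": gcn_aggregate("u1", ["u2"], degree, x),
}
beta = semantic_weights(h_by_path, w=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0], a=[1.0, 1.0])
z_metapath = [sum(beta[p] * h_by_path[p][d] for p in h_by_path) for d in range(2)]
```

Unlike the neighbor aggregation view, the node's own feature vector enters the sum here, which is why the two views are described as complementary.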
It should be noted that after neighbor aggregation and meta-path aggregation in the heterogeneous user embedding layer, two embedded representations of node $i$ are obtained, one from each view. The two embedded representations are related and complementary: in neighbor aggregation, node $i$ aggregates only the information of its first-order neighbors and ignores its own information; in meta-path aggregation, node $i$ aggregates its own information and the information of its meta-path-based neighbor nodes, but ignores the information of the intermediate nodes of the meta-path. Therefore, in the embodiment of the invention, the embedded representation obtained by neighbor aggregation and the embedded representation obtained by meta-path aggregation are fused to obtain the fused embedded representation. In a specific embodiment, the two embedded representations are first projected into the same feature space and then aggregated, as follows:
where the formula uses the projections of the two embedded representations in the same feature space, weight matrices, a bias vector, a concatenation operation and a LeakyReLU activation function.
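A possible reading of the fusion step is sketched below: each view is projected with its own weight matrix, the projections are concatenated, a bias is added and LeakyReLU is applied. The exact arrangement of the weight matrices and the bias is an assumption, as are the function names and the toy values.

```python
def leaky_relu(v, slope=0.01):
    return v if v > 0 else slope * v


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wc * vc for wc, vc in zip(row, v)) for row in w]


def fuse_views(z_neighbor, z_metapath, w_neighbor, w_metapath, b):
    """LeakyReLU([W_n z_n || W_m z_m] + b): project both views, concatenate, activate."""
    concatenated = matvec(w_neighbor, z_neighbor) + matvec(w_metapath, z_metapath)
    return [leaky_relu(c + bias) for c, bias in zip(concatenated, b)]


# Hypothetical toy data: identity projections and a zero bias.
identity = [[1.0, 0.0], [0.0, 1.0]]
z_fused = fuse_views([1.0, -1.0], [2.0, 0.0], identity, identity, [0.0, 0.0, 0.0, 0.0])
```

With identity projections the fused vector is simply the concatenation of the two views passed through LeakyReLU, which makes the effect of the activation on the negative component easy to see.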
According to the malicious user detection method based on the heterogeneous graph neural network provided by the invention, predicting whether the user node to be tested is a malicious user node based on the fused embedded representation specifically comprises:
inputting the fused embedded representation into a fully connected layer, normalizing the output with a softmax function to obtain a two-dimensional vector, and judging whether the user node to be tested is a malicious user node based on the two-dimensional vector.
Based on the fused embedded representation of node i, the model uses a fully connected layer and a softmax function to perform binary classification on node i, so as to predict whether the user corresponding to node i is a malicious user.
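The classification step can be sketched as a fully connected layer followed by softmax. In this sketch the weights are arbitrary toy values, the function names are hypothetical, and treating index 1 of the two-dimensional output as the "malicious" class is an assumption made to match the labeling $y_i = 1$ for positive samples.

```python
import math


def fully_connected(z, w, b):
    """Linear layer: one output per row of w."""
    return [sum(wc * zc for wc, zc in zip(row, z)) + bias for row, bias in zip(w, b)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_malicious(z_fused, w, b):
    """Return (probabilities, is_malicious); index 1 is assumed to be the malicious class."""
    probs = softmax(fully_connected(z_fused, w, b))
    return probs, probs[1] > probs[0]


# Hypothetical toy weights for a 2-dimensional fused embedding.
probs, is_malicious = predict_malicious([1.0, 0.5], [[0.2, -0.1], [-0.3, 0.8]], [0.0, 0.0])
```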
According to the malicious user detection method based on the heterogeneous graph neural network, the embedded representation obtained by neighbor aggregation and the embedded representation obtained by meta-path aggregation are fused, the fused embedded representation is finally obtained, the fused embedded representation is used as an input value of a detection model, the richness and the integrity of user information in the fused embedded representation are effectively ensured, and the efficiency and the accuracy of detecting malicious users by the detection model are improved based on the fusion embedded representation.
According to the malicious user detection method based on the heterogeneous graph neural network, in the method, the loss function of the user-level internal threat detection model is a cross entropy function with weight.
It should be noted that, in terms of the loss function, considering that an imbalance in the number of positive and negative samples may adversely affect the experimental result, the model uses a weighted cross-entropy function as the loss function:
where $N$ is the number of training samples; $y_i$ is the true label of the $i$-th sample, with $y_i = 1$ meaning that the sample is a positive sample (a malicious user) and $y_i = 0$ meaning that the sample is a negative sample (a normal user); $\omega_1$ represents the weight of positive samples and $\omega_0$ represents the weight of negative samples; the per-sample loss term is the loss between the model prediction and the true value of the $i$-th training sample; and the per-sample prediction is the model's predicted value for the $i$-th training sample.
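A weighted cross-entropy of the kind described can be sketched as follows. The sketch assumes the usual binary form, in which the loss of sample $i$ is $-[\omega_1 y_i \log \hat{y}_i + \omega_0 (1 - y_i) \log(1 - \hat{y}_i)]$ and the total loss is the mean over the $N$ samples; the function name, the clipping constant `eps` and the choice $\omega_0 = 1$ are assumptions, while the ratio $\omega_1 / \omega_0 = 12$ follows the value stated in this document.

```python
import math


def weighted_cross_entropy(y_true, y_pred, w_pos=12.0, w_neg=1.0, eps=1e-12):
    """Mean weighted binary cross-entropy; y_pred holds predicted malicious probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# The same prediction error costs 12 times more on a malicious user than on a normal user.
loss_positive = weighted_cross_entropy([1], [0.5])
loss_negative = weighted_cross_entropy([0], [0.5])
```

Weighting the rare positive class more heavily means that missing a malicious user is penalized more than misclassifying a normal user, which is the intended counterweight to the class imbalance.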
According to the malicious user detection method based on the heterogeneous graph neural network, provided by the invention, the cross entropy function with weight is used as the loss function of the user-level internal threat detection model, so that the unbalance of the number of positive and negative samples in the model training process is effectively overcome, and the accuracy of an experimental result is finally ensured.
According to the malicious user detection method based on the heterogeneous graph neural network, in the method, the dimension of the user embedding vector in the heterogeneous user embedding layer is 256, the hyperparameter in the weighted feature similarity function is 0.6, and the weight ratio of positive samples to negative samples in the weighted cross-entropy function is 12.
It should be noted that Accuracy represents the proportion of all predicted samples that the model predicts correctly, and is calculated as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
The F1 score is the harmonic mean of Precision and Recall, and is calculated as:
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
Precision represents the proportion of samples whose true class is positive among all samples predicted to be positive, and is calculated as:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall represents the proportion of samples predicted to be positive among all samples whose true class is positive, and is calculated as:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
In the above formulas, TP (true positive) indicates that a positive result is correctly detected, i.e. the detection result is correct and the result is positive; FP (false positive) indicates that a positive result is erroneously detected, i.e. the detection result is wrong and the result is positive; TN (true negative) indicates that a negative result is correctly detected, i.e. the detection result is correct and the result is negative; FN (false negative) indicates that a negative result is erroneously detected, i.e. the detection result is wrong and the result is negative.
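The four evaluation indexes can be computed directly from the confusion-matrix counts, as in the sketch below. The function name and the counts are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, Precision, Recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 13 malicious users, 8 of them detected, 2 false alarms.
metrics = classification_metrics(tp=8, fp=2, tn=85, fn=5)
print(metrics)
```

In this example Accuracy is high because normal users dominate, while Recall and F1 are noticeably lower, which is why both Accuracy and the F1 score are reported on imbalanced data.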
In this embodiment, a heterogeneous user embedding layer is used to obtain the embedded representation of the user, so the dimension d of the embedded representation has to be chosen. To find the dimension d that gives the best model performance, experiments were performed with different embedding vector dimensions d. Fig. 4 is a schematic diagram of the Accuracy and F1 scores under different dimensions; it shows the performance of models with different embedding vector dimensions d on the two evaluation indexes, Accuracy and F1 score. From Fig. 4 it can be seen that the model achieves the best effect when d = 256.
The hyperparameter ω in the weighted feature similarity function is used to balance the original connection relations and the feature similarity between nodes in the user heterogeneous graph, and its value range is (0, 1). To explore the effect of different ω values on the model, a corresponding experiment was performed. Fig. 5 is a schematic diagram of the Accuracy and F1 scores under different hyperparameter values provided by the invention; it shows the performance of models with different ω values on the two evaluation indexes, Accuracy and F1 score. As can be seen from Fig. 5, the model performs poorly when ω is too small or too large. This is because when ω is too small, the topology of the user heterogeneous graph does not change significantly, and when ω is too large, more noisy edges are introduced and the original topology is ignored. The model achieves the best effect when ω = 0.6.
Considering that the original data has a serious imbalance between positive and negative samples, a weighted cross entropy loss function is used, where ω_1 is the weight of the positive samples and ω_0 is the weight of the negative samples. To explore the impact of the weights ω_1 and ω_0 on the model, ω_0 was set to the constant 1 and experiments were performed with different values of ω_1. Fig. 6 is a schematic diagram of the Accuracy and F1 scores under different weight ratios provided by the invention; it shows the performance of models with different weights ω_1 on the two evaluation indexes, Accuracy and F1 score. It can be seen that, for the positive class, which has few samples, a weight that is either too small or too large hinders the model's recognition, so the final embodiment sets the positive-sample weight ω_1 to 12.
According to the malicious user detection method based on the heterogeneous graph neural network, the dimension of the user embedding vector in the heterogeneous user embedding layer is set to 256, the hyperparameter in the weighted feature similarity function is set to 0.6, and the weight ratio of positive samples to negative samples in the weighted cross entropy function is set to 12, so that the resulting user-level internal threat detection model achieves its best performance, which in turn ensures the efficiency and accuracy of malicious user detection.
Fig. 7 is a schematic structural diagram of a user-level internal threat detection model provided by the present invention, and as shown in fig. 7, the malicious user detection apparatus 700 based on a heterogeneous graph neural network includes:
a selecting module 710, configured to select a user node to be tested;
a determining module 720, configured to determine whether the user node to be tested is a malicious user node by using a pre-trained user-level internal threat detection model based on a heterogeneous graph neural network;
the user-level internal threat detection model comprises a relationship enhancement layer, a heterogeneous user embedding layer and a fusion layer; the relation enhancement layer is used for constructing a user heterogeneous graph by utilizing personal information, historical behavior data of all users in a training set and direct relations among the users, the heterogeneous user embedding layer is used for acquiring embedded representations of user nodes in the user heterogeneous graph under different views through neighbor aggregation and meta-path aggregation respectively, and the fusion layer is used for fusing the embedded representations of each user node under different views to obtain fused embedded representations and predicting whether the user node to be tested is a malicious user node or not based on the fused embedded representations.
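The neighbor aggregation performed by the heterogeneous user embedding layer can be sketched as two attention stages. This is a minimal sketch only, assuming GAT-style node-level attention (softmax over LeakyReLU scores of concatenated features) and a type-level attention that averages a tanh-based score over all nodes. These formula details, the function names, and the toy numbers are assumptions and are not quoted from the patent.

```python
import math


def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def node_level(h_i, neigh_feats, a):
    """Attention-weighted sum of the features of one node's same-type neighbors."""
    # h_i + h_j concatenates the two feature lists before scoring.
    alpha = softmax([leaky_relu(dot(a, h_i + h_j)) for h_j in neigh_feats])
    agg = [sum(w * h_j[k] for w, h_j in zip(alpha, neigh_feats))
           for k in range(len(h_i))]
    return [leaky_relu(x) for x in agg]


def type_level(type_embs, W, b, a):
    """One contribution weight per neighbor type, averaged over all nodes."""
    scores = []
    for embs in type_embs:  # embs[i] is node i's embedding for this type
        s = sum(dot(a, [math.tanh(dot(row, h) + bk) for row, bk in zip(W, b)])
                for h in embs)
        scores.append(s / len(embs))
    return softmax(scores)


# Toy graph: two user nodes, two neighbor types (all numbers hypothetical).
h = [[1.0, 0.0], [0.0, 1.0]]
neigh = [
    [[[1.0, 1.0], [0.0, 2.0]], [[2.0, 0.0]]],   # type 0 neighbors of nodes 0, 1
    [[[0.5, 0.5]], [[1.0, 1.0], [3.0, 1.0]]],   # type 1 neighbors of nodes 0, 1
]
a_node = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
type_embs = [[node_level(h[i], neigh[t][i], a_node[t]) for i in range(len(h))]
             for t in range(len(neigh))]
beta = type_level(type_embs, [[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1], [1.0, -1.0])
z0 = [sum(bt * embs[0][k] for bt, embs in zip(beta, type_embs)) for k in range(2)]
print(beta, z0)
```

The node-level stage decides which neighbors of a given type matter for a user, and the type-level stage decides how much each neighbor type contributes to the final neighbor-aggregated embedding.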
The malicious user detection device based on the heterogeneous graph neural network provided by the invention detects malicious users by constructing a user-level internal threat detection model. The model is constructed as follows: the relationship enhancement layer constructs a user heterogeneous graph based on user information, user historical behavior data and the direct relationships among users. On this basis, the heterogeneous user embedding layer acquires embedded representations of the user under different views through neighbor aggregation and meta-path aggregation respectively, so as to capture the structural information and semantic information in the user heterogeneous graph more comprehensively. Finally, the fusion layer fuses the correlated and complementary embedded representations of each user under the two views and predicts whether the user is a malicious user.
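The meta-path view can be sketched in the same spirit. The sketch below assumes a GCN-style degree normalization in which a node combines its own features with those of its meta-path-based neighbors, followed by a softmax attention over meta-paths. Taking the degree as the number of meta-path neighbors, the tanh-based scoring, the function names, and the toy graph are all assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def metapath_embed(i, h, neighbors):
    """Combine node i's own features with its meta-path neighbors' features."""
    d_i = len(neighbors[i])  # assumed: degree = number of meta-path neighbors
    out = [x / (d_i + 1) for x in h[i]]
    for j in neighbors[i]:
        w = 1.0 / math.sqrt((d_i + 1) * (len(neighbors[j]) + 1))
        out = [o + w * x for o, x in zip(out, h[j])]
    return out


def semantic_weights(path_embs, W, b, a):
    """One contribution weight per meta-path, averaged over all nodes."""
    scores = []
    for embs in path_embs:
        s = sum(dot(a, [math.tanh(dot(row, e) + bk) for row, bk in zip(W, b)])
                for e in embs)
        scores.append(s / len(embs))
    return softmax(scores)


# Toy data: three user nodes and two meta-paths (all numbers hypothetical).
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
paths = [
    {0: [1], 1: [0], 2: []},              # neighbors under the first meta-path
    {0: [1, 2], 1: [0, 2], 2: [0, 1]},    # neighbors under the second meta-path
]
path_embs = [[metapath_embed(i, h, nb) for i in range(len(h))] for nb in paths]
beta = semantic_weights(path_embs, [[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1], [1.0, -1.0])
z = [[sum(bm * embs[i][k] for bm, embs in zip(beta, path_embs)) for k in range(2)]
     for i in range(len(h))]
print(beta, z)
```

A node with no neighbors under a meta-path keeps its own features unchanged, which reflects the idea that meta-path aggregation uses information from the node itself as well as from its meta-path-based neighbors.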
Fig. 8 illustrates a physical structure diagram of an electronic device. As shown in Fig. 8, the electronic device may include: a processor 810, a communication interface (Communications Interface) 820, a memory 830, and a communication bus 840, wherein the processor 810, the communication interface 820, and the memory 830 communicate with each other through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a heterogeneous graph neural network-based malicious user detection method, the method comprising:
selecting a user node to be tested;
judging whether the user node to be tested is a malicious user node or not by utilizing a pre-trained user-level internal threat detection model based on a heterogeneous graph neural network;
the user-level internal threat detection model comprises a relationship enhancement layer, a heterogeneous user embedding layer and a fusion layer; the relation enhancement layer is used for constructing a user heterogeneous graph by utilizing personal information, historical behavior data of all users in a training set and direct relations among the users, the heterogeneous user embedding layer is used for acquiring embedded representations of user nodes in the user heterogeneous graph under different views through neighbor aggregation and meta-path aggregation respectively, and the fusion layer is used for fusing the embedded representations of each user node under different views to obtain fused embedded representations and predicting whether the user node to be tested is a malicious user node or not based on the fused embedded representations.
Further, the logic instructions in the memory 830 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing the method for detecting a malicious user based on a heterogeneous graph neural network provided by the above methods, the method comprising:
selecting a user node to be tested;
judging whether the user node to be tested is a malicious user node or not by utilizing a pre-trained user-level internal threat detection model based on a heterogeneous graph neural network;
the user-level internal threat detection model comprises a relationship enhancement layer, a heterogeneous user embedding layer and a fusion layer; the relation enhancement layer is used for constructing a user heterogeneous graph by utilizing personal information, historical behavior data of all users in a training set and direct relations among the users, the heterogeneous user embedding layer is used for acquiring embedded representations of user nodes in the user heterogeneous graph under different views through neighbor aggregation and meta-path aggregation respectively, and the fusion layer is used for fusing the embedded representations of each user node under different views to obtain fused embedded representations and predicting whether the user node to be tested is a malicious user node or not based on the fused embedded representations.
In still another aspect, the present invention further provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the heterogeneous graph neural network-based malicious user detection method provided above, the method comprising:
selecting a user node to be tested;
judging whether the user node to be tested is a malicious user node or not by utilizing a pre-trained user-level internal threat detection model based on a heterogeneous graph neural network;
the user-level internal threat detection model comprises a relationship enhancement layer, a heterogeneous user embedding layer and a fusion layer; the relation enhancement layer is used for constructing a user heterogeneous graph by utilizing personal information, historical behavior data of all users in a training set and direct relations among the users, the heterogeneous user embedding layer is used for acquiring embedded representations of user nodes in the user heterogeneous graph under different views through neighbor aggregation and meta-path aggregation respectively, and the fusion layer is used for fusing the embedded representations of each user node under different views to obtain fused embedded representations and predicting whether the user node to be tested is a malicious user node or not based on the fused embedded representations.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A malicious user detection method based on a heterogeneous graph neural network is characterized by comprising the following steps:
selecting a user node to be tested;
judging whether the user node to be tested is a malicious user node or not by utilizing a pre-trained user-level internal threat detection model based on a heterogeneous graph neural network;
the user-level internal threat detection model comprises a relationship enhancement layer, a heterogeneous user embedding layer and a fusion layer; the relation enhancement layer is used for constructing a user heterogeneous graph by utilizing personal information, historical behavior data of all users in a training set and direct relations among the users, the heterogeneous user embedding layer is used for acquiring embedded representations of user nodes in the user heterogeneous graph under different views through neighbor aggregation and meta-path aggregation respectively, and the fusion layer is used for fusing the embedded representations of each user node under different views to obtain fused embedded representations and predicting whether the user node to be tested is a malicious user node or not based on the fused embedded representations;
the neighbor aggregation and meta-path aggregation and the fusion process are as follows:
given a node $i \in V$ that is connected to neighbor nodes of $S$ different types $\{\Phi_1, \Phi_2, \ldots, \Phi_S\}$, the neighbors of node $i$ whose type is $\Phi_s \in \{\Phi_1, \Phi_2, \ldots, \Phi_S\}$ are denoted $N_i^{\Phi_s}$;
the features of the neighbor nodes of type $\Phi_s$ are fused into node $i$ by a node-level attention mechanism, with the specific formula:
$$h_i^{\Phi_s} = \sigma\Big(\sum_{j \in N_i^{\Phi_s}} \alpha_{i,j}^{\Phi_s}\, h_j\Big)$$
wherein $\sigma$ is a LeakyReLU activation function, $h_j$ is the initial feature vector of node $j$, and $\alpha_{i,j}^{\Phi_s}$ is the attention score of neighbor node $j$ of type $\Phi_s$ to node $i$, calculated by the following formula:
$$\alpha_{i,j}^{\Phi_s} = \frac{\exp\big(\sigma\big(a_{\Phi_s}^{\top}\,[h_i \,\|\, h_j]\big)\big)}{\sum_{l \in N_i^{\Phi_s}} \exp\big(\sigma\big(a_{\Phi_s}^{\top}\,[h_i \,\|\, h_l]\big)\big)}$$
wherein $a_{\Phi_s}$ is the node-level attention vector of type $\Phi_s$, and $\|$ represents a concatenation operation;
the embedded representation $z_i^{nb}$ of node $i$ after neighbor aggregation is calculated as follows:
$$z_i^{nb} = \sum_{s=1}^{S} \beta_{\Phi_s}\, h_i^{\Phi_s}$$
wherein $\beta_{\Phi_s}$ represents the contribution degree of type $\Phi_s$ to node $i$, calculated by the following formula:
$$w_{\Phi_s} = \frac{1}{|V|}\sum_{i \in V} a_{nb}^{\top} \tanh\big(W_{nb}\, h_i^{\Phi_s} + b_{nb}\big), \qquad \beta_{\Phi_s} = \frac{\exp(w_{\Phi_s})}{\sum_{t=1}^{S}\exp(w_{\Phi_t})}$$
where $V$ is the set of all nodes in the user heterogeneous graph, $W_{nb}$ represents a weight matrix, $b_{nb}$ represents the bias vector, and $a_{nb}$ represents the type-level attention vector;
given a node $i$ and $M$ meta-paths $\{P_1, P_2, \ldots, P_M\}$ starting from node $i$, the neighbors of node $i$ based on meta-path $P_m$ are denoted $N_i^{P_m}$;
the embedding of node $i$ under meta-path $P_m$ is expressed as:
$$h_i^{P_m} = \frac{1}{d_i + 1}\, h_i + \sum_{j \in N_i^{P_m}} \frac{1}{\sqrt{(d_i + 1)(d_j + 1)}}\, h_j$$
wherein $d_i$ and $d_j$ represent the degrees of node $i$ and node $j$ in the graph respectively, and $h_i$ and $h_j$ are the initial feature vectors of node $i$ and node $j$ respectively; using the $M$ meta-paths $\{P_1, P_2, \ldots, P_M\}$, $M$ meta-path-specific embedded representations of node $i$ can be obtained: $\{h_i^{P_1}, h_i^{P_2}, \ldots, h_i^{P_M}\}$; the embedded representation $z_i^{mp}$ of node $i$ after meta-path aggregation is:
$$z_i^{mp} = \sum_{m=1}^{M} \beta_{P_m}\, h_i^{P_m}$$
wherein $\beta_{P_m}$ represents the contribution degree of meta-path $P_m$ to node $i$, calculated by the following formula:
$$w_{P_m} = \frac{1}{|V|}\sum_{i \in V} a_{mp}^{\top} \tanh\big(W_{mp}\, h_i^{P_m} + b_{mp}\big), \qquad \beta_{P_m} = \frac{\exp(w_{P_m})}{\sum_{n=1}^{M}\exp(w_{P_n})}$$
wherein $W_{mp}$ represents a weight matrix, $b_{mp}$ represents the bias vector, and $a_{mp}$ represents the semantic-level attention vector;
the embedded representation obtained by neighbor aggregation and the embedded representation obtained by meta-path aggregation are projected to the same feature space and then aggregated, as follows:
wherein the two projected vectors are the projections, in the same feature space, of the embedded representation $z_i^{nb}$ obtained by neighbor aggregation and the embedded representation $z_i^{mp}$ obtained by meta-path aggregation, the two $W$ terms represent weight matrices, $b$ represents the bias vector, $\|$ represents a concatenation operation, and $\sigma$ is a LeakyReLU activation function.
2. The heterogeneous graph neural network-based malicious user detection method of claim 1, wherein the relationship enhancement layer further comprises:
and enhancing the association relation between the user nodes in the user heterogeneous graph based on the weighted feature similarity function.
3. The heterogeneous graph neural network-based malicious user detection method of claim 2, wherein when obtaining the embedded representation of the user node in the user heterogeneous graph through neighbor aggregation, the embedded representation of the user node is hierarchically aggregated using two different attention mechanisms, namely a node level and a type level.
4. The heterogeneous graph neural network-based malicious user detection method of claim 3, wherein, when the embedded representation of the user node in the user heterogeneous graph is obtained through meta-path aggregation, information from the user node itself and from its meta-path-based neighbor nodes is aggregated.
5. The heterogeneous graph neural network-based malicious user detection method of claim 4, wherein the loss function of the user-level internal threat detection model is a weighted cross entropy function.
6. The heterogeneous graph neural network-based malicious user detection method of claim 5, wherein the dimension of the user embedding vector in the heterogeneous user embedding layer is 256, the hyperparameter in the weighted feature similarity function is 0.6, and the weight ratio of positive samples to negative samples in the weighted cross entropy function is 12.
7. A malicious user detection device based on a heterogeneous graph neural network, comprising:
the selecting module is used for selecting the user node to be tested;
the judging module is used for judging whether the user node to be tested is a malicious user node or not by utilizing a pre-trained user-level internal threat detection model based on the heterogeneous graph neural network;
the user-level internal threat detection model comprises a relationship enhancement layer, a heterogeneous user embedding layer and a fusion layer; the relation enhancement layer is used for constructing a user heterogeneous graph by utilizing personal information, historical behavior data of all users in a training set and direct relations among the users, the heterogeneous user embedding layer is used for acquiring embedded representations of user nodes in the user heterogeneous graph under different views through neighbor aggregation and meta-path aggregation respectively, and the fusion layer is used for fusing the embedded representations of each user node under different views to obtain fused embedded representations and predicting whether the user node to be tested is a malicious user node or not based on the fused embedded representations;
the neighbor aggregation and meta-path aggregation and the fusion process are as follows:
given a node $i \in V$ that is connected to neighbor nodes of $S$ different types $\{\Phi_1, \Phi_2, \ldots, \Phi_S\}$, the neighbors of node $i$ whose type is $\Phi_s \in \{\Phi_1, \Phi_2, \ldots, \Phi_S\}$ are denoted $N_i^{\Phi_s}$;
the features of the neighbor nodes of type $\Phi_s$ are fused into node $i$ by a node-level attention mechanism, with the specific formula:
$$h_i^{\Phi_s} = \sigma\Big(\sum_{j \in N_i^{\Phi_s}} \alpha_{i,j}^{\Phi_s}\, h_j\Big)$$
wherein $\sigma$ is a LeakyReLU activation function, $h_j$ is the initial feature vector of node $j$, and $\alpha_{i,j}^{\Phi_s}$ is the attention score of neighbor node $j$ of type $\Phi_s$ to node $i$, calculated by the following formula:
$$\alpha_{i,j}^{\Phi_s} = \frac{\exp\big(\sigma\big(a_{\Phi_s}^{\top}\,[h_i \,\|\, h_j]\big)\big)}{\sum_{l \in N_i^{\Phi_s}} \exp\big(\sigma\big(a_{\Phi_s}^{\top}\,[h_i \,\|\, h_l]\big)\big)}$$
wherein $a_{\Phi_s}$ is the node-level attention vector of type $\Phi_s$, and $\|$ represents a concatenation operation;
the embedded representation $z_i^{nb}$ of node $i$ after neighbor aggregation is calculated as follows:
$$z_i^{nb} = \sum_{s=1}^{S} \beta_{\Phi_s}\, h_i^{\Phi_s}$$
wherein $\beta_{\Phi_s}$ represents the contribution degree of type $\Phi_s$ to node $i$, calculated by the following formula:
$$w_{\Phi_s} = \frac{1}{|V|}\sum_{i \in V} a_{nb}^{\top} \tanh\big(W_{nb}\, h_i^{\Phi_s} + b_{nb}\big), \qquad \beta_{\Phi_s} = \frac{\exp(w_{\Phi_s})}{\sum_{t=1}^{S}\exp(w_{\Phi_t})}$$
where $V$ is the set of all nodes in the user heterogeneous graph, $W_{nb}$ represents a weight matrix, $b_{nb}$ represents the bias vector, and $a_{nb}$ represents the type-level attention vector;
given a node $i$ and $M$ meta-paths $\{P_1, P_2, \ldots, P_M\}$ starting from node $i$, the neighbors of node $i$ based on meta-path $P_m$ are denoted $N_i^{P_m}$;
the embedding of node $i$ under meta-path $P_m$ is expressed as:
$$h_i^{P_m} = \frac{1}{d_i + 1}\, h_i + \sum_{j \in N_i^{P_m}} \frac{1}{\sqrt{(d_i + 1)(d_j + 1)}}\, h_j$$
wherein $d_i$ and $d_j$ represent the degrees of node $i$ and node $j$ in the graph respectively, and $h_i$ and $h_j$ are the initial feature vectors of node $i$ and node $j$ respectively; using the $M$ meta-paths $\{P_1, P_2, \ldots, P_M\}$, $M$ meta-path-specific embedded representations of node $i$ can be obtained: $\{h_i^{P_1}, h_i^{P_2}, \ldots, h_i^{P_M}\}$; the embedded representation $z_i^{mp}$ of node $i$ after meta-path aggregation is:
$$z_i^{mp} = \sum_{m=1}^{M} \beta_{P_m}\, h_i^{P_m}$$
wherein $\beta_{P_m}$ represents the contribution degree of meta-path $P_m$ to node $i$, calculated by the following formula:
$$w_{P_m} = \frac{1}{|V|}\sum_{i \in V} a_{mp}^{\top} \tanh\big(W_{mp}\, h_i^{P_m} + b_{mp}\big), \qquad \beta_{P_m} = \frac{\exp(w_{P_m})}{\sum_{n=1}^{M}\exp(w_{P_n})}$$
wherein $W_{mp}$ represents a weight matrix, $b_{mp}$ represents the bias vector, and $a_{mp}$ represents the semantic-level attention vector;
the embedded representation obtained by neighbor aggregation and the embedded representation obtained by meta-path aggregation are projected to the same feature space and then aggregated, as follows:
wherein the two projected vectors are the projections, in the same feature space, of the embedded representation $z_i^{nb}$ obtained by neighbor aggregation and the embedded representation $z_i^{mp}$ obtained by meta-path aggregation, the two $W$ terms represent weight matrices, $b$ represents the bias vector, $\|$ represents a concatenation operation, and $\sigma$ is a LeakyReLU activation function.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the heterogeneous graph neural network-based malicious user detection method of any of claims 1-6.
9. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the heterogeneous graph neural network-based malicious user detection method of any of claims 1-6.
CN202311146509.2A 2023-09-07 2023-09-07 Malicious user detection method and device based on heterogeneous graph neural network Active CN116881916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311146509.2A CN116881916B (en) 2023-09-07 2023-09-07 Malicious user detection method and device based on heterogeneous graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311146509.2A CN116881916B (en) 2023-09-07 2023-09-07 Malicious user detection method and device based on heterogeneous graph neural network

Publications (2)

Publication Number Publication Date
CN116881916A CN116881916A (en) 2023-10-13
CN116881916B true CN116881916B (en) 2023-12-12

Family

ID=88266677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311146509.2A Active CN116881916B (en) 2023-09-07 2023-09-07 Malicious user detection method and device based on heterogeneous graph neural network

Country Status (1)

Country Link
CN (1) CN116881916B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117709967A (en) * 2023-12-19 2024-03-15 深圳前海微众银行股份有限公司 Anti-money laundering detection method and anti-money laundering detection system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232630A (en) * 2019-05-29 2019-09-13 腾讯科技(深圳)有限公司 The recognition methods of malice account, device and storage medium
CN115238773A (en) * 2022-07-04 2022-10-25 中国人民解放军战略支援部队信息工程大学 Malicious account detection method and device for heterogeneous primitive path automatic evaluation
CN115883213A (en) * 2022-12-01 2023-03-31 南京南瑞信息通信科技有限公司 APT detection method and system based on continuous time dynamic heterogeneous graph neural network
CN116611005A (en) * 2022-02-07 2023-08-18 华为技术有限公司 Data processing method and device based on heterogeneous graph neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Attributed Heterogeneous Graph Neural Network for Malicious Domain Detection;Shuai Zhang et al.;Proceedings of the 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design;Full text *
Heterogeneous Graph Attention Network for Malicious Domain Detection;Zhiping Li et al.;ICANN 2022;Section 4 *
Heterogeneous Graph Neural Networks for Malicious Account Detection;Ziqi Liu et al.;CIKM’18;Full text *

Also Published As

Publication number Publication date
CN116881916A (en) 2023-10-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant