CN116208399A

CN116208399A - Network malicious behavior detection method and device based on metagraph

Info

Publication number: CN116208399A
Application number: CN202310125438.1A
Authority: CN
Inventors: 郭庆浪; 李阳阳; 梁生霖; 廖勇
Original assignee: China Academy of Electronic and Information Technology of CETC
Current assignee: China Academy of Electronic and Information Technology of CETC
Priority date: 2023-02-17
Filing date: 2023-02-17
Publication date: 2023-06-02

Abstract

The application discloses a metagraph-based method and device for detecting malicious network behavior. The method comprises: acquiring the required user data; and inputting the user data into a trained network malicious behavior detection model, which outputs a detection result. The network malicious behavior detection model comprises a social network graph construction unit, a metagraph convolution recursive network model, a meta-learning regression graph neural network model and a metagraph reinforcement learning framework, and the detection result is output jointly on the basis of the latter three. According to the embodiments of the application, the metagraph convolution recursive network is combined with the meta-learning regression graph neural network model to jointly detect water army (paid poster) behavior, which improves the accuracy and efficiency of detecting water army text behavior.

Description

Network malicious behavior detection method and device based on metagraph

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for detecting network malicious behavior based on a metagraph.

Background

Basic machine-learning-based malicious behavior detection generally consists of a feature engineering model and an anomaly detection model. The feature engineering model learns the features of different types of behavior data through deep neural networks of different architectures, and can map the different types of behavior data into a unified feature space by multimodal methods. The anomaly detection model, building on behavior features that already have representational power, is trained by unsupervised or weakly supervised methods to capture outliers in large amounts of data. Anomaly detection models can be divided into clustering-based and classification-based models, and can be combined with feature learning to learn the underlying manifold of normal samples.

In Internet and social network environments, automatic detection of malicious network behavior must take into account not only behavioral features but also network structure. This requires that the network topology, the interacting entities and the network behaviors be clearly defined and modeled with the downstream task in mind. A network scenario can usually be abstracted as a graph, a heterogeneous graph or a multi-relational graph, with attribute features assigned to its nodes and edges, and a graph neural network is then used for optimization and prediction. Early graph modeling methods relied either on the natural topology of a road network (i.e., a binary adjacency graph) or on graphs predefined by some metric (e.g., Euclidean distance). Graph structure learning (GSL) aims to jointly learn the graph structure and the corresponding node (or graph) representations. In short, GSL methods fall into three categories: (1) metric-based methods, which derive edge weights from node embeddings using a metric function; (2) neural methods, which learn the graph structure from node embeddings using a neural network; and (3) direct methods, which treat the adjacency matrix or a node embedding dictionary as learnable parameters. GSL has been applied to temporal graph data, and some work adaptively learns the adjacency matrix from data using a learnable node embedding dictionary. However, these GSL methods are limited to short sequences of fewer than 1k time steps and cannot capture dynamically evolving graph structures.

Disclosure of Invention

The embodiment of the application provides a network malicious behavior detection method and device based on a metagraph, which combines a metagraph convolution recursive network with a metalearning regression graph neural network model to jointly realize the detection of water army behaviors and improve the accuracy and efficiency of the detection of water army text behaviors.

The embodiment of the application provides a network malicious behavior detection method based on a metagraph, which comprises the following steps:

acquiring required user data;

inputting the user data into a trained network malicious behavior detection model to output a detection result through the network malicious behavior detection model, wherein the network malicious behavior detection model comprises: the social network diagram constructing unit, the metagraph convolution recursion network model, the metalearning regression graph neural network model and the metagraph reinforcement learning framework;

the social network diagram construction unit is used for constructing a social network diagram based on the user data, and inputting the constructed social network diagram into a metagraph convolution recursive network model, a metalearning regression graph neural network model and a metagraph reinforcement learning framework respectively so as to jointly output a detection result based on the metagraph convolution recursive network model, the metalearning regression graph neural network model and the metagraph reinforcement learning framework;

wherein the metagraph convolution recursive network model comprises an encoder, a decoder and a metagraph learner, the encoder and the decoder are constructed with a graph convolution recursive unit as the basic unit, and the metagraph learner takes the output of the encoder and, according to a constructed metanode library and by means of a super network, inputs feedback information to the decoder;

the meta-learning regression graph neural network model is based on a regression GNN network, and the regression GNN network comprises two graph convolution layers and a fully connected layer;

the metagraph reinforcement learning framework is a Deep Reinforcement Learning (DRL) based framework and comprises a graph pool module, a candidate node prediction module and an identification module, wherein the graph pool module serves as an environment pool, the candidate node prediction module is used for providing a group of candidate nodes based on an AIM searching algorithm, and the identification module is used for processing a DRL agent.

Optionally, the acquiring of the required user data further comprises performing word sense disambiguation by adopting the following method:

constructing a monolingual co-occurrence graph $G_s = \langle V_s, E_s \rangle$;

collecting each pair of co-occurring nouns or adjectives $(cw_i, cw_j)$ in the user data, and adding each co-occurring noun or adjective as a node to the monolingual co-occurrence graph, each co-occurring word pair being connected by an edge $(v_i, v_j) \in E_s$, and each edge being assigned a weight $w(v_i, v_j)$ according to the strength of association between the corresponding words $cw_i$ and $cw_j$, satisfying:

$$w(v_i, v_j) = 1 - \max[\,p(cw_i \mid cw_j),\ p(cw_j \mid cw_i)\,]$$

wherein $p(cw_i \mid cw_j)$ represents a conditional probability, estimated as the number of contexts in which $cw_i$ and $cw_j$ co-occur divided by the number of contexts containing $cw_j$;

configuring a group of target languages $L$, and expanding the monolingual co-occurrence graph $G_s$ into a multilingual graph $G_{ML} = \langle V_{ML}, E_{ML} \rangle$, wherein $V_{ML} = V_s \cup \bigcup_{l \in L} V_l$ is the set of nodes representing content words of the source language ($V_s$) or of a target language ($V_l$), and $E_{ML} = E_s \cup \bigcup_{l \in L} \{E_l \cup E_{s,l}\}$ is the set of edges;

calculating root hubs in the multilingual graph to distinguish the senses of the target word in the source language, satisfying:

where $d$ is the damping factor, $\deg(v_i)$ is the number of nodes adjacent to node $v_i$, and $w_{ij}$ is the weight of the co-occurrence edge between nodes $v_i$ and $v_j$; a minimum spanning tree (MST) of the graph is built with the target word as its root, and the root hubs of $G_{ML}$ constitute its first level;

given a context $W$ of the target word $w$ in the source language, using the MST to find the words in $W$ most relevant for disambiguating $w$.

Optionally, using the MST to find the most relevant words in $W$ for disambiguation comprises:

finding the relevant nodes by computing the correct hub disHub and retaining only the nodes linked to disHub, where $W_h$ is defined as the set of mapped content words, satisfying:

where $d(cw)$ is a function that assigns a weight to $cw$ according to its distance from $w$, and $\mathrm{dist}(cw, h)$ is given by the number of edges between $cw$ and $h$ in the MST;

summing the translation counts to order each translation;

the similarity measure between two translations is defined as follows:

wherein the function used in the measure returns the set of lexicalizations of a word $c$ in a given group of languages;

Translation accuracy is verified by maximizing similarity metrics to accomplish cross-vocabulary resource alignment.

Optionally, the graph convolution recursion unit (GCRU) of the metagraph convolution recursion network model is defined, as the basic unit, in terms of the graph convolution operation and the gated recurrent unit (GRU), satisfying:

wherein $X \in \mathbb{R}^{N \times C}$ and $H \in \mathbb{R}^{N}$ denote the input and output of the graph convolution operation $\star_G$ ($N$ being the number of spatial units and $C$ the number of information channels), $\Theta$ or $W_K \in \mathbb{R}^{K \times C}$ denotes the kernel parameters approximated with Chebyshev polynomials to order $K$, $\sigma$ denotes the activation function, $\odot$ denotes the element-wise product, $u$, $r$ and $C$ denote the update gate, reset gate and candidate state of the GCRU, and $\Theta_{\{u,r,C\}}$ denotes the gate parameters;

the metagraph convolution recursive network model learns the spatio-temporal graph using a canonical formulation, satisfying:

wherein the graph matrix is derived by random-walk normalizing the non-negative part of the matrix product of the trainable node embedding $E$ and its transpose;

the metagraph learner is used for constructing a metanode library $\Phi \in \mathbb{R}^{\phi \times d}$ and inputting feedback information to the decoder using a super network, wherein $\phi$ and $d$ represent the number of memory items and the dimension of each item, respectively; the intermediate variables updated by the metagraph learner are stored in the metanode library $\Phi$, and the memory operations in the updating process are defined as follows:

wherein the superscript $(i)$ serves as a row index, $H_t^{(i)}$ represents the $i$-th node vector of $H_t$, a fully connected layer projects the hidden state $H_t^{(i)}$ into a localized query $Q_t^{(i)}$, the scalar $a_j$ represents the similarity between the vector $Q_t^{(i)}$ and the memory item $\Phi[j]$, and the meta-node vector $M_t^{(i)}$ can then be recovered as a combination of the memory items;

the super network is used to generate the node embeddings for GSL, the generation being conditioned on the metanode library and formulated as follows:

wherein NN _H Representing a super network;

the metagraph learner employs a metagraph convolution recursion network (MegaCRN) as a generic framework.

Optionally, the meta-learning regression graph neural network model is implemented based on a regression GNN model, the regression GNN model comprises two graph convolution layers and a fully connected layer, and each task $T = \{L(g_1, s_1, \ldots, g_H, s_H),\ q(g_1),\ q(g_{t+1} \mid g_t, s_t),\ H\}$ is determined by the loss function $L$, the distribution $q(g_1)$, the transition distribution $q(g_{t+1} \mid g_t, s_t)$ and the length $H$;

for regression, the loss function satisfies:

wherein $g^{(j)}$ and $s^{(j)}$ respectively represent the input and output of a sample drawn from task $T_i$;

the updated Θ of the regressive GNN model satisfies:

where γ represents the step size hyper-parameter and η represents the meta-step size.

Optionally, the candidate node prediction module of the metagraph reinforcement learning framework provides a set of candidate nodes based on an AIM search algorithm, the AIM search algorithm being completed using a finite markov decision process.

Optionally, the agent of the metagraph reinforcement learning framework is trained by double Q learning, wherein one estimator determines the best possible action for the next state, and the other estimator provides the Q value for the selected action, satisfying the following update:

wherein $Q^T$ represents the target network, $Q^L$ represents the local network, $a^t$ represents the action performed at time $t$, $X^t$ represents the state matrix at time $t$, $\theta$ is a tuning parameter, $r$ and $\gamma$ are parameters, and $G_i$ is the sampling pattern corresponding to the $i$-th set.
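As an illustration of the double Q-learning rule described above, the following minimal Python sketch shows how one estimator (the local network) picks the best next action while the other (the target network) supplies its value. The function names, the tabular lists standing in for the networks, and the reading of r as a reward and γ as a discount factor are assumptions made for illustration; they are not taken from the application.

```python
def double_q_target(reward, next_q_local, next_q_target, gamma, done=False):
    """Double Q-learning target for one transition.

    next_q_local  -- Q-values of the next state from the local estimator (Q^L)
    next_q_target -- Q-values of the next state from the target estimator (Q^T)
    """
    if done:
        return reward
    # The local estimator selects the action ...
    best_action = max(range(len(next_q_local)), key=next_q_local.__getitem__)
    # ... and the target estimator evaluates it.
    return reward + gamma * next_q_target[best_action]


def td_update(q_value, target, lr):
    """Move the current Q-value a fraction lr towards the target."""
    return q_value + lr * (target - q_value)


# Hypothetical numbers: three candidate actions in the next state.
next_q_local = [0.2, 0.9, 0.5]
next_q_target = [1.0, 0.4, 2.0]
target = double_q_target(1.0, next_q_local, next_q_target, gamma=0.5)
new_q = td_update(0.0, target, lr=0.5)
```

Because the selected action (index 1) is evaluated with the target estimator's value 0.4 rather than that estimator's own maximum 2.0, the over-estimation that a single estimator would introduce is avoided.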

The embodiment of the application also provides a computer device, which comprises: a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements the steps of the metagraph-based network malicious behavior detection method as described above.

The embodiments of the present application also provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor implements the steps of the network malicious behavior detection method based on the metagraph as described above.

According to the embodiment of the application, the metagraph convolution recursive network is combined with the metalearning regression graph neural network model, so that the detection of the water army behavior is realized jointly, and the accuracy and the efficiency of the detection of the water army text behavior are improved.

The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the application can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the application are more readily apparent, specific embodiments of the application are set out below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 is an example of an overall framework for network malicious behavior detection in accordance with embodiments of the present application;

FIG. 2 is an example of a structure of a metagraph convolution recursive network model according to an embodiment of the present application;

FIG. 3 is an example of a structure of a meta-learning regression graph neural network model according to an embodiment of the present application;

FIG. 4 is an example of a structure of a metagraph reinforcement learning framework according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

1. Graph Structure Learning (GSL)

Early approaches relied either on the natural topology of a road network (i.e., a binary adjacency graph) or on graphs predefined by some metric (e.g., Euclidean distance). GSL aims to jointly learn the graph structure and the corresponding node (or graph) representations. In short, GSL methods fall into three categories: (1) metric-based methods, which derive edge weights from node embeddings using a metric function; (2) neural methods, which learn the graph structure from node embeddings using a neural network; and (3) direct methods, which treat the adjacency matrix or a node embedding dictionary as learnable parameters. GSL has been applied to temporal graph data, and some work adaptively learns the adjacency matrix from data using a learnable node embedding dictionary. However, these GSL methods are limited to short sequences of fewer than 1k time steps and cannot capture dynamically evolving graph structures.

2. Graph Neural Network (GNN)

A graph neural network is a deep learning technique built on graph convolution layers, and outperforms existing methods in many computer vision applications. Graph neural networks have recently attracted great attention for their ability to model the correlation between samples efficiently, and they offer an effective way to integrate various kinds of information.

A GNN is an optimizable transformation of all attributes of a graph (nodes, edges, global context) that preserves graph symmetry (permutation invariance), meaning that the overall result is unchanged when the vertices are reordered. The input of a GNN is a graph and its output is a graph: it transforms the attributes of the graph (node, edge and global information) but does not change its connectivity, that is, which edges connect which vertices.
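The permutation property described above can be checked on a toy graph. The following Python sketch is illustrative only: the aggregation rule (own feature plus the mean of the neighbours' features) and all names are assumptions chosen for brevity, not the layers used in the embodiments.

```python
def message_pass(features, edges):
    """One round of message passing: own feature plus mean of neighbour features."""
    neighbours = {i: [] for i in range(len(features))}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    out = []
    for i, x in enumerate(features):
        nbrs = neighbours[i]
        mean = sum(features[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        out.append(x + mean)
    return out


def readout(features):
    """Graph-level summary that ignores node order."""
    return sum(features)


# A path graph 0 - 1 - 2 with scalar node features.
features = [1.0, 2.0, 4.0]
edges = [(0, 1), (1, 2)]

# Relabel the nodes: old node i becomes new node perm[i].
perm = [2, 0, 1]
perm_features = [0.0] * len(features)
for old, new in enumerate(perm):
    perm_features[new] = features[old]
perm_edges = [(perm[u], perm[v]) for u, v in edges]

out = message_pass(features, edges)
perm_out = message_pass(perm_features, perm_edges)
```

The node-level outputs are reordered in exactly the same way as the inputs, the graph-level readout is identical, and the edge list (connectivity) is never modified by the layer.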

3. Metagraph convolution recursive network

The metagraph convolution recursive network can rely solely on observed data to remain robust and to adapt to conditions ranging from normal to non-stationary. It consists mainly of a graph convolution recursion unit and a metagraph learner. The metagraph learner works in two steps: (1) querying node-level prototypes from a metanode library; (2) dynamically rebuilding node embeddings using a super network. This localized memory capability enables the modular metagraph learner to distinguish different spatio-temporal patterns over time, and can even generalize to incident situations.

4. Meta-learning regression graph neural network

The meta-learning regression graph neural network incorporates the graph structure of social network text behaviors and effectively models the correlation between those behaviors. In addition, because it is learned, the regression GNN model is more flexible and less affected by some problems. The network shows a distinctive ability to model the correlation between data, and combines global and local topological properties to predict behavior scores. Prediction performance is improved by reducing the effect of sample heterogeneity. The network strikes a good balance between flexibility and performance, and can be used in other application areas that suffer from high intra-class variability.

5. Meta learning

Meta-learning is generally understood as "learning to learn" and refers to the process of improving a learning algorithm over multiple learning episodes. During base learning, an inner (or lower-level/base) learning algorithm solves the task defined by a dataset and an objective. During meta-learning, an outer (or upper-level/meta) algorithm updates the inner learning algorithm so that the model it learns improves the outer objective. The core idea of meta-learning is therefore to learn prior knowledge (a prior).

First, the required user data are acquired. The corresponding user data are collected from the major social platforms to form the required dataset, on which the subsequent operations judge and detect water army behavior in the social network. For example, water army behavior may be labeled during training to construct a dataset with which training is completed.

Inputting the user data into a trained network malicious behavior detection model to output a detection result through the network malicious behavior detection model, wherein the network malicious behavior detection model comprises, as shown in fig. 1: the social network diagram constructing unit, the metagraph convolution recursion network model, the metalearning regression graph neural network model and the metagraph reinforcement learning framework.

The social network graph construction unit is configured to construct a social network graph based on the user data, for example defined as G = (V, E), and to input the constructed social network graph into the metagraph convolution recursive network model, the meta-learning regression graph neural network model and the metagraph reinforcement learning framework respectively, so that a detection result is obtained through collaborative data processing by these three components.

The metagraph convolution recursive network model comprises an encoder, a decoder and a metagraph learner, wherein the encoder and the decoder are constructed with a graph convolution recursive unit as the basic unit, and the metagraph learner takes the output of the encoder and, according to a constructed metanode library and by means of a super network, inputs feedback information to the decoder.

The meta-learning regression graph neural network model is based on a regression GNN network, and the regression GNN network comprises two graph convolution layers and a fully connected layer.

In the collaborative data processing mechanism, the metagraph convolution recursive network model learns the graph structure and features, the meta-learning regression graph neural network model scores the network, and the metagraph reinforcement learning framework identifies abnormal (water army) nodes on the basis of the learned graph structure and features.

Because the social network environment is complex, the presence of multiple languages leads to translation errors, and word sense ambiguity can sometimes cause wrong judgments. Word senses are therefore disambiguated before water army behavior is detected, which improves detection accuracy. In some embodiments, word sense disambiguation is performed in the following manner:

Let $C$ be the set of all contexts of the target word $w$ in the source language $s$, and build the monolingual co-occurrence graph $G_s = \langle V_s, E_s \rangle$.

Then each pair of co-occurring nouns or adjectives $(cw_i, cw_j)$ in the user data is collected, and each co-occurring noun or adjective is added as a node to the monolingual co-occurrence graph. Each co-occurring word pair is connected by an edge $(v_i, v_j) \in E_s$, and each edge is assigned a weight $w(v_i, v_j)$ according to the strength of association between the corresponding words $cw_i$ and $cw_j$, satisfying:

$$w(v_i, v_j) = 1 - \max[\,p(cw_i \mid cw_j),\ p(cw_j \mid cw_i)\,]$$

where $p(cw_i \mid cw_j)$ is the conditional probability of the word $cw_i$ given the word $cw_j$, estimated as the number of contexts in which $cw_i$ and $cw_j$ co-occur divided by the number of contexts containing $cw_j$;

A group of target languages $L$ is configured, and the monolingual co-occurrence graph $G_s$ is expanded into a multilingual graph $G_{ML} = \langle V_{ML}, E_{ML} \rangle$, where $V_{ML} = V_s \cup \bigcup_{l \in L} V_l$ is the set of nodes representing content words of the source language ($V_s$) or of a target language ($V_l$), and $E_{ML} = E_s \cup \bigcup_{l \in L} \{E_l \cup E_{s,l}\}$ is the set of edges;

Root hubs are then calculated in the multilingual graph to distinguish the senses of the target word in the source language, where $d$ is the damping factor, $\deg(v_i)$ is the number of nodes adjacent to node $v_i$, and $w_{ij}$ is the weight of the co-occurrence edge between nodes $v_i$ and $v_j$. A minimum spanning tree (MST) of the graph is built with the target word as its root, and the root hubs of $G_{ML}$ constitute its first level. By using the multilingual graph, an MST containing translation nodes and edges can be obtained.
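The edge-weighting rule of the monolingual co-occurrence graph can be sketched in a few lines of Python. The sketch assumes that each context is already reduced to a list of its nouns and adjectives; the function name and the toy contexts are hypothetical, and the multilingual expansion, hub computation and MST are not covered.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(contexts):
    """Return {(cw_i, cw_j): w} with w = 1 - max(p(cw_i|cw_j), p(cw_j|cw_i))."""
    word_count = Counter()  # number of contexts containing each word
    pair_count = Counter()  # number of contexts containing both words
    for context in contexts:
        words = sorted(set(context))
        word_count.update(words)
        pair_count.update(combinations(words, 2))
    weights = {}
    for (a, b), n_ab in pair_count.items():
        p_a_given_b = n_ab / word_count[b]
        p_b_given_a = n_ab / word_count[a]
        weights[(a, b)] = 1.0 - max(p_a_given_b, p_b_given_a)
    return weights


# Hypothetical contexts of the target word, reduced to content words.
contexts = [
    ["bank", "river"],
    ["bank", "money"],
    ["money", "loan"],
    ["river", "water"],
    ["bank", "money", "loan"],
]
weights = cooccurrence_weights(contexts)
```

A weight of 0 (for example between "loan" and "money", since every context containing "loan" also contains "money") marks the strongest association, while weights closer to 1 mark weaker ones, so a minimum spanning tree built on these weights keeps the most strongly associated pairs.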

In some embodiments, using the MST to find the most relevant words in $W$ for disambiguation includes:

finding the relevant nodes by computing the correct hub disHub (i.e., the sense) and retaining only the nodes linked to disHub, where $W_h$ is defined as the set of mapped content words, satisfying:

where $d(cw)$ is a function that assigns a weight to $cw$ according to its distance from $w$, and $\mathrm{dist}(cw, h)$ is given by the number of edges between $cw$ and $h$ in the MST.

The translation counts are then summed to rank each translation;

The embodiment of the application provides a cross-vocabulary resource alignment method based on translation, which verifies the accuracy of translation by maximizing similarity measurement so as to finish the cross-vocabulary resource alignment.

A given source word is mapped to a target word by maximizing the lexical intersection between translations, and the similarity measure between two translations is defined as follows:

where the function used in the measure returns the set of lexicalizations of a word $c$ in a given group of languages.

In some embodiments, as shown in FIG. 2, the metagraph convolution recursive network model draws on the metagraph convolution recursive network, by which multiple social accounts are identified. It consists mainly of a graph convolution recursion unit and a metagraph learner. The graph convolution recursion unit is motivated by the success of the graph convolution network (GCN) in representation learning on graph-structured data (such as social networks), which opened up the possibility of injecting the graph convolution operation into a recurrent unit (such as an LSTM). The resulting graph convolution recursion unit (GCRU) can capture, in a sequential manner, both the spatial dependencies represented by the input graph topology and the temporal dependencies. In this embodiment, without loss of generality, the GCRU is defined as the basic unit in terms of the graph convolution operation and the gated recurrent unit (GRU), satisfying:

where $X \in \mathbb{R}^{N \times C}$ and $H \in \mathbb{R}^{N}$ denote the input and output of the graph convolution operation $\star_G$ ($N$ being the number of spatial units and $C$ the number of information channels), $\Theta$ or $W_K \in \mathbb{R}^{K \times C}$ denotes the kernel parameters approximated with Chebyshev polynomials to order $K$, $\sigma$ denotes the activation function, $\odot$ denotes the element-wise product, $u$, $r$ and $C$ denote the update gate, reset gate and candidate state of the GCRU, and $\Theta_{\{u,r,C\}}$ denotes the gate parameters.
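For orientation, a GCRU of the kind described here is commonly written as a GRU whose matrix products are replaced by graph convolutions. The block below is a reconstruction under that assumption rather than the application's exact formula: the symbol $\tilde{P}$ for the graph matrix and the choice of sigmoid and tanh activations are assumptions consistent with the symbol definitions in this passage.

```latex
\begin{aligned}
H   &= \sigma\left(X \star_G \Theta\right)
     = \sigma\Big(\sum_{k=0}^{K} \tilde{P}^{k} X W_k\Big) \\
u_t &= \mathrm{sigmoid}\big([X_t, H_{t-1}] \star_G \Theta_u\big) \\
r_t &= \mathrm{sigmoid}\big([X_t, H_{t-1}] \star_G \Theta_r\big) \\
C_t &= \tanh\big([X_t, (r_t \odot H_{t-1})] \star_G \Theta_C\big) \\
H_t &= u_t \odot H_{t-1} + (1 - u_t) \odot C_t
\end{aligned}
```

Read this way, the update gate $u_t$ decides how much of the previous hidden state is kept, and the reset gate $r_t$ decides how much of it enters the candidate state $C_t$.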

For graph structure learning, the graph matrix is typically defined on the basis of certain metrics and empirical rules. However, the choice of metric may be arbitrary and suboptimal, which has motivated a line of work that integrates graph structure learning (GSL) into the modeling so that both are optimized simultaneously. In this embodiment, the metagraph convolution recursive network model performs spatio-temporal graph learning using a canonical formulation, namely the adaptive graph, satisfying:

where the graph matrix is derived by random-walk normalizing the non-negative part of the matrix product of the trainable node embedding $E$ and its transpose.
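The adaptive graph described above can be sketched in plain Python as follows. The sketch assumes that the non-negative part is taken with ReLU and that the normalization is a row-wise softmax, which is one common way of realizing this construction; the function name and the toy embedding are hypothetical.

```python
import math


def adaptive_adjacency(E):
    """Row-normalized graph built from node embeddings E (a list of N vectors)."""
    n = len(E)
    rows = []
    for i in range(n):
        # Non-negative part of the inner products <E[i], E[j]> (ReLU).
        scores = [max(0.0, sum(a * b for a, b in zip(E[i], E[j]))) for j in range(n)]
        # Row-wise softmax, shifted by the row maximum for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


# Hypothetical embeddings for three nodes in a two-dimensional space.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
P = adaptive_adjacency(E)
```

Each row of `P` sums to one, so it can be read as the transition probabilities of a random walk leaving that node; because `E` is a trainable parameter and does not depend on the input, the resulting graph is the same at every time step.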

As another GSL strategy, an instantaneous graph may be defined in a similar manner from the input signal $X_t$ or the hidden state $H_t$. Taking the hidden state $H_t$ as an example:

where the parameter matrix $W$ essentially projects $H_t$ into another embedding space.

For the metagraph learner, this embodiment proposes a new spatio-temporal graph (STG) learning module. The term "metagraph" is coined to denote the generation of node embeddings for graph structure learning, which differs from its definition in heterogeneous information networks (HIN). According to the two formulas above, the adaptive graph depends only on the parameterized node embedding $E$, whereas the instantaneous graph is input-conditioned (a projection of $X_t$ or $H_t$ by the parameter $W$). This generation step determines the nature of the derived graph: the former is time-invariant, while the latter is sensitive to the input signal. This motivates further enhancing the node embeddings used for STG generation, because real-world networks are more complex and exhibit spatio-temporal heterogeneity and non-stationarity.

The aim is to memorize typical features of the samples seen so far for later pattern matching, that is, to inject memory and discrimination capability into spatio-temporal graph learning. Drawing on the idea of memory networks, the metagraph learner in this embodiment constructs a metanode library $\Phi \in \mathbb{R}^{\phi \times d}$, where $\phi$ and $d$ denote the number of memory items and the dimension of each item, respectively. The intermediate variables updated by the metagraph learner are stored in this memory bank $\Phi$, and the memory operations during updating are defined as follows:

where the superscript $(i)$ serves as a row index and $H_t^{(i)}$ denotes the $i$-th node vector of $H_t$. The first expression indicates that a fully connected (FC) layer (with parameter $W_Q \in \mathbb{R}^{d}$) projects the hidden state $H_t^{(i)}$ into a localized query $Q_t^{(i)}$. The second expression is the memory read operation: the scalar $a_j$ is computed by matching $Q_t^{(i)}$ against each memory item $\Phi[j]$, and represents the similarity between the vector $Q_t^{(i)}$ and the memory item $\Phi[j]$. The meta-node vector $M_t^{(i)}$ can then be recovered as a combination of the memory items. The reconstructed representation $M_t \in \mathbb{R}^{N \times d}$ can be used to enhance the encoded hidden representation $H_t$, giving $H'_t = [H_t, M_t]$, where $[\cdot]$ denotes concatenation. This embodiment further uses a super network to generate the node embeddings for GSL, conditioned on the metanode library; the memory-enhanced node embedding generation can be formulated as follows:

where $NN_H$ denotes the super network; without loss of generality, an FC layer (with parameter $W_E \in \mathbb{R}^{d}$) is used. A metagraph can then be constructed in place of the adaptive and instantaneous graphs and fed back to the GCRU decoder. In this way, the GCRN backbone and the metagraph learner interact dynamically.
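The memory query performed by the metagraph learner can be illustrated with a small pure-Python sketch. It assumes that the similarity scores are normalized with a softmax over the memory items and omits any bias term; the function name, the identity projection and the toy memory bank are hypothetical.

```python
import math


def memory_read(H, W_Q, Phi):
    """Query the memory bank Phi with every node vector in H.

    H   -- N node vectors (hidden state), each of length h
    W_Q -- h x d projection matrix of the query layer
    Phi -- memory items, each of length d
    Returns the attention weights A (one row per node) and meta-node vectors M.
    """
    A, M = [], []
    d = len(Phi[0])
    for h_i in H:
        # Localized query: project the hidden state of node i.
        q = [sum(h_i[k] * W_Q[k][c] for k in range(len(h_i))) for c in range(d)]
        # Similarity of the query to each memory item, normalized by softmax.
        scores = [sum(qc * pc for qc, pc in zip(q, item)) for item in Phi]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        a = [e / total for e in exps]
        # Meta-node vector: attention-weighted combination of the memory items.
        A.append(a)
        M.append([sum(a[j] * Phi[j][c] for j in range(len(Phi))) for c in range(d)])
    return A, M


# Hypothetical sizes: N = 2 nodes, h = d = 2, two memory items.
H = [[1.0, 0.0], [0.0, 1.0]]
W_Q = [[1.0, 0.0], [0.0, 1.0]]
Phi = [[5.0, 0.0], [0.0, 5.0]]
A, M = memory_read(H, W_Q, Phi)
H_aug = [h_i + m_i for h_i, m_i in zip(H, M)]  # concatenation [H_t, M_t]
```

Each node attends mostly to the memory item closest to its query, so nodes showing similar patterns retrieve similar meta-node vectors; a metagraph can then be built from embeddings generated from `M` in the same way as the adaptive graph is built from $E$.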

In this embodiment, the metagraph learner described above is used to form a metagraph convolution recursive network (MegaCRN) as a generic modeling framework, which can be optimized end to end. MegaCRN learns node-level prototypes of patterns in the meta-node library and adaptively updates the auxiliary graph according to the observed conditions. To further strengthen its ability to distinguish different scenarios on different paths, two constraints are imposed on the attention-based query mechanism, namely a consistency loss and a contrastive loss, expressed as:

where $T$ is the total number of sequences (i.e., samples) in the training set, and $p$ and $n$ are the indices of the top two memory items obtained by ranking $a_j$ for a given localized query $Q_t^{(i)}$. Under these two constraints, $Q_t^{(i)}$ is treated as the anchor, its most similar prototype $\Phi[p]$ as the positive sample, and the second most similar prototype $\Phi[n]$ as the negative sample ($\lambda$ is the margin between the positive and negative pairs). The idea is to keep the memory items as compact as possible while keeping them as dissimilar from one another as possible. The two constraints guide the memory $\Phi$ to distinguish different spatiotemporal patterns at the node level. In practice, adding them to the objective criterion (i.e., MAE), weighted by balance factors $\kappa_1$ and $\kappa_2$, was found to help training converge:
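The anchor, positive, and negative roles described above can be illustrated with a small sketch. It assumes squared Euclidean distances, a hinge (triplet) form for the contrastive term, dot-product ranking of the memory items, and an arbitrary pairing of $\kappa_1$ and $\kappa_2$ with the two terms; none of these details is confirmed by the text, and all names are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def memory_losses(queries, memory, margin):
    """Return (consistency, contrastive) summed over all localized queries."""
    consistency, contrastive = 0.0, 0.0
    for q in queries:
        # Rank memory items by similarity to the query (assumed: dot product)
        ranked = sorted(
            range(len(memory)),
            key=lambda j: -sum(x * y for x, y in zip(q, memory[j])),
        )
        p, n = ranked[0], ranked[1]        # positive and negative indices
        d_pos = sq_dist(q, memory[p])      # anchor to positive prototype
        d_neg = sq_dist(q, memory[n])      # anchor to negative prototype
        consistency += d_pos               # pull the anchor toward Phi[p]
        contrastive += max(d_pos - d_neg + margin, 0.0)  # push Phi[n] away
    return consistency, contrastive


def total_loss(mae, consistency, contrastive, kappa1, kappa2):
    """Task loss plus weighted constraints (which kappa weights which term
    is an assumption)."""
    return mae + kappa1 * contrastive + kappa2 * consistency


phi = [[1.0, 0.0], [0.0, 1.0]]
cons, contr = memory_losses([[0.6, 0.4]], phi, margin=1.0)
```

For the toy query, the first memory item is the positive sample and the second the negative one; the hinge term stays positive because the two distances differ by less than the margin.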

In some embodiments, as shown in FIG. 3, the meta-learning regression graph neural network model is implemented based on a regression GNN model, denoted $f$, which maps a social network graph $G$ to a behavior score $s$; the result is determined jointly with the metagraph convolution recursive network. The regression GNN model includes two graph convolution layers and one fully connected layer and is trained to adapt to a large number of tasks. Each task $T = \{L(g_1, s_1, \ldots, g_H, s_H),\ q(g_1),\ q(g_{t+1} \mid g_t, s_t),\ H\}$ is defined by the loss function $L$, the distribution $q(g_1)$, the transition distribution $q(g_{t+1} \mid g_t, s_t)$, and the length $H$;

for the regression GNN model, the loss function satisfies:

where $g^{(j)}$ and $s^{(j)}$ denote the input and output of a sample drawn from task $T_i$. In the model-agnostic meta-learning scenario, a distribution over tasks $p(T)$ to which the regression GNN model should be able to adapt is defined.

The purpose of Meta-RegGNN is to prepare the regression GNN model for rapid adaptation, so that the GNN learns internal features of social network posting behavior that are relevant to all tasks in $p(T)$. To this end, RegGNN model parameters that are sensitive to changes in the task are sought first, so that small changes in the parameters yield large improvements in the loss function of any task in social network posting behavior. The model is represented by a parameterized function $f_\Theta$ with parameters $\Theta$. When the model adapts to a new task $T_i$, the parameters $\Theta$ are updated to $\Theta'$, defined as:

where $\gamma$ is the step-size hyperparameter. Meta-optimization is performed over the regression GNN model parameters $\Theta$, while the objective is computed with the updated regression GNN parameters $\Theta'$. In effect, Meta-RegGNN optimizes the model parameters so that one or a small number of gradient steps on a new task produce effective behavior.

Meta-optimization is performed across tasks to update the regression GNN model parameters $\Theta$, and the updated $\Theta$ of the regression GNN model satisfies:
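The adaptation step and the meta-update described above follow the usual model-agnostic meta-learning pattern, which can be sketched on a deliberately tiny model. The sketch below is an illustration under stated assumptions: a one-parameter linear model $f_\Theta(g) = \Theta g$ with a scalar input stands in for the regression GNN, the loss is mean squared error, the same samples are reused for adaptation and evaluation, and the meta step size `beta` is a hypothetical name that the text does not define.

```python
def loss(theta, samples):
    """Mean squared error of f(g) = theta * g on (g, s) pairs."""
    return sum((theta * g - s) ** 2 for g, s in samples) / len(samples)


def grad_and_hess(theta, samples):
    """First and second derivative of the MSE loss with respect to theta."""
    n = len(samples)
    grad = sum(2 * g * (theta * g - s) for g, s in samples) / n
    hess = sum(2 * g * g for g, _ in samples) / n
    return grad, hess


def adapt(theta, samples, gamma):
    """Inner update: theta' = theta - gamma * dL/dtheta on one task."""
    grad, _ = grad_and_hess(theta, samples)
    return theta - gamma * grad


def meta_step(theta, tasks, gamma, beta):
    """Outer update: descend the summed post-adaptation losses."""
    meta_grad = 0.0
    for samples in tasks:
        grad, hess = grad_and_hess(theta, samples)
        theta_prime = theta - gamma * grad
        grad_prime, _ = grad_and_hess(theta_prime, samples)
        # Chain rule through the inner step: d theta' / d theta = 1 - gamma * hess
        meta_grad += grad_prime * (1 - gamma * hess)
    return theta - beta * meta_grad


tasks = [[(1.0, 2.0), (2.0, 4.0)],   # toy task with slope 2
         [(1.0, 3.0), (2.0, 6.0)]]   # toy task with slope 3
theta = 0.0
for _ in range(50):
    theta = meta_step(theta, tasks, gamma=0.05, beta=0.05)
adapted = adapt(theta, tasks[0], gamma=0.05)
```

In this toy setting the meta-learned parameter settles between the two task slopes, and a single inner step on either task then lowers that task's loss.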

To properly account for the graph structure of social network water army behavior and to model the correlation between data samples effectively, the regression GNN of this embodiment consists of two graph convolution layers and one fully connected layer. The correlation matrix of a given connectome $C$ is symmetric and may have zero or positive eigenvalues; it can be regularized to be symmetric positive definite simply by:

$I' = C + \mu I$

where $I$ denotes the identity matrix and $\mu > 0$. Because positive correlations have proved more important in the analysis of brain networks, all negative eigenvalues are set to zero when training the regression GNN. The regression GNN thus receives the regularized positive adjacency matrix $I'$ of the connectome and predicts the corresponding behavior score using graph convolution, which reduces the dimensionality of the connectome and learns an embedding. A dropout layer is added after the first graph convolution operation for regularization. Finally, the resulting embedding is passed through a fully connected layer, which produces a scalar output (IQ score).
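The effect of the regularization $C + \mu I$ can be checked on a small example: adding $\mu I$ shifts every eigenvalue by $\mu$, so a zero eigenvalue becomes strictly positive. The sketch below uses a 2x2 symmetric matrix so that the eigenvalues have a closed form; the matrix, the value of $\mu$, and the helper names are illustrative assumptions.

```python
import math


def regularize(c, mu):
    """Return C + mu * I for a square matrix given as a list of rows."""
    n = len(c)
    return [[c[i][j] + (mu if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def eigenvalues_2x2_symmetric(m):
    """Closed-form eigenvalues of a symmetric 2x2 matrix, smallest first."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b * b)
    return mean - radius, mean + radius


c = [[1.0, 1.0], [1.0, 1.0]]          # singular: one eigenvalue is zero
reg = regularize(c, mu=0.1)
low_before, high_before = eigenvalues_2x2_symmetric(c)
low_after, high_after = eigenvalues_2x2_symmetric(reg)
```

Both eigenvalues of the regularized matrix are larger than the originals by exactly $\mu$, which is what makes the matrix positive definite.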

In some embodiments, as shown in FIG. 4, the metagraph reinforcement learning framework is a framework that fuses a GNN with meta reinforcement learning. It is a deep reinforcement learning (DRL) based framework that learns to identify the optimal seed set for Activation-aware Influence Maximization (AIM), a generic influence maximization (IM) formulation that covers both types of activation. The framework has several novel aspects: (1) meta-learning, to enable prediction across graphs of different types and sizes; (2) double Q-learning (a reinforcement learning algorithm), to estimate the sequence of seed nodes without solving a computationally intensive optimization problem at each test time; and (3) a single-policy multi-objective reward formulation, to balance the multiple AIM objectives systematically.

The candidate node prediction module of the metagraph reinforcement learning framework provides a set of candidate nodes on which the AIM search algorithm operates; the search is carried out as a finite Markov decision process.

Finite Markov decision process

The task of identifying the optimal AIM seed set is a sequential process in which one node is added at a time. More importantly, the factor that determines the choice of node at any step depends only on the nodes previously added to the sequence (the solution set). The process therefore satisfies the Markov property and is formulated as a Markov decision process (MDP). Furthermore, at any step the action space is finite, i.e., a node must be selected from a finite set of nodes. More precisely, the process can therefore be called a finite MDP. The key elements that define any finite MDP in the context of deep reinforcement learning (DRL) are:

State: the state represents the current solution set, to which nodes are appended in order to form the final AIM solution set. The cardinality of the state therefore grows as the process advances, so a state representation vector $X^t$ is needed. $X^t$ should characterize the state of the system at any time step $t$ in terms of the selected nodes, and can be expressed as:

$X^t = f(S^t)$

where $S^t$ is the partial AIM solution set at time $t$ and $f$ is the transformation operator. Because the nodes in the state are sampled from a graph, a suitable choice for $f$ is a transformation based on a graph neural network.

Action: an action is the process of adding a new node $u$ to the partial solution set $S^t$.

Reward: the reward quantifies the benefit of taking an action. The AIM formulation has two objectives, which give rise to two reward functions. The first reward ($R_1$) is the marginal gain in influence spread when a particular node is added to the solution set. The second reward ($R_2$) is the intrinsic probability of the node being added to the solution set. The rewards can be written as:

$R_2(X, a) = p_s(a)$

where the operator appearing in $R_1$ computes the influence spread of the solution set $S$ in graph $G$, $p_s$ denotes the activation probability parameter, and $a$ denotes a specific node.
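The two rewards can be illustrated with a toy graph. The sketch below is only indicative: the text does not specify the influence operator, so a one-hop reach count (the seeds plus their direct neighbours) is used as a stand-in, and the graph, the probabilities, and all names are hypothetical.

```python
def influence_spread(graph, seeds):
    """Stand-in influence operator: seeds plus their direct neighbours."""
    reached = set(seeds)
    for s in seeds:
        reached.update(graph.get(s, ()))
    return len(reached)


def reward_marginal_gain(graph, solution, node):
    """R1: gain in influence spread from adding `node` to the solution set."""
    return influence_spread(graph, solution | {node}) - influence_spread(graph, solution)


def reward_activation(p_s, node):
    """R2: intrinsic activation probability of the node being added."""
    return p_s[node]


graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
p_s = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.2}
solution = {"a"}
r1_d = reward_marginal_gain(graph, solution, "d")   # d is not yet reached
r1_b = reward_marginal_gain(graph, solution, "b")   # b is already reached
r2_d = reward_activation(p_s, "d")
```

Here node d adds influence that the current solution set lacks but has a low activation probability, which is the kind of trade-off the two rewards are meant to expose.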

Environment: the world with which the agent interacts, consisting of everything outside the agent. Here, the environment is a graph. The interaction is continual: the agent selects a node, and the graph environment responds to the action and presents a new situation to the agent.

Policy: the strategy, or suggested action, that the DRL agent should take in each state of the environment to achieve the learning objective. It is a probability distribution over the feasible nodes that can be added to the partial solution set $S^t$ to move the state from $X^t$ to $X^{t+1}$. The policy $\pi(a \mid X^t)$ thus selects, in any state $X^t$, the node that yields the highest cumulative return.

Termination: in each episode, the search starts from a random node in the candidate set, and the estimated nodes are appended to the partial solution set one at a time (one per step of the episode). When the cardinality of the solution set $S^t$ reaches the search budget $b$, the episode terminates.
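The elements listed above combine into a simple episode loop. The following sketch assumes a greedy rule in place of the learned policy $\pi(a \mid X^t)$ and a caller-supplied `score` function; both, along with the function name, are illustrative assumptions rather than the embodiment's agent.

```python
import random


def run_episode(candidates, score, budget, rng):
    """One episode of the finite MDP.

    Starts from a random candidate node, then appends one node per step
    until the solution set holds `budget` nodes (the termination rule).
    """
    solution = [rng.choice(candidates)]
    while len(solution) < budget:
        feasible = [u for u in candidates if u not in solution]
        if not feasible:          # finite action space exhausted
            break
        # Greedy stand-in for the policy: pick the best-scoring feasible node
        solution.append(max(feasible, key=lambda u: score(solution, u)))
    return solution


candidates = ["a", "b", "c", "d"]
result = run_episode(candidates, lambda sol, u: ord(u), budget=3,
                     rng=random.Random(0))
```

The returned list preserves the order in which nodes were added, which mirrors the sequential nature of the process.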

Metagraph reinforcement learning (GraMeR)

GraMeR consists of three modules. The first acts as an environment pool containing training graphs of different families and sizes. The second provides a set of candidate nodes on which the AIM search algorithm is run. The third handles the DRL agent. Training the agent begins by randomly selecting a graph $G_i$ from the pool and passing it through the second module to generate a set of candidate nodes. Each training graph is an environment with which the agent interacts through the MDP. An episode starts from a random node in the candidate set of the sampled graph and continues until the budget is exhausted. At each step of the episode, the agent selects the next node according to its current policy and then updates that policy by training the network on a batch of data sampled from the replay buffer. The replay buffer stores the states, actions, rewards, and environments (graphs) of all past steps across episodes, allowing the network to exploit known information. Once the episode meets the termination criterion, the agent samples a new graph, and training is iterated until the policy converges.

The agent in GraMeR is trained with Q-learning because the problem is a discrete, finite MDP. Q-learning maximizes the cumulative return of the actions the agent takes while interacting with the environment, where future rewards depend on the action currently taken. The optimal action value (i.e., Q value) corresponds to the optimal policy, which maximizes the Q value. The Q value can be updated iteratively according to the Bellman equation:

In practice, Q-learning with the above equation tends to overestimate, because a single estimator both determines the best action (the one with the highest Q value in the next state) and supplies the Q value of that action. To avoid overestimation, this embodiment adopts double Q-learning, which uses two different estimators: one estimator determines the best possible action for the next state, while the other (the target network $Q^T$) provides the Q value of the selected action. In some embodiments, the agent of the metagraph reinforcement learning framework is trained by double Q-learning in this way, and the modified update equation for $Q^L(X^t, a^t)$ is:

$Q^L(X^t, a^t) = Q^L(X^t, a^t) + \theta \times [r^t + \gamma Q^T(X^{t+1}, a^*) - Q^L(X^t, a^t)]$

where $\theta$ is a tuning parameter, and $r^t$ and $\gamma$ are the reward at step $t$ and the discount factor of the Bellman equation. The local network ($Q^L$) is trained at every step on a batch of data sampled from the replay buffer: the mean squared error between the predicted Q value, $Q^L(X^t, a^t)$, and the expected Q value from the Bellman equation, $r^t + \gamma Q^T(X^{t+1}, a^*)$, is used as the loss to update the parameters of the local $Q^L$ network. The target network $Q^T$, however, is not explicitly trained at every step; instead, after a certain number of episodes it is updated with the full weights of the $Q^L$ network.
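The division of labour between the two estimators can be shown with a tabular sketch. This is a simplification made for illustration: the embodiment uses neural networks trained on replay batches, whereas the code below keeps $Q^L$ and $Q^T$ as dictionaries keyed by (state, action); the function names and the terminal-state handling are assumptions.

```python
from collections import defaultdict


def double_q_update(q_local, q_target, state, action, reward,
                    next_state, next_actions, theta, gamma):
    """Apply one double Q-learning update to the local table and return it."""
    if next_actions:
        # The local estimator chooses the best next action a* ...
        best = max(next_actions, key=lambda a: q_local[(next_state, a)])
        # ... and the target estimator supplies its value.
        bootstrap = q_target[(next_state, best)]
    else:
        bootstrap = 0.0  # assumed: no bootstrap from a terminal state
    td_error = reward + gamma * bootstrap - q_local[(state, action)]
    q_local[(state, action)] += theta * td_error
    return q_local[(state, action)]


def sync_target(q_local, q_target):
    """Copy the local table into the target table (done every few episodes)."""
    q_target.clear()
    q_target.update(q_local)


q_local, q_target = defaultdict(float), defaultdict(float)
q_local[("s1", "x")], q_local[("s1", "y")] = 1.0, 2.0
q_target[("s1", "x")], q_target[("s1", "y")] = 5.0, 0.5
updated = double_q_update(q_local, q_target, "s0", "x", 1.0,
                          "s1", ["x", "y"], theta=0.5, gamma=0.9)
```

In the toy call the local table prefers action y in the next state, so the target's value for y is used even though the target table rates x higher; a single estimator would have bootstrapped from the larger value.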

Meta-learning is introduced into GraMeR so that unseen tasks can be solved quickly and efficiently. A meta reinforcement learning method typically involves two optimization loops. The outer optimizer samples a new environment in each iteration and adjusts the parameters that determine the agent's behavior. In the inner loop, the agent interacts with the environment and optimizes for maximum return. For most environments it is not feasible to obtain a representation vector of the entire environment, so cross-environment learning is captured by the outer optimizer. In AIM, however, the environment is a graph, which can be represented quite accurately by a single graph embedding vector. The outer optimizer is therefore skipped, and the information about the entire environment (the graph embedding vector) is supplied to the training algorithm together with the states and actions. This enables Q-learning to capture changes in the environment with a single optimizer. The update of the agent then takes the following form:

$Q^L(X^t, a^t, G_i) = Q^L(X^t, a^t, G_i) + \theta \times [r^t + \gamma Q^T(X^{t+1}, a^*, G_i) - Q^L(X^t, a^t, G_i)]$

where $Q^T$ denotes the target network, $Q^L$ denotes the local network, $a^t$ denotes the action performed at time $t$, $X^t$ denotes the state matrix at time $t$, $\theta$ is a tuning parameter, $r$ and $\gamma$ are parameters, and $G_i$ is the graph sampled for the $i$-th episode. The graph varies from episode to episode, but only one graph is explored within a given episode.

GNN encoding: as with the graph, vector representations of the states (partial solution sets) and actions (nodes) are also needed. For this purpose, GraphSAGE is used to estimate node embeddings. The embedding vectors of the nodes in state $S^t$ are then aggregated by a mean or max operation to obtain a single vector $X^t$ representing the entire state. Since an action corresponds to a single node, it is represented by that node's embedding vector. GraphSAGE is more efficient than a traditional GCN here because whether a particular node is a good candidate for the AIM solution set depends primarily on its subgraph, so GraphSAGE, an inductive, subgraph-based learning method, is a suitable choice. In addition, unlike GCN, which requires the complete adjacency matrix and therefore does not scale well with graph size, GraphSAGE does not require complete graph information.
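The aggregation of node embeddings into a single state vector can be sketched as follows. The embeddings are taken as given (in the embodiment they would come from GraphSAGE, which is not reproduced here), and the function name and the toy values are assumptions.

```python
def aggregate_state(node_embeddings, solution, mode="mean"):
    """Pool the embeddings of the nodes in the partial solution set into X^t."""
    vectors = [node_embeddings[u] for u in solution]
    columns = list(zip(*vectors))   # one tuple per embedding dimension
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError("mode must be 'mean' or 'max'")


embeddings = {"a": [1.0, 4.0], "b": [3.0, 0.0], "c": [0.0, 0.0]}
x_mean = aggregate_state(embeddings, ["a", "b"], mode="mean")
x_max = aggregate_state(embeddings, ["a", "b"], mode="max")
action_vector = embeddings["c"]    # an action is a single node's embedding
```

Either pooling mode returns a vector of the same length regardless of how many nodes the solution set contains, which is what lets the state grow during an episode without changing the input size of the Q network.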

Multi-objective modeling: along with the states and actions, the reward function must be designed explicitly for AIM because it contains multiple objectives. The first objective is to maximize influence spread, while the other is to maximize the intrinsic probability of the seed nodes. GraMeR therefore belongs to the category of multi-objective DRL.

With the single-policy approach, an optimal policy is learned by combining the two objectives with known preference weights. Specifically, at each step of an episode the reward of each objective is computed separately and stored in the buffer together with the state and action. The Q values of the two objectives are then combined by linear scalarization into a single Q value used to select the action.
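Linear scalarization of the two Q values can be sketched in a few lines. The convex combination with a single preference weight is an assumption (the text only says the preference weights are known), and the names and numbers are illustrative.

```python
def scalarize(q_influence, q_probability, weight):
    """Combine the two objectives' Q values into one (assumed convex weights)."""
    return weight * q_influence + (1.0 - weight) * q_probability


def select_action(q_values, weight):
    """Pick the node with the highest scalarized Q value.

    q_values maps node -> (Q for influence spread, Q for intrinsic probability).
    """
    return max(q_values, key=lambda a: scalarize(*q_values[a], weight))


q_values = {"u": (1.0, 0.2), "v": (0.4, 0.9)}
prefer_influence = select_action(q_values, weight=0.8)
prefer_probability = select_action(q_values, weight=0.2)
```

Changing the preference weight changes which node is selected, which is how a single policy trades off the two AIM objectives.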

Accordingly, the water army detection result is determined jointly by the three models provided in this embodiment, specifically as follows:

After a large number of social network graph models are obtained, the metagraph convolution recursive network model performs node-level feature learning: it captures node sequence feature representations with an encoder-decoder based on a recurrent graph neural network and, in combination with graph structure learning (GSL), optimizes and updates the node representations and the topology of the social network graph simultaneously.

The meta-learning regression graph neural network model performs the graph-level regression task, with a meta-learning mechanism designed to strengthen adaptability. It carries out task-driven scoring of the different social network graphs to generate a behavior score for each graph. In this example, the more abnormal the social network behavior, the higher the behavior score, and the behavior scores serve as sample importance weights for training the overall model.

The metagraph reinforcement learning framework is used to strengthen node classification: using the optimized graph structure, the node features, and the behavior scores of the graphs, together with the metagraph reinforcement learning mechanism, it selects the abnormal nodes (i.e., water army nodes) from the social network graphs in the graph pool.

Graph structure learning in the embodiments of the application: in addition to sequence modeling, correlations between variables (for example, road links in traffic data) are captured through a generic graph structure. The learned graph is time-varying (input-dependent), so it handles abrupt changes in incoming data well, and it is fine-grained, with each node embedding carefully tailored to its prototype (meta-node).

The metagraph convolution recursive network of this embodiment mainly comprises a graph convolution recursion unit and a metagraph learner. The network relies only on observed data to remain robust and to adapt to conditions ranging from normal to non-stationary. A metagraph learner driven by a meta-node library is inserted into the encoder-decoder built from graph convolution recursion units. Metagraph learning comprises two steps: (1) querying node-level prototypes from the meta-node library; and (2) dynamically reconstructing the node embeddings with a hypernetwork. This localized memory capability enables the metagraph learner to distinguish different spatiotemporal patterns over time, and may even generalize to incident situations. The metagraph learner explicitly disentangles spatial and temporal heterogeneity.

The meta-learning regression graph neural network model of the embodiments of the application: on the one hand, it properly incorporates the graph structure of social network posting behaviors and effectively models the correlations between them; on the other hand, meta-learning makes the regression GNN model more flexible while reducing the influence of some problems. The network shows a distinctive ability to model correlations in the data and combines global and local topological properties to predict behavior scores. It also keeps the model flexible and supports inductive learning, which improves the model's generalization to unseen data.

The metagraph reinforcement learning framework of the embodiments of the application: a GNN is fused with a meta reinforcement learning framework to identify influential nodes in the network. First, a GNN-based candidate node predictor reduces the IM search space. Deep Q-learning is then used to identify the IM seed nodes, with the GNN serving as the environment encoder.

In the embodiments of the application, data are obtained from the major social platforms; word sense disambiguation is applied across multiple languages using a translation-based cross-lexical resource alignment method, in which translation accuracy is verified by maximizing a similarity measure; and the metagraph convolution recursive network is combined with the meta-learning regression graph neural network model and the metagraph reinforcement learning framework to detect water army behavior jointly.

It should be noted that, in the embodiments of the present disclosure, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the protection of the claims, which fall within the protection of the present application.

Claims

1. The network malicious behavior detection method based on the metagraph is characterized by comprising the following steps of:

acquiring required user data;

2. The metagraph-based network malicious behavior detection method of claim 1, further comprising word sense disambiguation after obtaining the desired user data by:

constructing a monolingual co-occurrence graph $G_s = \langle V_s, E_s \rangle$;

collecting co-occurring noun or adjective pairs $(cw_i, cw_j)$ in the user data and adding each co-occurring noun or adjective as a node of the monolingual co-occurrence graph, each co-occurring word pair being connected by an edge $(v_i, v_j) \in E_s$, and each edge being assigned a weight $w(v_i, v_j)$ according to the strength of association between the corresponding words $cw_i$ and $cw_j$, satisfying:

$w(v_i, v_j) = 1 - \max[p(cw_i \mid cw_j),\ p(cw_j \mid cw_i)]$

wherein $p(cw_i \mid cw_j)$ denotes a conditional probability, estimated as the number of contexts in which $cw_i$ and $cw_j$ co-occur divided by the number of contexts containing $cw_j$;

configuring a set of target languages $L$, and expanding the monolingual co-occurrence graph $G_s$ into a multilingual graph $G_{ML} = \langle V_{ML}, E_{ML} \rangle$, wherein $V_{ML} = V_s \cup \bigcup_{l \in L} V_l$ is the set of nodes representing content words of the source language ($V_s$) or a target language ($V_l$), and $E_{ML} = E_s \cup \bigcup_{l \in L} \{E_l \cup E_{s,l}\}$ is the set of edges;

wherein $d$ is the damping factor, $\deg(v_i)$ is the number of nodes adjacent to node $v_i$, and $w_{ij}$ is the weight of the co-occurrence edge between nodes $v_i$ and $v_j$; the minimum spanning tree (MST) of the graph takes the target word as its root, and the root hubs of $G_{ML}$ constitute its first level;

3. The metagraph-based network malicious behavior detection method of claim 2, wherein using the MST to find the most relevant word in w to disambiguate comprises:

summing the translation counts to order each translation;

The similarity measure between two translations is defined as follows:

wherein the function in the similarity measure returns the lexical set of word $c$ in a given set of languages;

4. The network malicious behavior detection method according to claim 1, wherein the graph convolution recursion unit (GCRU) of the graph convolution recursion network model is defined as a basic unit by combining the graph convolution operation with the gated recurrent unit (GRU), satisfying:

wherein $X \in \mathbb{R}^{N \times C}$ and $H \in \mathbb{R}^{N}$ denote the input and output of the graph convolution operation $\star_G$ ($N$ being the number of spatial units and $C$ the number of information channels), $\Theta$ or $W_K \in \mathbb{R}^{K \times C}$ denotes the kernel parameters approximated with Chebyshev polynomials to order $K$, $\sigma$ denotes the activation function, $\odot$ denotes the element-wise product, $u$, $r$ and $C$ denote the update gate, reset gate and candidate state in the GCRU, and $\Theta_{\{u,r,C\}}$ denotes the gate parameters;

To its transpose;

the metagraph learner is used for constructing a meta-node library $\Phi$, whose two dimensions are the number of memory items and the dimension $d$ of each item; the intermediate variables updated by the metagraph learner are stored in the meta-node library $\Phi$, and the memory operations in the update process are defined as follows:

wherein the superscript $(i)$ serves as a row index, $H_t^{(i)}$ denotes the $i$-th node vector of $H_t$, $Q_t^{(i)}$ is the localized query obtained by projecting the hidden state $H_t^{(i)}$ through the fully connected layer $W_Q$, the scalar $a_j$ represents the similarity between the vector $Q_t^{(i)}$ and the memory item $\Phi[j]$, and the meta-node vector $M_t^{(i)}$ is recovered as a combination of the memory items;

wherein $NN_H$ denotes the hypernetwork;

5. The metagraph-based network malicious behavior detection method of claim 4, wherein the meta-learning regression graph neural network model is implemented based on a regression GNN model that includes two graph convolution layers and one fully connected layer, each task $T = \{L(g_1, s_1, \ldots, g_H, s_H),\ q(g_1),\ q(g_{t+1} \mid g_t, s_t),\ H\}$ being defined by the loss function $L$, the distribution $q(g_1)$, the transition distribution $q(g_{t+1} \mid g_t, s_t)$, and the length $H$;

for the regression GNN model, the loss function satisfies:

the updated Θ of the regressive GNN model satisfies:

6. The metagraph-based network malicious behavior detection method of claim 5, wherein the candidate node prediction module of the metagraph reinforcement learning framework provides a set of candidate nodes based on an AIM search algorithm, the AIM search algorithm being carried out using a finite Markov decision process.

7. The metagraph-based network malicious behavior detection method of claim 6, wherein the agent of the metagraph reinforcement learning framework is trained through double Q-learning, wherein one estimator determines the best possible action for the next state and the other estimator provides the Q value for the selected action, satisfying the following update:

$Q^L(X^t, a^t, G_i) = Q^L(X^t, a^t, G_i) + \theta \times [r^t + \gamma Q^T(X^{t+1}, a^*, G_i) - Q^L(X^t, a^t, G_i)]$

8. A computer device, comprising: a processor and a memory having stored thereon a computer program which, when executed by the processor, implements the steps of a metagraph based network malicious behavior detection method according to any one of claims 1 to 7.

9. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the steps of the metagraph-based network malicious behavior detection method according to any one of claims 1 to 7.