CN115473718A - Business data anomaly identification method and device based on behavior association mining

Info

Publication number
CN115473718A
CN115473718A CN202211084180.7A CN202211084180A CN115473718A CN 115473718 A CN115473718 A CN 115473718A CN 202211084180 A CN202211084180 A CN 202211084180A CN 115473718 A CN115473718 A CN 115473718A
Authority
CN
China
Prior art keywords
behavior
nodes
user
layer
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211084180.7A
Other languages
Chinese (zh)
Inventor
沈文
郭骞
俞庚申
李慧芹
杨睿
韩维
刘一凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Co ltd Customer Service Center
State Grid Smart Grid Research Institute Co ltd
State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Co ltd Customer Service Center
State Grid Smart Grid Research Institute Co ltd
State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Co ltd Customer Service Center, State Grid Smart Grid Research Institute Co ltd, State Grid Jiangsu Electric Power Co Ltd
Priority to CN202211084180.7A
Publication of CN115473718A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a device for identifying abnormal business data based on behavior association mining, relating to the field of data security processing. The method comprises the following steps: determining internet behavior information of a user from the business data, and extracting user characteristics and behavior characteristics from the internet behavior information, wherein the user characteristics represent the operating environment information of the user's internet operations and the behavior characteristics represent the time sequence information of those operations; and inputting the user characteristics and the behavior characteristics into a business data recognition model to obtain abnormal behavior information output by the business data recognition model. The business data recognition model is obtained by training based on a sample behavior flow graph of a user, and the sample behavior flow graph is constructed based on sample user characteristics and sample behavior characteristics of the user. By performing anomaly recognition with the business data recognition model, the invention can judge and respond to abnormal behavior quickly, so that user behavior can be managed and restricted accurately, meeting the development requirements of the big data era.

Description

Business data anomaly identification method and device based on behavior association mining
Technical Field
The invention relates to the field of data security processing, in particular to a method and a device for identifying abnormal business data based on behavior association mining.
Background
To ensure the security and stability of the network environment, supervising network traffic data has become very important in the big data era. Because network traffic is large in volume and highly random, current methods for detecting abnormal user behavior cannot make quick judgments and responses when detecting abnormal behavior in continuously updated service data, cannot accurately manage and restrict user behavior, and cannot detect massive data efficiently and accurately, so they fail to meet the development requirements of the big data era.
Therefore, identifying abnormal user behavior in service data more efficiently and more accurately plays a key role in network security management.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for identifying abnormal business data based on behavior association mining, so as to solve the problem that current methods for detecting abnormal user behavior in business data cannot make quick judgments and responses.
According to a first aspect, an embodiment of the present invention provides a method for identifying abnormal business data based on behavior association mining, where the method includes:
determining the internet behavior information of a user, and extracting user characteristics and behavior characteristics from the internet behavior information; the user characteristics are used for representing the operation environment information of the user internet surfing operation, and the behavior characteristics are used for representing the time sequence information of the user internet surfing operation;
inputting the user characteristics and the behavior characteristics into a service data recognition model to obtain abnormal behavior information output by the service data recognition model; the business data recognition model is obtained by training based on a sample behavior flow graph of a user; the sample behavior flow graph is constructed based on sample user characteristics and sample behavior characteristics of the user;
the business data identification model is used for constructing a behavior flow graph based on user characteristics and behavior characteristics, determining the fusion neighborhood characteristics corresponding to each node in the behavior flow graph, determining the embedded representation characteristics corresponding to the nodes based on the fusion neighborhood characteristics of the nodes, and performing abnormal behavior prediction on business data based on classification results determined from the embedded representation characteristics; the nodes comprise user characteristics and behavior characteristics, and the nodes are connected based on the behavior characteristics, the connections being used for representing the time sequence relation among the nodes.
With reference to the first aspect, in a first implementation manner of the first aspect, the service data identification model includes a flow graph construction layer, a homogeneous user relationship network layer, and a heterogeneous user message network layer;
the flow graph construction layer is used for constructing a behavior flow graph of the user based on the user characteristics and the behavior characteristics;
the homogeneous user relation network layer is used for performing attention operation on each node in the behavior flow graph based on the node and the neighbor node of the node, and determining the fusion neighborhood characteristics of the node; the neighbor node is a node which is connected with the current node;
the heterogeneous user message network layer is used for performing attention operation and semantic attention understanding on the fusion neighborhood characteristics of the nodes and the fusion neighborhood characteristics of the neighbor nodes, determining the embedded representation characteristics of the nodes and the distance between the nodes and the corresponding embedded representation characteristics, and determining abnormal behavior information based on the distance.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the inputting the user characteristics and the behavior characteristics into the service data recognition model to obtain abnormal behavior information output by the service data recognition model specifically includes:
inputting the user characteristics and the behavior characteristics into a flow graph construction layer to obtain a behavior flow graph output by the flow graph construction layer;
inputting the behavior flow graph into a homogeneous user relationship network layer to obtain fusion neighborhood characteristics corresponding to the nodes output by the homogeneous user relationship network layer;
inputting the fusion neighborhood characteristics into a heterogeneous user message network layer to obtain abnormal behavior information output by the heterogeneous user message network layer; the abnormal behavior information comprises nodes and state characteristics corresponding to the nodes.
With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the homogenous user relationship network layer includes a neighborhood convolution layer, a similarity determination layer, a normalization processing layer, and a first attention operation layer;
the neighborhood convolution layer is used for performing convolution operation on the node, neighbor nodes of the node and convolution weights corresponding to the node to determine the state characteristics of the node; the state features are used for representing labels of the nodes;
the similarity determination layer is used for determining similarity coefficients between the state features corresponding to the neighbor nodes and the state features corresponding to the nodes;
the normalization processing layer is used for performing normalization processing on the similarity coefficient and determining an attention coefficient between the node and its neighbor nodes;
the first attention operation layer is used for carrying out weighting processing based on the state characteristics, the splicing weight and the attention coefficient corresponding to the neighbor nodes and determining fusion neighborhood characteristics corresponding to the nodes.
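The four sub-layers above read much like a graph-attention computation, and the following minimal sketch illustrates one way they could fit together. It is not the patent's implementation: the text does not specify the convolution, the similarity function, or the form of the splicing weight, so this sketch assumes a mean-aggregation convolution, dot-product similarity, softmax normalization, and a scalar `splice_weight`. All function and variable names are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def state_features(features, neighbors, W):
    """Neighborhood convolution (assumed form): average each node with its
    neighbors, then apply the convolution weight W."""
    states = {}
    for node, x in features.items():
        group = [x] + [features[n] for n in neighbors[node]]
        mean = [sum(col) / len(group) for col in zip(*group)]
        states[node] = matvec(W, mean)
    return states

def attention_coefficients(states, neighbors, node):
    """Similarity (assumed: dot product) followed by softmax normalization."""
    if not neighbors[node]:
        return {}
    sims = {n: sum(a * b for a, b in zip(states[node], states[n]))
            for n in neighbors[node]}
    peak = max(sims.values())  # subtract the max for numerical stability
    exps = {n: math.exp(s - peak) for n, s in sims.items()}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}

def fused_neighborhood(states, neighbors, node, splice_weight=1.0):
    """Attention-weighted sum of the neighbors' state features."""
    alpha = attention_coefficients(states, neighbors, node)
    if not alpha:  # isolated node: fall back to its own state
        return list(states[node])
    fused = [0.0] * len(states[node])
    for n, a in alpha.items():
        for i, value in enumerate(states[n]):
            fused[i] += splice_weight * a * value
    return fused

# Toy graph: three nodes with two-dimensional features (illustrative values).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
W = [[1.0, 0.0], [0.0, 1.0]]  # identity convolution weight for readability

states = state_features(features, neighbors, W)
alpha_a = attention_coefficients(states, neighbors, "a")
fused_a = fused_neighborhood(states, neighbors, "a")
```

The attention coefficients of a node sum to one by construction, so the fused feature is a convex combination of the neighbors' state features scaled by the splicing weight.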
With reference to the second implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the heterogeneous user message network layer includes a weight learning layer, a second attention operation layer, a semantic attention understanding layer, and a prediction output layer;
the second attention operation layer is used for carrying out weighting processing on the basis of fusion neighborhood characteristics and attention weights corresponding to the neighbor nodes and determining the time sequence characteristics of the nodes; the time sequence characteristics are used for representing the semantics of the nodes;
the semantic attention understanding layer is used for mapping the time sequence characteristics corresponding to the nodes, determining the semantic attention weight of the nodes under the meta-path, performing weighting processing based on the time sequence characteristics corresponding to the nodes and the semantic attention weight, and determining the embedded expression characteristics of the nodes;
and the prediction output layer is used for determining the classification result of the embedded representation characteristics and outputting abnormal behavior information based on the classification result.
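The semantic attention understanding step can be illustrated with a small sketch. The patent text does not give the mapping or the weighting formula, so the code below assumes a form commonly used in heterogeneous graph attention networks: each node's time sequence feature is mapped through a `tanh` layer, scored against a query vector, averaged per meta-path, and the per-path scores are normalized with softmax. The meta-path names, `W`, `b`, and `q` are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def semantic_attention(per_path, W, b, q):
    """per_path maps a meta-path name to {node: time sequence feature}.

    Returns (semantic attention weight per meta-path,
             embedded representation per node)."""
    paths = sorted(per_path)
    scores = []
    for p in paths:
        total = 0.0
        for z in per_path[p].values():
            # Map the feature, then score it against the query vector q.
            hidden = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + bi)
                      for row, bi in zip(W, b)]
            total += sum(qi * hi for qi, hi in zip(q, hidden))
        scores.append(total / len(per_path[p]))
    beta = dict(zip(paths, softmax(scores)))
    first = per_path[paths[0]]
    embedding = {
        node: [sum(beta[p] * per_path[p][node][i] for p in paths)
               for i in range(len(first[node]))]
        for node in first
    }
    return beta, embedding

# Two hypothetical meta-paths over the same two nodes (illustrative values).
per_path = {
    "device": {"u1": [1.0, 0.0], "u2": [0.0, 1.0]},
    "page": {"u1": [0.5, 0.5], "u2": [1.0, 1.0]},
}
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
q = [1.0, 1.0]

beta, embedding = semantic_attention(per_path, W, b, q)
```

A classifier applied to `embedding` would then play the role of the prediction output step; that part is omitted here because the text does not describe the classifier.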
With reference to the third implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the inputting the behavior flow graph into the homogeneous user relationship network layer to obtain a fusion neighborhood characteristic corresponding to a node output by the homogeneous user relationship network layer specifically includes:
inputting the behavior flow graph into the neighborhood convolution layer to obtain state characteristics corresponding to nodes output by the neighborhood convolution layer;
inputting the state characteristics into the similarity determination layer to obtain a similarity coefficient between nodes output by the similarity determination layer;
inputting the similarity coefficients into the normalization processing layer to obtain attention coefficients among nodes output by the normalization processing layer;
and inputting the state characteristics, the splicing weight and the attention coefficient corresponding to the neighbor nodes of the nodes into the first attention operation layer to obtain the fusion neighborhood characteristics corresponding to the nodes output by the first attention operation layer.
With reference to the fourth implementation manner of the first aspect, in the sixth implementation manner of the first aspect, the inputting the fusion neighborhood feature into the heterogeneous user message network layer to obtain abnormal behavior information output by the heterogeneous user message network layer specifically includes:
inputting the fusion neighborhood characteristics into a weight learning layer to obtain attention weights corresponding to the nodes output by the weight learning layer;
inputting the fusion neighborhood characteristics of the nodes and the attention weight into a second attention operation layer to obtain the time sequence characteristics of the nodes output by the second attention operation layer;
inputting the time sequence characteristics into a semantic attention understanding layer to obtain embedded representation characteristics corresponding to nodes output by the semantic attention understanding layer;
and inputting the embedded representation characteristics into a prediction output module to obtain abnormal behavior information output by the prediction output module.
With reference to the first aspect, in a seventh implementation manner of the first aspect, the business data identification model is obtained by training through the following steps:
determining sample state characteristics of sample nodes from a sample behavior flow graph; each sample node in the sample behavior flow graph comprises a sample user characteristic and a sample behavior characteristic, and the sample nodes are connected based on the sample behavior characteristics and used for representing the time sequence relation among the sample nodes;
and taking the sample behavior flow graph as input data used for training, taking sample state characteristics corresponding to the sample nodes as labels used for training, and training in a deep learning mode to obtain a business data identification model of abnormal behavior information for generating the internet behavior information of the user.
According to a second aspect, an embodiment of the present invention further provides a device for identifying abnormal business data based on behavior association mining, where the device includes:
the characteristic extraction module is used for determining the internet behavior information of the user and extracting user characteristics and behavior characteristics from the internet behavior information; the user characteristics are used for representing the operation environment information of the user internet surfing operation, and the behavior characteristics are used for representing the time sequence information of the user internet surfing operation;
the behavior recognition module is used for inputting the user characteristics and the behavior characteristics into the business data recognition model to obtain abnormal behavior information output by the business data recognition model; the business data recognition model is obtained by training based on a sample behavior flow graph of a user; the sample behavior flow graph is constructed based on sample user characteristics and sample behavior characteristics of the user;
the business data identification model is used for constructing a behavior flow graph based on user characteristics and behavior characteristics, determining the fusion neighborhood characteristics corresponding to each node in the behavior flow graph, determining the embedded representation characteristics corresponding to the nodes based on the fusion neighborhood characteristics of the nodes, and performing abnormal behavior prediction on business data based on classification results determined from the embedded representation characteristics; the nodes comprise user characteristics and behavior characteristics, and the nodes are connected based on the behavior characteristics, the connections being used for representing the time sequence relation among the nodes.
With reference to the second aspect, in a first implementation manner of the second aspect, the behavior identification module specifically includes:
the flow graph constructing unit is used for inputting the user characteristics and the behavior characteristics into the flow graph constructing layer to obtain a behavior flow graph output by the flow graph constructing layer;
the relationship identification unit is used for inputting the behavior flow graph into the homogeneous user relationship network layer to obtain fusion neighborhood characteristics corresponding to the nodes output by the homogeneous user relationship network layer;
the message identification unit is used for inputting the fusion neighborhood characteristics into the heterogeneous user message network layer to obtain abnormal behavior information output by the heterogeneous user message network layer; the abnormal behavior information comprises nodes and state characteristics corresponding to the nodes.
With reference to the first embodiment of the second aspect, in the second embodiment of the second aspect, the relationship identification unit specifically includes:
the first identification unit is used for inputting the behavior flow graph into the neighborhood convolution layer to obtain state characteristics corresponding to nodes output by the neighborhood convolution layer;
the second identification unit is used for inputting the state characteristics into the similarity determination layer to obtain a similarity coefficient between nodes output by the similarity determination layer;
the third identification unit is used for inputting the similarity coefficients into the normalization processing layer to obtain attention coefficients among nodes output by the normalization processing layer;
and the fourth identification unit is used for inputting the state characteristics, the splicing weight and the attention coefficient corresponding to the neighbor nodes of the nodes into the first attention operation layer to obtain the fusion neighborhood characteristics corresponding to the nodes output by the first attention operation layer.
With reference to the first embodiment of the second aspect, in a third embodiment of the second aspect, the message identification unit specifically includes:
the fifth identification unit is used for inputting the fusion neighborhood characteristics into the weight learning layer to obtain attention weights corresponding to the nodes output by the weight learning layer;
the sixth identification unit is used for determining, in the weight learning layer, the attention weight of the node based on a self-attention mechanism;
the seventh identification unit is used for inputting the fusion neighborhood characteristics and the attention weight of the node into the second attention operation layer to obtain the time sequence characteristics of the node output by the second attention operation layer;
the eighth identification unit is used for inputting the time sequence characteristics into the semantic attention understanding layer to obtain embedded representation characteristics corresponding to the nodes output by the semantic attention understanding layer;
and the ninth identification unit is used for inputting the embedded representation characteristics into the prediction output module to obtain the abnormal behavior information output by the prediction output module.
According to a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements, when executing the program, the steps of the method for identifying an anomaly of business data based on behavior association mining as described in any one of the above.
According to a fourth aspect, the embodiment of the present invention further provides a non-transitory computer readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the business data anomaly identification method based on behavior association mining as described in any one of the above.
The business data anomaly identification method and device based on behavior association mining provided by the invention determine the internet behavior information of a user from the business data and then extract the user characteristics and behavior characteristics. The extracted information retains both the operating environment information and the time sequence information of the user's internet operations. The trained business data recognition model is then used to process the business data and identify the abnormal behavior information in it.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are schematic and are not to be understood as limiting the invention in any way, and in which:
FIG. 1 shows a flow diagram of the business data anomaly identification method based on behavior association mining provided by the present invention;
FIG. 2 shows a first schematic structural diagram of the business data recognition model provided by the present invention;
FIG. 3 shows a specific flowchart of step S20 in the business data anomaly identification method based on behavior association mining provided by the present invention;
FIG. 4 shows a second schematic structural diagram of the business data recognition model provided by the present invention;
FIG. 5 shows a specific flowchart of step S22 in the business data anomaly identification method based on behavior association mining provided by the present invention;
FIG. 6 shows a third schematic structural diagram of the business data recognition model provided by the present invention;
FIG. 7 shows a specific flowchart of step S23 in the business data anomaly identification method based on behavior association mining provided by the present invention;
FIG. 8 shows a flow diagram of the business data recognition model training process provided by the present invention;
FIG. 9 shows a schematic structural diagram of the business data anomaly identification device based on behavior association mining provided by the present invention;
FIG. 10 shows a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by the skilled person without creative efforts based on the embodiments of the present invention, belong to the protection scope of the present invention.
Network traffic analysis plays an important role in safeguarding user behavior and the network environment. In the prior art, abnormal user behavior is identified by methods such as port scanning, message feature extraction and field matching. However, as abnormal behaviors in user services are continuously updated and changed, these detection methods become increasingly costly and cannot identify abnormal behavior quickly and accurately. User service anomaly detection combined with network traffic analysis mainly extracts and analyzes normal data from network traffic, formulates corresponding rules, and uses those rules to identify abnormal data quickly.
At present, machine learning is widely applied in the field of abnormal data processing and can serve as a better scheme for identifying abnormal user behavior. The related machine learning models extract features from network-layer and transport-layer traffic data and label samples of the abnormal data types already detected, so that the detection rules evolve automatically and abnormal user service data can be identified. However, the machine learning models currently adopted lack features closely related to users' internet behavior and do not account for issues such as the special network environment requirements of user-defined service behavior types. As a result, the system for identifying abnormal user service data incurs high overhead, which can affect the analysis and detection of actual data and thus the actual identification effect.
The application provides a method for identifying abnormal business data based on behavior association mining, which can be used in electronic devices such as computers, mobile phones, wearable smart devices and tablet computers. Fig. 1 is a flowchart of the method according to an embodiment of the application. As shown in fig. 1, the method includes the following steps:
S10, determining the internet behavior information of the user from the business data, and extracting the user characteristics and the behavior characteristics from the internet behavior information.
The service data may be stored in the electronic device in advance, or may be acquired by the electronic device from the outside, for example collected by the electronic device from various traffic monitoring devices. The service data may include, but is not limited to, data collected from network-layer and transport-layer traffic. How the internet behavior information of the user is obtained is not limited here, as long as the service data can be obtained.
In this embodiment, the user characteristics represent the operating environment information of the user's internet operations and include information about the user's device, application program, browser and web page. The behavior characteristics represent the time sequence information of the user's internet operations and include the time order of the user's various internet operations; that is, the behavior characteristics contain the time sequence information of the user characteristics. For example, a user may habitually click one function icon first and then click another.
In this embodiment, the service data may include the internet behavior information of at least one user, so user characteristics and behavior characteristics are extracted for each of those users. For example, if the service data includes the internet behavior information of users A, B and C, then the user characteristics and behavior characteristics corresponding to user A, those corresponding to user B, and those corresponding to user C are each extracted from the internet behavior information.
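Step S10 can be illustrated with a short sketch that groups service records by user and separates the operating environment from the time-ordered operations. The record layout is an assumption made for illustration: the field names `user`, `timestamp`, `device`, `app`, `browser`, `page` and `action` are hypothetical and do not come from the patent text.

```python
from collections import defaultdict

def extract_features(records):
    """Split service records into per-user user features and behavior features.

    user_features: distinct operating environments (device, app, browser, page).
    behavior_features: the user's operations in time order.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["user"]].append(record)

    result = {}
    for user, rows in grouped.items():
        rows.sort(key=lambda r: r["timestamp"])
        environments = sorted(
            {(r["device"], r["app"], r["browser"], r["page"]) for r in rows}
        )
        sequence = [(r["timestamp"], r["action"]) for r in rows]
        result[user] = {
            "user_features": environments,
            "behavior_features": sequence,
        }
    return result

# Illustrative records for two users; the values are made up.
records = [
    {"user": "A", "timestamp": 2, "device": "phone", "app": "portal",
     "browser": "webview", "page": "/bill", "action": "click_pay"},
    {"user": "A", "timestamp": 1, "device": "phone", "app": "portal",
     "browser": "webview", "page": "/home", "action": "click_bill"},
    {"user": "B", "timestamp": 5, "device": "pc", "app": "portal",
     "browser": "chrome", "page": "/home", "action": "login"},
]

user_data = extract_features(records)
```

Sorting by timestamp before building the sequence is what preserves the "click one icon first, then another" ordering that the behavior characteristics are meant to capture.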
S20, inputting the user characteristics and the behavior characteristics into a trained business data recognition model to obtain abnormal behavior information output by the business data recognition model.
In the application, the business data recognition model is obtained by training based on a sample behavior flow graph of a user, and the sample behavior flow graph is constructed based on the sample user characteristics and sample behavior characteristics of the user. The business data recognition model is used for constructing a behavior flow graph based on the user characteristics and behavior characteristics, determining the fusion neighborhood characteristics corresponding to each node in the behavior flow graph, determining the embedded representation characteristics corresponding to each node based on the fusion neighborhood characteristics of the node, and predicting abnormal behavior in the business data based on classification results determined from the embedded representation characteristics. In this embodiment, each node in the behavior flow graph includes a user characteristic and a behavior characteristic. Because the behavior characteristics include timing information, the nodes are connected based on the behavior characteristics, and each connection represents the timing relationship between the nodes it joins.
The embedded representation feature is the representation of a node in the embedding space; that is, each node in the behavior flow graph is encoded so that the similarity between nodes in the embedding space approximates their similarity in the original graph. In the application, a behavior flow graph of a user is constructed based on the extracted user characteristics and behavior characteristics. The fusion neighborhood characteristics corresponding to each node are then obtained based on the node (each node of the behavior flow graph carries user characteristics), its neighbor nodes and the weights of the neighbor nodes. The embedded representation characteristics are determined based on the fusion neighborhood characteristics of the node, its neighbor nodes and the weights of the neighbor nodes, and abnormal behavior prediction is performed on the service data based on classification results determined from the embedded representation characteristics. How the embedded representation characteristics corresponding to the nodes are obtained is described in detail below.
The behavior flow graph is provided with a plurality of nodes, the nodes are related information of the user, contain user characteristics and behavior characteristics and are used for representing behavior static information of the user, the nodes are connected based on the behavior characteristics, the connection is used for representing the time sequence relation between the nodes, and the nodes which are connected with the node A are neighbor nodes of the node A. Similarly, the sample behavior flow graph has a plurality of sample nodes, the sample nodes are also relevant information of the user, include user characteristics and user behavior characteristics, and are used for representing behavior static information of the user, the sample nodes are connected based on the sample behavior characteristics, the connection lines are used for representing the time sequence relationship between the sample nodes, and the nodes which are connected with the sample nodes C are neighbor nodes of the sample nodes C.
It can be understood that the sample user characteristics and the sample behavior characteristics are extracted from the sample internet behavior information of the user, and the sample internet behavior information is extracted from historical service data. The manner of obtaining the historical service data of the user is not limited here, as long as the historical service data can be obtained.
More specifically, after the internet behavior information of the user is obtained, the user characteristics and the behavior characteristics are extracted from it and filled into the nodes and the edges of the behavior flow graph respectively, which completes the construction of the behavior flow graph. A node, that is, a user information node, carries the user equipment, the application program, the browser used, the web page and other related information; a connecting line (edge) from node A to node B represents the timing relationship of the user behavior, that is, the time order of the internet behaviors.
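The construction described above can be illustrated with a minimal sketch. It is not the implementation of the flow graph construction layer: the event keys (`user`, `device`, `app`, `browser`, `page`, `timestamp`), the class `BehaviorFlowGraph` and the rule that consecutive operations of the same user are linked are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorFlowGraph:
    """Nodes hold operating-environment features; edges hold timing order."""
    nodes: dict = field(default_factory=dict)  # node id -> feature dict
    edges: list = field(default_factory=list)  # (src id, dst id, timestamp)

    def neighbors(self, node_id):
        """Return ids of nodes joined to node_id by an edge in either direction."""
        found = set()
        for src, dst, _ in self.edges:
            if src == node_id:
                found.add(dst)
            elif dst == node_id:
                found.add(src)
        return found


def build_behavior_flow_graph(events):
    """Build a graph from events (dicts with the hypothetical keys named above)."""
    graph = BehaviorFlowGraph()
    last_node = {}  # user -> id of that user's previous node
    ordered = sorted(events, key=lambda e: e["timestamp"])
    for node_id, ev in enumerate(ordered):
        # Node content: the operating environment of this internet operation.
        graph.nodes[node_id] = {
            k: ev[k] for k in ("user", "device", "app", "browser", "page")
        }
        # Edge: assumed rule, link each operation to the same user's previous one.
        prev = last_node.get(ev["user"])
        if prev is not None:
            graph.edges.append((prev, node_id, ev["timestamp"]))
        last_node[ev["user"]] = node_id
    return graph


events = [
    {"user": "A", "device": "phone", "app": "portal", "browser": "chrome",
     "page": "/login", "timestamp": 1},
    {"user": "B", "device": "pc", "app": "portal", "browser": "edge",
     "page": "/login", "timestamp": 2},
    {"user": "A", "device": "phone", "app": "portal", "browser": "chrome",
     "page": "/bill", "timestamp": 3},
]
graph = build_behavior_flow_graph(events)
```

In this sketch the two operations of user A become nodes 0 and 2 joined by one edge, while the single operation of user B remains an isolated node.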
The construction method of the sample behavior flow graph is consistent with that of the behavior flow graph, and details are not repeated here.
The business data abnormity identification method based on behavior association mining determines the internet access behavior information of a user from business data, extracts user characteristics and behavior characteristics from the internet access behavior information, not only maintains the operation environment information of the internet access operation of the user, but also maintains the time sequence information of the internet access operation of the user, identifies the business data by utilizing a trained business data identification model, and identifies the abnormal behavior information in the business data.
In the following, the business data anomaly identification method based on behavior association mining according to the present invention is described with reference to fig. 2. In this embodiment, the business data recognition model includes a flow graph construction layer, a homogeneous user relationship network layer, and a heterogeneous user message network layer.
The flow graph construction layer is used for constructing a behavior flow graph of a user based on user characteristics and behavior characteristics; the homogeneous user relation network layer is used for performing attention operation on each node in the behavior flow graph based on each node in the behavior flow graph and neighbor nodes of the nodes, and determining fusion neighborhood characteristics of the nodes; the heterogeneous user message network layer is used for performing attention operation and semantic attention understanding on the fusion neighborhood characteristics of the nodes and the fusion neighborhood characteristics of the neighbor nodes, determining the embedded representation characteristics of the nodes and the distance between the nodes and the corresponding embedded representation characteristics, and determining abnormal behavior information based on the distance.
In the present embodiment, the behavior flow graph is $G = (V, E)$, where $V = \{u_1, u_2, \ldots, u_n\}$ and $u_i$ represents the $i$-th user characteristic, $1 \le i \le n$; $E = \{\varepsilon_1, \ldots, \varepsilon_m\}$ and $\varepsilon_j$ represents the $j$-th behavior characteristic, $1 \le j \le m$.
In this embodiment, the homogeneous user relationship network layer is a graph neural network based on a homogeneous graph; more specifically, it adopts a graph convolution network and a graph attention network. The heterogeneous user message network layer is a graph neural network based on a heterogeneous graph; more specifically, it adopts a heterogeneous graph attention network.
The service data identification model uses the graph structure to preserve, to the greatest extent, the structure of the social network generated by the user behaviors. The graph structure retains both the operating environment information and the timing information of the user's internet operations, so that the model combines three aspects of the social network, namely the user behaviors, the content generated by the behaviors, and the timing relationship among the user behaviors, to identify service data anomalies, which improves both the accuracy and the speed of anomaly identification.
Therefore, referring to fig. 3, step S20 specifically includes:
S21, inputting the user characteristics and the behavior characteristics into the flow graph construction layer to obtain the behavior flow graph output by the flow graph construction layer.
S22, inputting the behavior flow graph into the homogeneous user relationship network layer to obtain the fusion neighborhood characteristics corresponding to the nodes output by the homogeneous user relationship network layer.
S23, inputting the fusion neighborhood characteristics into the heterogeneous user message network layer to obtain abnormal behavior information output by the heterogeneous user message network layer, wherein the abnormal behavior information comprises nodes and state characteristics corresponding to the nodes, and the state characteristics are labels of the nodes and correspond to the types of the abnormal behaviors.
The method for identifying abnormal business data based on behavior association mining according to the present invention is described below with reference to fig. 4, where the homogeneous user relationship network layer includes a neighborhood convolution layer, a similarity determination layer, a normalization processing layer, and a first attention operation layer.
The neighborhood convolution layer is used for performing a convolution operation on a node, the neighbor nodes of the node and the convolution weights corresponding to the node, and determining the state characteristics of the node, where the state characteristics represent the label of the node. The similarity determination layer is used for determining the similarity coefficients between the state characteristics of the neighbor nodes and the state characteristics of the node. The normalization processing layer is used for normalizing the similarity coefficients and determining the attention coefficients between neighbor nodes. The first attention operation layer is used for performing weighting based on the state characteristics, splicing weights and attention coefficients corresponding to the neighbor nodes, and determining the fusion neighborhood characteristics corresponding to the node.
Therefore, referring to fig. 5, step S22 specifically includes:
S221, inputting the behavior flow graph into the neighborhood convolutional layer to obtain the state characteristics corresponding to the nodes output by the neighborhood convolutional layer.
The neighborhood convolutional layer is a graph convolutional network based on space, that is, graph convolution is defined based on the spatial relationship of nodes, the convolution mode is to accumulate the states of all neighbor nodes of a certain node to update the state of the current node to obtain the state characteristics of the node, and specifically:
[formula image BDA0003834792640000131]
where $h^{l}(v)$ denotes the state characteristics of node $v$ at layer $l$, and $h^{l+1}(v)$ denotes the state characteristics of node $v$ at layer $l+1$; $N(v)$ denotes the neighbor nodes of node $v$; $\sigma$ denotes the nonlinear activation function; [formula image BDA0003834792640000132] is the convolution weight of node $v$ at layer $l$, used for feature enhancement. In this embodiment, the convolution weight of any node $i$ at layer $l$, [formula image BDA0003834792640000133], is a learnable parameter matrix [formula image BDA0003834792640000134] that aggregates the characteristics of node $i$ at layer $l$ and converts the dimension of the feature vector.
As an optional implementation manner of the present invention, the neighborhood convolutional layer is a convolutional layer of one layer and a local output function, where the neighborhood convolutional layer takes the entire behavior flow graph as input, performs a convolution operation on all nodes and neighboring nodes corresponding to the nodes through the convolutional layer, updates the state of the nodes according to the convolution result to obtain the state characteristics of the nodes, and finally converts the state characteristics of the nodes into a label for user anomaly detection through a local output function to output.
As another optional implementation manner of the present invention, the neighborhood convolutional layer is a multilayer convolutional layer and an output function, and the neighborhood convolutional layer also takes the entire behavior flow graph as input, performs a convolution operation on all nodes and neighboring nodes corresponding to the nodes in each convolutional layer, updates the node with a convolution result, inputs the node to the next convolutional layer through an activation function, and performs a cyclic operation, and finally converts the state of the node into a label for abnormal user detection through a local output function to output.
That is, the state of a node is updated according to the states of its neighbor nodes, and the weight assigned to node $i$ depends mainly on the convolution weight $W_i$ corresponding to node $i$. $W_i$ is continuously learned and optimized; specifically, the parameters can be optimized through forward propagation and back propagation.
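The state update of the neighborhood convolutional layer can be sketched as follows. This is a minimal illustration under stated assumptions, not the formula of the embodiment: the neighbor states are summed without including the node itself, the per-node weight array `W` has the assumed shape (n, d_in, d_out), row vectors are used so the product is written `agg @ W[v]`, and ReLU stands in for the unspecified activation function.

```python
import numpy as np


def neighborhood_conv(h, neighbors, W):
    """One spatial graph-convolution step.

    h         : (n, d_in) state characteristics of all nodes at layer l
    neighbors : neighbors[v] is the list of node indices in N(v)
    W         : (n, d_in, d_out) per-node convolution weights (assumed shape)
    returns   : (n, d_out) state characteristics at layer l + 1
    """
    n = h.shape[0]
    out = np.zeros((n, W.shape[2]))
    for v in range(n):
        if neighbors[v]:
            # Accumulate the states of all neighbor nodes of v.
            agg = h[neighbors[v]].sum(axis=0)
        else:
            agg = np.zeros(h.shape[1])
        # Apply the node's weight, then the activation (ReLU is an assumption).
        out[v] = np.maximum(0.0, agg @ W[v])
    return out


h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
neighbors = [[1, 2], [0], [0]]
W = np.stack([np.eye(2)] * 3)  # identity weights keep the example easy to follow
h_next = neighborhood_conv(h, neighbors, W)
```

Stacking several such steps, each fed with the output of the previous one, corresponds to the multilayer variant described above; in training, `W` would be updated by back propagation rather than fixed.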
S222, inputting the node characteristics into the similarity determination layer to obtain a similarity coefficient between the nodes output by the similarity determination layer.
The similarity determination layer, together with the normalization processing layer and the first attention operation layer that follow it, forms a graph attention network, that is, a network with an added attention mechanism. The attention operation in this network is performed only on the neighbor nodes of a given node; for example, for node $i$, the similarity coefficients between node $i$ and each of its neighbor nodes are calculated one by one, specifically:
$e_{ij} = a([V_i h_i \,\|\, V_i h_j]),\quad j \in N(i)$
where $e_{ij}$ denotes the similarity coefficient between node $i$ and its neighbor node $j$; $[\cdot \,\|\, \cdot]$ is the feature splicing (concatenation) operation, which splices the features after their dimension has been raised; $a$ is the mapping coefficient, which in this embodiment is a function that maps the spliced high-dimensional features to a real number and is implemented by a single-layer feedforward neural network; $V_i$ is the splicing weight of node $i$; and $N(i)$ denotes the neighbor nodes of node $i$.
The present application uses a graph attention network, which adds an attention mechanism and can therefore assign different weights (namely splicing weights) to different nodes. In addition, training relies only on pairs of neighbor nodes rather than on a specific overall network structure, so the service data identification model generalizes better.
S223, inputting the similarity coefficients into the normalization processing layer to obtain the attention coefficients between the nodes output by the normalization processing layer.
The similarity coefficient $e_{ij}$ is normalized to obtain the attention coefficient $a_{ij}$ between neighbor node $j$ and node $i$, namely:
[formula image BDA0003834792640000141]
S224, inputting the state characteristics, the splicing weights and the attention coefficients corresponding to the neighbor nodes of the nodes into the first attention operation layer to obtain the fusion neighborhood characteristics corresponding to the nodes output by the first attention operation layer.
In the first attention operation layer, the characteristics are weighted and summed according to the state characteristics, the splicing weights and the attention coefficients corresponding to the neighbor nodes of a node to obtain the fusion neighborhood characteristics corresponding to the node, specifically:
[formula image BDA0003834792640000142]
where $h'_i$ is the fusion neighborhood characteristic of node $i$, that is, the new characteristic information obtained after combining the characteristic information of the neighbor nodes of node $i$.
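Steps S222 to S224 can be sketched together as follows. The sketch follows the stated form of $e_{ij}$, but several details are assumptions because the corresponding formula images are not reproduced here: the mapping $a$ is taken to be a single weight vector `a_vec` without a further nonlinearity, the normalization is taken to be a softmax over the neighbors of node $i$, the weighted sum has no additional activation, and row vectors are used so that $V_i h$ is written `h @ V[i]`.

```python
import numpy as np


def attention_fuse(h, neighbors, V, a_vec):
    """Fuse neighbor state characteristics with attention coefficients.

    h         : (n, d_in) state characteristics of all nodes
    neighbors : neighbors[i] is the list of node indices in N(i)
    V         : (n, d_in, d_out) per-node splicing weights (assumed shape)
    a_vec     : (2 * d_out,) weights of the single-layer mapping a (assumed form)
    returns   : (n, d_out) fusion neighborhood characteristics
    """
    n = h.shape[0]
    fused = np.zeros((n, V.shape[2]))
    for i in range(n):
        nbrs = neighbors[i]
        if not nbrs:
            continue  # a node without neighbors keeps a zero vector in this sketch
        zi = h[i] @ V[i]        # V_i h_i, shape (d_out,)
        zj = h[nbrs] @ V[i]     # V_i h_j for every neighbor j, shape (k, d_out)
        # Similarity coefficients e_ij from the spliced features.
        spliced = np.concatenate([np.tile(zi, (len(nbrs), 1)), zj], axis=1)
        e = spliced @ a_vec
        # Normalization over the neighbors of i (softmax is an assumption).
        e = e - e.max()
        alpha = np.exp(e) / np.exp(e).sum()
        # Weighted sum of the transformed neighbor characteristics.
        fused[i] = alpha @ zj
    return fused


h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
neighbors = [[1, 2], [0], [0]]
V = np.stack([np.eye(2)] * 3)
a_vec = np.zeros(4)  # zero mapping gives equal attention to every neighbor
fused = attention_fuse(h, neighbors, V, a_vec)
```

With a zero `a_vec` every neighbor receives the same attention coefficient, so the fused characteristic of a node is simply the mean of its transformed neighbor characteristics; a trained `a_vec` would shift weight toward the more similar neighbors.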
In the following, the method for identifying abnormal business data based on behavior association mining according to the present invention is described with reference to fig. 6, where the heterogeneous user message network layer includes a weight learning layer, a second attention operation layer, a semantic attention understanding layer, and a prediction output layer.
The weight learning layer is used for determining the attention weight of the node based on the self-attention mechanism; the second attention operation layer is used for carrying out weighting processing on the basis of fusion neighborhood characteristics and attention weights corresponding to the neighbor nodes and determining the time sequence characteristics of the nodes; the time sequence characteristics are used for representing the semantics of the nodes; the semantic attention understanding layer is used for mapping the time sequence characteristics corresponding to the nodes, determining semantic attention weights of the nodes under the meta-path, performing weighting processing based on the time sequence characteristics corresponding to the nodes and the semantic attention weights, and determining embedded representation characteristics of the nodes; and the prediction output module is used for determining the classification result of the embedded representation characteristics and outputting abnormal behavior information based on the classification result.
Therefore, referring to fig. 7, step S23 specifically includes:
S231, inputting the fusion neighborhood characteristics into the weight learning layer to obtain the attention weights corresponding to the nodes output by the weight learning layer.
In the weight learning layer, the weights of the neighbors are learned through the self-attention mechanism of the node. For the fusion neighborhood characteristic $h'_i$ of node $i$ and the fusion neighborhood characteristic $h'_j$ of its neighbor node $j$, a learnable attention vector $a_\Phi$ of the node is used to learn the attention weight of neighbor node $j$ relative to node $i$, [formula image BDA0003834792640000151], specifically:
[formula image BDA0003834792640000152]
S232, inputting the fusion neighborhood characteristics of the nodes and the attention weights into the second attention operation layer to obtain the time sequence characteristics of the nodes output by the second attention operation layer.
After the attention weight corresponding to each neighbor node of a given node $i$ has been obtained, the fusion neighborhood characteristics are weighted and summed according to the attention weights, as in the graph attention network, to obtain the time sequence characteristics of node $i$, specifically:
[formula image BDA0003834792640000153]
where [formula image BDA0003834792640000154] denotes the time sequence characteristics of node $i$. The time sequence characteristics are the related information of the node on each meta-path, that is, of the node combined with its neighbor nodes; for example, if node A has 10 neighbor nodes, then node A has 10 meta-paths.
S233, inputting the time sequence characteristics into the semantic attention understanding layer to obtain the embedded representation characteristics corresponding to the nodes output by the semantic attention understanding layer.
The semantic attention understanding layer first learns the weights of a given node on the different meta-paths and performs weighted fusion of the time sequence characteristics of the node to obtain the semantic attention weight of the node, specifically:
[formula image BDA0003834792640000161]
where [formula image BDA0003834792640000162] denotes the semantic attention weight of node $i$; $V$ is the total number of meta-paths of node $i$, that is, the total number of its neighbor nodes; $q^T$ is the semantic mapping coefficient, which maps the semantics of each node on each meta-path to a real number, and in this embodiment $q^T$ is a vector parameter of size 1 × 3; [formula image BDA0003834792640000163] is a single-layer neural network, in which $W$ is the learnable parameter matrix of the neural network layer, that is, its layer weight, and $b$ is the bias vector of the neural network layer.
In the present application, after the processing of step S20, each node is updated to its fusion neighborhood characteristics, and the neighbors of a fused node are the fusion neighborhood characteristics corresponding to the neighbor nodes of the original node. The fusion neighborhood characteristics may be of four types: the user information of user A, the behavior message information of user A, the user information of a user other than A (for example, user B), and the behavior message information of a user other than A (for example, user B). According to the relationship between a node and its neighbor nodes, the time sequence characteristics can therefore be divided into three categories: user information of user A with behavior message information of user A; user information of user A with behavior message information of a user other than A; and user information of user A with user information of a user other than A. The final time sequence characteristics of the three categories are denoted [formula image BDA0003834792640000164] and [formula image BDA0003834792640000165], and the corresponding semantic attention weights are likewise divided into three categories, [formula image BDA0003834792640000166] and [formula image BDA0003834792640000167].
Then, the semantic attention understanding layer performs weighting based on the time sequence characteristics and the semantic attention weights corresponding to the nodes to determine the embedded representation characteristics of the nodes, which retain the information of the original graph in a low-dimensional vector, specifically:
[formula image BDA0003834792640000168]
where $Z$ denotes the embedded representation characteristics of all nodes.
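The semantic-level weighting can be sketched in the style of a heterogeneous graph attention network. This is an illustration under assumptions, not the formula of the embodiment: one score per relation category is obtained by averaging $q^T \tanh(W z + b)$ over all nodes, the scores are normalized with a softmax, and $Z$ is the resulting weighted sum of the per-category time sequence characteristics. The `tanh` activation, the softmax and the array shapes are all assumed.

```python
import numpy as np


def semantic_attention(Z_paths, W, b, q):
    """Fuse per-category time sequence characteristics into embeddings Z.

    Z_paths : (P, n, d) time sequence characteristics of n nodes under
              P relation categories (P = 3 in the description above)
    W, b    : (d, k) weight matrix and (k,) bias of the single-layer network
    q       : (k,) semantic mapping vector
    returns : (beta, Z) with beta of shape (P,) and Z of shape (n, d)
    """
    # Score of every node under every category, shape (P, n).
    scores = np.tanh(Z_paths @ W + b) @ q
    # Average over nodes, then normalize across categories.
    w = scores.mean(axis=1)
    w = w - w.max()
    beta = np.exp(w) / np.exp(w).sum()
    # Weighted sum over the category axis gives one embedding per node.
    Z = np.tensordot(beta, Z_paths, axes=1)
    return beta, Z


rng = np.random.default_rng(0)
Z_paths = rng.normal(size=(3, 4, 2))  # 3 categories, 4 nodes, 2 features
W = rng.normal(size=(2, 3))
b = np.zeros(3)
q = rng.normal(size=3)                # size 3, matching the 1 x 3 vector above
beta, Z = semantic_attention(Z_paths, W, b, q)
```

The weights `beta` sum to 1, so `Z` stays on the same scale as the inputs; a category whose characteristics score higher under `q` contributes more to the final embedding.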
S234, inputting the embedded representation characteristics into the prediction output module to obtain the abnormal behavior information output by the prediction output module.
The prediction output module obtains a multi-label classification result based on $Z$, determines from the classification result and the corresponding labels whether abnormal behavior exists and which type of abnormal behavior it belongs to, and then feeds the generated abnormal behavior information back to the user.
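The description does not specify the classifier used by the prediction output module. One common way to obtain a multi-label result from $Z$ is a linear layer followed by an independent sigmoid per label, sketched below; the parameters `W_c` and `b_c`, the threshold of 0.5 and the function name are assumptions made for illustration.

```python
import numpy as np


def predict_labels(Z, W_c, b_c, threshold=0.5):
    """Multi-label prediction from embedded representation characteristics.

    Z       : (n, d) embeddings of all nodes
    W_c     : (d, c) classifier weights, one column per abnormal-behavior type
    b_c     : (c,) classifier bias
    returns : (probs, labels), both of shape (n, c); labels is boolean
    """
    logits = Z @ W_c + b_c
    probs = 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per label
    return probs, probs >= threshold


Z_demo = np.array([[2.0, -2.0], [-1.0, 3.0]])
probs, labels = predict_labels(Z_demo, np.eye(2), np.zeros(2))
has_anomaly = labels.any(axis=1)  # True where at least one type is flagged
```

Each column of `labels` corresponds to one type of abnormal behavior, so a node can be flagged for several types at once or for none.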
The method for identifying the abnormal business data based on behavior association mining is described below with reference to fig. 8, and the business data identification model is obtained by training through the following steps:
A10, determining the sample state characteristics of the sample nodes from the sample behavior flow graph. The sample state characteristics are determined in a manner similar to step S221, which is not repeated here.
A20, taking the sample behavior flow graph as the training input data and the sample state characteristics corresponding to the sample nodes as the training labels, and training in a deep learning manner to obtain the business data identification model, which generates the abnormal behavior information for the internet behavior information of the user.
In this embodiment, in step a20, a semi-supervised learning manner is used to train the model, and each parameter in the service data recognition model is adjusted, the ratio of the training set to the validation set to the test set is 3.
The business data anomaly identification device based on behavior association mining provided by the invention is described below, and the business data anomaly identification device based on behavior association mining described below and the business data anomaly identification method based on behavior association mining described above can be referred to correspondingly.
The application provides a business data anomaly identification device based on behavior association mining, which can be used for electronic devices such as computers, mobile phones, wearable smart devices, tablet computers and the like, fig. 9 is a schematic structural diagram of the business data anomaly identification device based on behavior association mining according to an embodiment of the application, and as shown in fig. 9, the device includes:
the feature extraction module 10 is configured to determine internet access behavior information of the user from the service data, and extract user features and behavior features from the internet access behavior information.
The service data may be stored in the electronic device in advance, or may be acquired by the electronic device from outside. For example, the electronic device may collect the service data from various traffic monitoring devices, including but not limited to collecting it from the traffic data of the network layer and the transport layer. The manner of obtaining the internet behavior information of the user is not limited here, as long as the service data can be obtained.
In this embodiment, the user characteristics are used to represent operating environment information of the user internet access operation, the user characteristics include information of user equipment, an application program, a browser and a web page, the behavior characteristics are used to represent time sequence information of the user internet access operation, and the behavior characteristics include time sequences of various internet access operations of the user, that is, the behavior characteristics include time sequence information of the user characteristics, for example, a user is accustomed to clicking a certain functional icon first and then clicking another functional icon, and the like. In this embodiment, the service data may include internet behavior information of at least one user, so that user features and behavior features of a corresponding number of users are also extracted from the internet behavior information, for example, the service data includes internet behavior information of three users, namely, a user a, a user B, and a user C, and then the user features and behavior features corresponding to the user a, the user features and behavior features corresponding to the user B, and the user features and behavior features corresponding to the user C are respectively extracted from the internet behavior information.
The behavior recognition module 20 is configured to input the user characteristics and the behavior characteristics into a trained business data recognition model to obtain the abnormal behavior information output by the business data recognition model. In the present application, the business data recognition model is obtained by training on a sample behavior flow graph of a user, and the sample behavior flow graph is constructed from the sample user characteristics and the sample behavior characteristics of the user. The business data recognition model is configured to construct a behavior flow graph from the user characteristics and the behavior characteristics, to determine the fusion neighborhood characteristics corresponding to each node in the behavior flow graph, to determine the embedded representation characteristics corresponding to each node based on the fusion neighborhood characteristics of the node, and to predict abnormal behavior in the business data based on a classification result determined from the embedded representation characteristics. In this embodiment, each node in the behavior flow graph includes a user characteristic and a behavior characteristic; because the behavior characteristics carry timing information, the nodes are connected based on the behavior characteristics, and each connection represents the timing relationship between the nodes it joins.
The embedded representation characteristic is the representation of a node in the embedding space; that is, each node in the behavior flow graph is encoded so that the similarity between nodes in the embedding space approximates their similarity in the original graph. In the present application, a behavior flow graph of the user is constructed from the extracted user characteristics and behavior characteristics; the fusion neighborhood characteristics corresponding to each node are then obtained based on the user characteristics (that is, each node of the behavior flow graph), the neighbor nodes of the node and the weights of the neighbor nodes; the embedded representation characteristics are determined based on the fusion neighborhood characteristics of the node, its neighbor nodes and the weights of the neighbor nodes; and abnormal behavior prediction is performed on the service data based on the classification result determined from the embedded representation characteristics. How the embedded representation characteristics corresponding to the nodes are obtained is described in detail below.
The behavior flow graph is provided with a plurality of nodes, the nodes are related information of the user, contain user characteristics and behavior characteristics and are used for representing behavior static information of the user, the nodes are connected based on the behavior characteristics, the connection lines are used for representing the time sequence relation among the nodes, and the nodes which are connected with the node A are the neighbor nodes of the node A. Similarly, the sample behavior flow graph has a plurality of sample nodes, the sample nodes are also relevant information of the user, include user characteristics and user behavior characteristics, and are used for representing behavior static information of the user, the sample nodes are connected based on the sample behavior characteristics, the connection is used for representing a time sequence relation between the sample nodes, and the nodes connected with the sample nodes C are neighbor nodes of the sample nodes C.
It can be understood that the sample user characteristics and the sample behavior characteristics are extracted from the sample internet behavior information of the user, and the sample internet behavior information is extracted from historical service data. The manner of obtaining the historical service data of the user is not limited here, as long as the historical service data can be obtained.
The business data abnormity identification device based on behavior association mining extracts user characteristics and behavior characteristics from the internet access behavior information of a user, the extracted information not only retains operation environment information of internet access operation of the user, but also retains time sequence information of the internet access operation of the user, and a trained business data identification model is utilized to identify the user characteristics and the behavior characteristics and identify abnormal behavior information in the extracted information.
Fig. 10 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 10: a processor (processor) 310, a communication Interface (Communications Interface) 320, a memory (memory) 330 and a communication bus 340, wherein the processor 310, the communication Interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke logic commands in the memory 330 to perform a behavioral association mining based business data anomaly identification method comprising:
determining the internet behavior information of a user, and extracting user characteristics and behavior characteristics from the internet behavior information; the user characteristics are used for representing the operation environment information of the user internet surfing operation, and the behavior characteristics are used for representing the time sequence information of the user internet surfing operation;
inputting the user characteristics and the behavior characteristics into a service data recognition model to obtain abnormal behavior information output by the service data recognition model; the business data recognition model is obtained by training based on a sample behavior flow graph of a user; the sample behavior flow graph is constructed on the basis of sample user characteristics and sample behavior characteristics of a user;
the business data identification model is used for constructing a behavior flow graph based on user characteristics and behavior characteristics, determining fusion neighborhood characteristics corresponding to each node in the behavior flow graph and embedded representation characteristics corresponding to the nodes based on the fusion neighborhood characteristics of the nodes, and performing abnormal behavior prediction on business data based on classification results determined from the embedded representation characteristics; the nodes comprise user characteristics and behavior characteristics, and the nodes are connected based on the behavior characteristics and used for representing the time sequence relation among the nodes.
In addition, when the logic commands in the memory 330 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes a number of commands for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program that may be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can execute the business data anomaly identification method based on behavior association mining provided by the above methods, the method including:
determining the internet behavior information of a user, and extracting user characteristics and behavior characteristics from the internet behavior information; the user characteristics are used for representing the operation environment information of the user internet surfing operation, and the behavior characteristics are used for representing the time sequence information of the user internet surfing operation;
inputting the user characteristics and the behavior characteristics into a business data recognition model to obtain abnormal behavior information output by the business data recognition model; the business data recognition model is obtained by training based on a sample behavior flow graph of a user; the sample behavior flow graph is constructed based on sample user characteristics and sample behavior characteristics of the user;
the business data recognition model is used for constructing a behavior flow graph based on the user characteristics and the behavior characteristics, determining the fusion neighborhood characteristics corresponding to each node in the behavior flow graph, determining the embedded representation characteristics corresponding to the nodes based on the fusion neighborhood characteristics of the nodes, and performing abnormal behavior prediction on the business data based on classification results determined from the embedded representation characteristics; each node comprises user characteristics and behavior characteristics, and the nodes are connected based on the behavior characteristics, the connections representing the time sequence relation among the nodes.
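The step of determining fusion neighborhood characteristics can be sketched as similarity scoring between a node and its connected nodes, normalization of those scores into attention coefficients, and a weighted sum of the neighbors' state characteristics. The Python sketch below is a minimal illustration under stated assumptions: dot-product similarity, softmax normalization, and the inclusion of the node itself among its neighbors are choices made here for simplicity, and the learned convolution and splicing weights are omitted. The names `softmax` and `fuse_neighborhood` and the toy data are hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw similarity coefficients into attention coefficients."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_neighborhood(state, neighbors):
    """Return fused neighborhood features and attention coefficients per node.

    state: dict node_id -> list of floats (state characteristics).
    neighbors: dict node_id -> list of connected node ids.
    Dot-product similarity is an assumption made for this sketch.
    """
    fused, attention = {}, {}
    for i, h_i in state.items():
        # Include the node itself so isolated nodes still get a feature.
        nbrs = [i] + list(neighbors.get(i, []))
        # Similarity coefficient between node i and each neighbor j.
        sims = [sum(a * b for a, b in zip(h_i, state[j])) for j in nbrs]
        # Normalization turns similarity coefficients into attention coefficients.
        alphas = softmax(sims)
        # Attention-weighted sum of the neighbors' state characteristics.
        fused[i] = [
            sum(alpha * state[j][k] for alpha, j in zip(alphas, nbrs))
            for k in range(len(h_i))
        ]
        attention[i] = dict(zip(nbrs, alphas))
    return fused, attention


state = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [2], 1: [], 2: [0]}
fused, attention = fuse_neighborhood(state, neighbors)
```

A trained model would learn the similarity function and the weights applied before aggregation; the fixed dot product here only shows how the three stages feed each other.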
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art will understand and implement the present invention without inventive effort.
Through the above description of the embodiments, it is clear to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Based on this understanding, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (14)

1. A business data anomaly identification method based on behavior association mining, characterized by comprising the following steps:
determining internet behavior information of a user from the business data, and extracting user characteristics and behavior characteristics from the internet behavior information; the user characteristics are used for representing the operation environment information of the user internet surfing operation, and the behavior characteristics are used for representing the time sequence information of the user internet surfing operation;
inputting the user characteristics and the behavior characteristics into a business data recognition model to obtain abnormal behavior information output by the business data recognition model; the business data recognition model is obtained by training based on a sample behavior flow graph of a user; the sample behavior flow graph is constructed based on sample user characteristics and sample behavior characteristics of the user;
the business data recognition model is used for constructing a behavior flow graph based on the user characteristics and the behavior characteristics, determining the fusion neighborhood characteristics corresponding to each node in the behavior flow graph, determining the embedded representation characteristics corresponding to the nodes based on the fusion neighborhood characteristics of the nodes, and performing abnormal behavior prediction on the business data based on classification results determined from the embedded representation characteristics; each node comprises user characteristics and behavior characteristics, and the nodes are connected based on the behavior characteristics, the connections representing the time sequence relation among the nodes.
2. The method for identifying abnormal business data based on behavior association mining as claimed in claim 1, wherein the business data identification model comprises a flow graph construction layer, a homogeneous user relationship network layer and a heterogeneous user message network layer;
the flow graph construction layer is used for constructing a behavior flow graph of the user based on the user characteristics and the behavior characteristics;
the homogeneous user relation network layer is used for performing attention operation on each node in the behavior flow graph based on the node and the neighbor node of the node, and determining fusion neighborhood characteristics of the node; the neighbor node is a node which is connected with the current node;
the heterogeneous user message network layer is used for performing attention operation and semantic attention understanding on the fusion neighborhood characteristics of the nodes and the fusion neighborhood characteristics of the neighbor nodes, determining the embedded representation characteristics of the nodes, obtaining a classification result based on the embedded representation characteristics, and determining abnormal behavior information based on the classification result.
3. The method for identifying abnormal business data based on behavior association mining as claimed in claim 2, wherein the step of inputting the user characteristics and the behavior characteristics into the business data identification model to obtain abnormal behavior information output by the business data identification model specifically comprises:
inputting the user characteristics and the behavior characteristics into a flow graph construction layer to obtain a behavior flow graph output by the flow graph construction layer;
inputting the behavior flow graph into a homogeneous user relation network layer to obtain fusion neighborhood characteristics corresponding to the nodes output by the homogeneous user relation network layer;
inputting the fusion neighborhood characteristics into a heterogeneous user message network layer to obtain abnormal behavior information output by the heterogeneous user message network layer; the abnormal behavior information comprises nodes and state characteristics corresponding to the nodes.
4. The business data anomaly identification method based on behavior association mining according to claim 3, wherein the homogeneous user relationship network layer comprises a neighborhood convolution layer, a similarity determination layer, a normalization processing layer and a first attention operation layer;
the neighborhood convolution layer is used for carrying out convolution operation on the node, neighbor nodes of the node and convolution weights corresponding to the node to determine state characteristics of the node; the state features are used for representing labels of the nodes;
the similarity determination layer is used for determining similarity coefficients between the state features corresponding to the neighbor nodes and the state features corresponding to the nodes;
the normalization processing layer is used for performing normalization processing on the similarity coefficients and determining the attention coefficients between neighborhood nodes;
the first attention operation layer is used for carrying out weighting processing based on the state characteristics, the splicing weight and the attention coefficient corresponding to the neighbor nodes and determining fusion neighborhood characteristics corresponding to the nodes.
5. The business data anomaly identification method based on behavior association mining according to claim 3, wherein the heterogeneous user message network layer comprises a weight learning layer, a second attention operation layer, a semantic attention understanding layer and a prediction output layer;
the weight learning layer is used for determining the attention weight of the node based on a self-attention mechanism;
the second attention operation layer is used for carrying out weighting processing on the basis of fusion neighborhood characteristics and attention weights corresponding to the neighbor nodes and determining the time sequence characteristics of the nodes; the time sequence characteristics are used for representing the semantics of the nodes;
the semantic attention understanding layer is used for mapping the time sequence characteristics corresponding to the nodes, determining the semantic attention weights of the nodes, performing weighting processing on the time sequence characteristics corresponding to the nodes and the semantic attention weights, and determining the embedded representation characteristics of the nodes;
and the prediction output layer is used for determining the classification result of the embedded representation characteristics and outputting abnormal behavior information based on the classification result.
6. The method for identifying abnormal business data based on behavior association mining as claimed in claim 4, wherein the step of inputting the behavior flow graph into the homogeneous user relationship network layer to obtain the fusion neighborhood characteristics corresponding to the nodes output by the homogeneous user relationship network layer specifically comprises the steps of:
inputting the behavior flow graph into the neighborhood convolution layer to obtain the state characteristics corresponding to the nodes output by the neighborhood convolution layer;
inputting the state characteristics into the similarity determination layer to obtain the similarity coefficients between nodes output by the similarity determination layer;
inputting the similarity coefficients into the normalization processing layer to obtain the attention coefficients among nodes output by the normalization processing layer;
and inputting the state characteristics, the splicing weight and the attention coefficient corresponding to the neighbor nodes of the nodes into the first attention operation layer to obtain the fusion neighborhood characteristics corresponding to the nodes output by the first attention operation layer.
7. The method for identifying abnormal business data based on behavior association mining as claimed in claim 5, wherein the step of inputting the fusion neighborhood feature into the heterogeneous user message network layer to obtain abnormal behavior information output by the heterogeneous user message network layer specifically comprises:
inputting the fusion neighborhood characteristics into a weight learning layer to obtain attention weights corresponding to the nodes output by the weight learning layer;
inputting the fusion neighborhood characteristics and the attention weight of the node into a second attention operation layer to obtain the time sequence characteristics corresponding to the node output by the second attention operation layer;
inputting the time sequence characteristics into a semantic attention understanding layer to obtain embedded representation characteristics corresponding to the nodes output by the semantic attention understanding layer;
and inputting the embedded representation characteristics into the prediction output layer to obtain the abnormal behavior information output by the prediction output layer.
8. The business data anomaly identification method based on behavior association mining as claimed in claim 1, wherein the business data identification model is obtained by training through the following steps:
determining sample state characteristics of sample nodes from a sample behavior flow graph; each sample node in the sample behavior flow graph comprises sample user characteristics and sample behavior characteristics, and the sample nodes are connected based on the sample behavior characteristics and used for representing the time sequence relationship among the sample nodes;
and taking the sample behavior flow graph as input data used for training, taking sample state characteristics corresponding to the sample nodes as labels used for training, and training in a deep learning mode to obtain a business data identification model for generating abnormal behavior information of the internet behavior information of the user.
9. A business data anomaly identification device based on behavior association mining, characterized in that the device comprises:
the characteristic extraction module is used for determining the internet behavior information of the user and extracting user characteristics and behavior characteristics from the internet behavior information; the user characteristics are used for representing the operation environment information of the user internet surfing operation, and the behavior characteristics are used for representing the time sequence information of the user internet surfing operation;
the behavior recognition module is used for inputting the user characteristics and the behavior characteristics into the business data recognition model to obtain abnormal behavior information output by the business data recognition model; the business data recognition model is obtained by training based on a sample behavior flow graph of a user; the sample behavior flow graph is constructed based on sample user characteristics and sample behavior characteristics of the user;
the business data recognition model is used for constructing a behavior flow graph based on the user characteristics and the behavior characteristics, determining the fusion neighborhood characteristics corresponding to each node in the behavior flow graph, determining the embedded representation characteristics corresponding to the nodes based on the fusion neighborhood characteristics of the nodes, and performing abnormal behavior prediction on the business data based on classification results determined from the embedded representation characteristics; each node comprises user characteristics and behavior characteristics, and the nodes are connected based on the behavior characteristics, the connections representing the time sequence relation among the nodes.
10. The business data anomaly recognition device based on behavior association mining as claimed in claim 9, wherein the behavior recognition module specifically comprises:
the flow graph constructing unit is used for inputting the user characteristics and the behavior characteristics into the flow graph constructing layer to obtain a behavior flow graph output by the flow graph constructing layer;
the relationship identification unit is used for inputting the behavior flow graph into the homogeneous user relationship network layer to obtain fusion neighborhood characteristics corresponding to the nodes output by the homogeneous user relationship network layer;
the message identification unit is used for inputting the fusion neighborhood characteristics into the heterogeneous user message network layer to obtain abnormal behavior information output by the heterogeneous user message network layer; the abnormal behavior information comprises nodes and state characteristics corresponding to the nodes.
11. The device for identifying abnormal business data based on behavioral association mining according to claim 10, wherein the relationship identifying unit specifically includes:
the first identification unit is used for inputting the behavior flow graph into the neighborhood convolution layer to obtain the state characteristics corresponding to the nodes output by the neighborhood convolution layer;
the second identification unit is used for inputting the state characteristics into the similarity determination layer to obtain the similarity coefficients between nodes output by the similarity determination layer;
the third identification unit is used for inputting the similarity coefficients into the normalization processing layer to obtain the attention coefficients among nodes output by the normalization processing layer;
and the fourth identification unit is used for inputting the state characteristics, the splicing weight and the attention coefficient corresponding to the neighbor nodes of the nodes into the first attention operation layer to obtain the fusion neighborhood characteristics corresponding to the nodes output by the first attention operation layer.
12. The device for identifying abnormal business data based on behavioral association mining according to claim 10, wherein the message identification unit specifically includes:
the fifth identification unit is used for inputting the fusion neighborhood characteristics into the weight learning layer to obtain attention weights corresponding to the nodes output by the weight learning layer;
the sixth identification unit is used for determining, in the weight learning layer, the attention weight of the node based on the self-attention mechanism;
the seventh identification unit is used for inputting the fusion neighborhood characteristics of the nodes and the attention weight into the second attention operation layer to obtain the time sequence characteristics of the nodes output by the second attention operation layer under the meta path;
the eighth identification unit is used for inputting the time sequence characteristics into the semantic attention understanding layer to obtain embedded representation characteristics corresponding to the nodes output by the semantic attention understanding layer;
and the ninth identification unit is used for inputting the embedded representation characteristics into the prediction output layer to obtain the abnormal behavior information output by the prediction output layer.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the business data anomaly identification method based on behavior association mining according to any one of claims 1 to 8 when executing the program.
14. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the steps of the business data anomaly identification method based on behavior association mining according to any one of claims 1 to 8.
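The semantic attention and prediction stages recited in claims 5 and 7 can be illustrated with a small Python sketch. It assumes the common heterogeneous-graph-attention formulation: each meta path yields a time sequence feature per node, a projection maps those features to one score per meta path, a softmax over the scores gives the semantic attention weights, and the weighted sum is the embedded representation. That formulation, the `tanh` mapping, the logistic classifier standing in for the prediction output layer, the meta path names, and all parameter values are assumptions for illustration and are not taken from the claims.

```python
import math


def semantic_attention(timing_feats, q):
    """Combine per-meta-path timing features into one embedding per node.

    timing_feats: dict path_name -> dict node_id -> list of floats.
    q: projection vector mapping a feature vector to a scalar score.
    """
    paths = list(timing_feats)
    node_ids = list(timing_feats[paths[0]])
    # Score each meta path as the mean over nodes of q . tanh(z).
    scores = []
    for p in paths:
        per_node = [
            sum(qk * math.tanh(zk) for qk, zk in zip(q, timing_feats[p][n]))
            for n in node_ids
        ]
        scores.append(sum(per_node) / len(per_node))
    # Softmax over meta paths gives the semantic attention weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    betas = [e / total for e in exps]
    # Embedded representation: attention-weighted sum across meta paths.
    embeddings = {
        n: [
            sum(b * timing_feats[p][n][k] for b, p in zip(betas, paths))
            for k in range(len(q))
        ]
        for n in node_ids
    }
    return embeddings, dict(zip(paths, betas))


def classify(embedding, weights, bias, threshold=0.5):
    """Logistic classifier standing in for the prediction output layer."""
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return ("abnormal" if prob >= threshold else "normal"), prob


# Hypothetical meta paths and feature values, chosen only for the demo.
timing_feats = {
    "user-device": {0: [0.2, 0.1], 1: [2.0, 1.5]},
    "user-ip": {0: [0.1, 0.3], 1: [1.0, 2.5]},
}
embeddings, betas = semantic_attention(timing_feats, q=[1.0, 1.0])
labels = {n: classify(e, weights=[1.0, 1.0], bias=-2.0)[0] for n, e in embeddings.items()}
```

In a trained model the projection vector, classifier weights, and bias would be learned from the labelled sample behavior flow graph described in claim 8; the fixed values above only make the data flow visible.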
CN202211084180.7A 2022-09-06 2022-09-06 Business data anomaly identification method and device based on behavior association mining Pending CN115473718A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211084180.7A CN115473718A (en) 2022-09-06 2022-09-06 Business data anomaly identification method and device based on behavior association mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211084180.7A CN115473718A (en) 2022-09-06 2022-09-06 Business data anomaly identification method and device based on behavior association mining

Publications (1)

Publication Number Publication Date
CN115473718A true CN115473718A (en) 2022-12-13

Family

ID=84371480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211084180.7A Pending CN115473718A (en) 2022-09-06 2022-09-06 Business data anomaly identification method and device based on behavior association mining

Country Status (1)

Country Link
CN (1) CN115473718A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116760727A (en) * 2023-05-30 2023-09-15 南京南瑞信息通信科技有限公司 Abnormal traffic identification method, device, system and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030200462A1 (en) * 1999-05-11 2003-10-23 Software Systems International Llc Method and system for establishing normal software system behavior and departures from normal behavior
US20180103052A1 (en) * 2016-10-11 2018-04-12 Battelle Memorial Institute System and methods for automated detection, reasoning and recommendations for resilient cyber systems
CN113094707A (en) * 2021-03-31 2021-07-09 中国科学院信息工程研究所 Transverse mobile attack detection method and system based on heterogeneous graph network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
季述郧: "基于图卷积的电信用户行为识别方法研究与仿真", 《中国优秀硕士学位论文全文数据库(电子期刊) 信息科技辑》, no. 1, 15 January 2022 (2022-01-15) *
易树平;李嘉佳;易茜: "基于行为流图的可信交互检测方法", 《控制与决策》, vol. 35, no. 11, 14 May 2019 (2019-05-14), pages 2715 - 2722 *

Similar Documents

Publication Publication Date Title
CN111914156B (en) Cross-modal retrieval method and system for self-adaptive label perception graph convolution network
Gao Network intrusion detection method combining CNN and BiLSTM in cloud computing environment
CN113158554B (en) Model optimization method and device, computer equipment and storage medium
CN111625715B (en) Information extraction method and device, electronic equipment and storage medium
CN114330966A (en) Risk prediction method, device, equipment and readable storage medium
CN116150509B (en) Threat information identification method, system, equipment and medium for social media network
CN112819024B (en) Model processing method, user data processing method and device and computer equipment
Sun et al. Image steganalysis based on convolutional neural network and feature selection
CN114329455B (en) User abnormal behavior detection method and device based on heterogeneous graph embedding
CN115473718A (en) Business data anomaly identification method and device based on behavior association mining
CN116090504A (en) Training method and device for graphic neural network model, classifying method and computing equipment
CN113705402A (en) Video behavior prediction method, system, electronic device and storage medium
Jin et al. Improving the Performance of Deep Learning Model‐Based Classification by the Analysis of Local Probability
CN114638984B (en) Malicious website URL detection method based on capsule network
CN113051607B (en) Privacy policy information extraction method
CN115587616A (en) Network model training method and device, storage medium and computer equipment
CN111615178B (en) Method and device for identifying wireless network type and model training and electronic equipment
CN114925681A (en) Knowledge map question-answer entity linking method, device, equipment and medium
CN115964478A (en) Network attack detection method, model training method and device, equipment and medium
CN113409096A (en) Target object identification method and device, computer equipment and storage medium
CN110929118A (en) Network data processing method, equipment, device and medium
CN116069831B (en) Event relation mining method and related device
Feng et al. Construction of Legal Reporting Information Platform Based on Natural Optimization Algorithm
CN116248357A (en) Domain name crawling method and device and network equipment
Luo Network Security Situation Prediction Technology Based on Fusion of Knowledge Graph.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination