CN115512133A - Anomaly detection method and system for import-export behavior dynamic graph data

Info

Publication number: CN115512133A
Application number: CN202211251195.8A
Authority: CN
Inventors: 包先雨; 蔡伊娜; 高祖康; 李俊杰; 程烨; 蒋涛; 黄智强; 黄哲学; 郑文丽; 程立勋; 方凯彬
Original assignee: Shenzhen Qianhai Quantum Cloud Code Technology Co ltd; Shenzhen University; Shenzhen Academy of Inspection and Quarantine
Current assignee: Shenzhen Qianhai Quantum Cloud Code Technology Co ltd; Shenzhen University; Shenzhen Academy of Inspection and Quarantine
Priority date: 2022-10-13
Filing date: 2022-10-13
Publication date: 2022-12-23

Abstract

The invention relates to the technical field of import-export behavior dynamic graph detection, and discloses an anomaly detection method for import-export behavior dynamic graph data, comprising the following steps: S1, defining the import-export behavior dynamic graph; S2, extracting features of the import-export behavior dynamic graph; S3, detecting anomalies in the import-export behavior dynamic graph: in an anomaly detection module, the node representations obtained in S2 are used to detect abnormal edges in the import-export behavior dynamic graph. For time-sequence feature extraction on the dynamic graph, the method does not adopt a sliding-window recurrent neural network structure; instead, it adopts a graph embedding method based on long-term and short-term time-sequence attention. A block-wise attention structure efficiently extracts the short-term time-sequence features of the nodes in the snapshots, and each time-sequence block extracts and passes on the long-term time-sequence features of the nodes through a long-term memory state vector. This ensures that the model extracts the complete time-sequence features of the nodes, thereby improving anomaly detection performance.

Description

Import-export behavior dynamic graph data anomaly detection method and system

Technical Field

The invention relates to the technical field of import and export behavior dynamic graph detection, in particular to an import and export behavior dynamic graph data anomaly detection method and system.

Background

The customs clearance system holds a large amount of key data describing import and export behaviors. When customs supervises these data, monitoring abnormal mutations in import and export behaviors faces several challenges. First, the relationships between commodities and ports, and among commodities themselves, must be considered when detecting anomalies. The relationship between a commodity and a port is an important feature for anomaly detection, and because the same or similar commodities usually relate to ports in similar ways, the relationships among commodities also contribute to detection. With deepening economic globalization, customs has accumulated a great variety of commodities, and these relationships rapidly become complex as the number of commodity types and ports grows, which makes anomaly detection harder. Second, changes in the relationship between commodities and ports must be considered. Such changes are an important basis for detecting anomalies: for example, if a commodity that is mainly imported and exported through one fixed port switches to another port, an anomaly is likely to have occurred. However, as the number of commodity categories increases, these relationships become complex, so accounting for how they change over time is also a challenge.

Ports and commodities, the relationships between them, and how these relationships change can be represented in the form of a dynamic graph. A dynamic graph represents objects and the relations between objects, so it can effectively represent the relationship between commodities and ports; because a dynamic graph changes continuously, it can also show how that relationship changes over time.

A dynamic graph is a graph data structure with time-sequence characteristics. Anomaly detection on dynamic graphs is one of the most challenging tasks in anomaly detection; its aim is to find anomalous objects, points and edges in dynamic graph data. Abnormal point and abnormal edge detection is particularly widely applied, for example to detect accidents in a traffic network, or network anomalies and network attacks in a computer network.

Existing dynamic graph anomaly detection methods fall mainly into two types: anomaly detection algorithms based on heuristic rules, and anomaly detection algorithms based on graph representation learning.

When extracting time-sequence features, most existing methods based on graph representation learning use a sliding-window recurrent neural network structure to extract the time-sequence features of the nodes of the snapshots within a window. The features extracted this way cover only the short-term time-sequence features of those nodes and ignore the node states of past snapshots. The long-term time-sequence features of the nodes are therefore lost, which limits the effect of time-sequence feature extraction to a certain extent and leads to incomplete anomaly detection. Therefore, those skilled in the art provide a method and a system for detecting data anomalies in an import-export behavior dynamic graph, so as to solve the problems mentioned in the background above.

Disclosure of Invention

The invention aims to provide a method and a system for detecting data abnormality of an import/export behavior dynamic graph, so as to solve the problems in the background technology.

In order to achieve the purpose, the invention provides the following technical scheme:

An import-export behavior dynamic graph data anomaly detection method comprises the following steps:

S1, defining the import-export behavior dynamic graph: constructing a graph sequence to define the import-export behavior dynamic graph;

S2, feature extraction of the import-export behavior dynamic graph: extracting the structural features and time features of the import-export behavior dynamic graph into the node representations of the respective modules through a structural and contextual feature extraction module and a dynamic time-sequence feature extraction module;

S3, anomaly detection of the import-export behavior dynamic graph: detecting abnormal edges in the import-export behavior dynamic graph in the anomaly detection module by using the node representations from S2.

As a still further scheme of the invention: in S1, the import-export behavior dynamic graph is defined as $G$:

$$G = \{G^t\}_{t=1}^{T}$$

The above formula is a graph sequence, in which $G^t$ denotes the graph at timestamp $t$. Each graph is $G^t = (V^t, E^t)$, where $V^t$ and $E^t$ are the point set and the edge set of graph $G^t$; $e = (v_i, v_j, w) \in E^t$ represents an edge between nodes $v_i$ and $v_j$ with weight $w$; and $V = \bigcup_{t=1}^{T} V^t$ and $E = \bigcup_{t=1}^{T} E^t$ are the total point set and the total edge set of the import-export behavior dynamic graph $G$. Then $n = |V|$, and $A^t \in \mathbb{R}^{n \times n}$ is the adjacency matrix of each graph.
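The definition above can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Snapshot` class, the helper names, the commodity codes, the port names, and the choice to treat edges as undirected are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One graph G^t: weighted edges (v_i, v_j, w) observed at timestamp t."""
    t: int
    edges: list


def build_index(snapshots):
    """Map every node of the total point set V to a row/column index."""
    nodes = sorted({v for s in snapshots for (i, j, _) in s.edges for v in (i, j)})
    return {v: k for k, v in enumerate(nodes)}


def adjacency(snapshot, index):
    """Dense n x n adjacency matrix A^t (undirected here, by assumption)."""
    n = len(index)
    a = [[0.0] * n for _ in range(n)]
    for i, j, w in snapshot.edges:
        a[index[i]][index[j]] = w
        a[index[j]][index[i]] = w
    return a


# Hypothetical data: commodity codes and ports, weight = number of declarations.
graph_sequence = [
    Snapshot(1, [("HS0901", "Shenzhen", 12.0), ("HS8517", "Shenzhen", 30.0)]),
    Snapshot(2, [("HS0901", "Shenzhen", 10.0), ("HS8517", "Guangzhou", 28.0)]),
]
index = build_index(graph_sequence)
A = [adjacency(s, index) for s in graph_sequence]
```

Every snapshot shares the same index over the total point set, so all matrices have the same shape $n \times n$ even when a node is absent from a given snapshot.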

As a still further scheme of the invention: the structural and contextual feature extraction module in S2 uses a multi-layer graph convolutional neural network on each snapshot to extract the structural features of the import-export behavior dynamic graph and to aggregate content features among nodes.

As a still further scheme of the invention: the dynamic time-sequence feature extraction module in S2 proposes a multi-head attention mechanism for the import-export behavior dynamic graph and extends it into a block-recurrent structure, so as to extract the two main time features of the dynamic graph, namely the long-term and short-term time-sequence features.

An import and export behavior dynamic graph data anomaly detection system comprises a structure and context feature extraction module, a dynamic time sequence feature extraction module and an anomaly detection module.

As a still further scheme of the invention: the structural and contextual feature extraction module uses a multi-layer graph convolutional neural network to extract the structural features of the snapshots in the dynamic graph and maps the nodes of the dynamic graph into high-dimensional space vectors;

the dynamic time-sequence feature extraction module divides the dynamic graph sequence into a plurality of time-sequence blocks according to a fixed window size; a multi-head attention mechanism is introduced into each time-sequence block to better extract the time-sequence features and update their vector expression, and each time-sequence block passes the long-term time-sequence information of each node to the next block through a long-term memory state vector, so that the long-term time-sequence features are stored and extracted and abnormal edge detection performance is improved;

the anomaly detection module uses the node vectors of the dynamic graph to construct vector expressions of the edges, feeds each edge vector into a nonlinear activation function to compute an anomaly score, and finds the abnormal edge data whose anomaly score is greater than a threshold.

As a still further scheme of the invention: the structural and contextual feature extraction module performs one multi-layer graph convolution operation on each graph $G^t$ in the graph sequence to extract the structural features of each graph and obtain the vertex vectors of each graph, as described by formula (1).

In formula (1), GCN is a multi-layer graph convolutional neural network for snapshots, $L$ represents the number of layers of the GCN, $H^t \in \mathbb{R}^{n \times d_h}$ represents the vector representation of each vertex of graph $G^t$, and $d_h$ is the vector dimension. The GCN is calculated specifically as follows:

$$Z^{(0)} = X^t \tag{2}$$

In formulas (3) and (4), $\sigma(\cdot)$ represents an activation function, such as the ReLU activation function.

The final output of this module is the vertex vector matrix $H^t$ of each graph $G^t$.
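As an illustration of the multi-layer graph convolution described above, the following Python sketch applies the commonly used GCN propagation rule. The exact form of formulas (1), (3) and (4) is not reproduced in this text, so the symmetric normalisation with self-loops, the per-layer weight matrices, and all function names are assumptions, not the patent's definition.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(a):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN normalisation (assumed)."""
    n = len(a)
    a_tilde = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn(a, x, weights):
    """Z^(0) = X^t, then Z^(l) = ReLU(A_hat Z^(l-1) W_l); returns H^t = Z^(L)."""
    a_hat = normalize_adjacency(a)
    z = x
    for w in weights:
        z = [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, z), w)]
    return z


# Hypothetical three-node snapshot with two input features per node.
rng = random.Random(0)
d_in, d_h, L = 2, 4, 2
A_t = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
X_t = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dims = [d_in] + [d_h] * L
weights = [
    [[rng.uniform(-1, 1) for _ in range(dims[l + 1])] for _ in range(dims[l])]
    for l in range(L)
]
H_t = gcn(A_t, X_t, weights)  # n rows, d_h columns
```

The weights here are random; in a trained model they would be learned together with the rest of the network.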

As a still further scheme of the invention: the dynamic time-sequence feature extraction module adopts a dynamic graph multi-head attention network to extract long-term and short-term time features into the representation of each node in parallel.

The graph sequence $G$ is divided into a plurality of blocks by the window size $k$. After the $i$-th block passes through the structural and contextual feature extraction module, the vertex vector sequence $\{H^{t-k+1}, \ldots, H^t\}$ of the block is obtained.

Before the sequence is put into the dynamic graph multi-head attention network, a memory vector $M$ is added: the vector sequence of the current block contains only local, short-term time-sequence features, and the previous time-sequence features also need to be retained. The input sequence of a block therefore combines the memory vector $M$ with the vertex vector sequence of that block. Consider the vector sequence of a vertex $v$ in the block. Because the multi-head attention mechanism is symmetric, this sequence carries no position information, so a position encoding $\mathrm{PE}(\cdot)$ is applied to it once to obtain the multi-head attention input, as given by formulas (5) and (6).

In formulas (5) and (6), the position code is the encoded information of the $i$-th position, and the result is the input sequence of vertex $v$ in the block.

After the input sequence is processed, the multi-head attention mechanism extracts the time-sequence information according to formulas (7) to (10).

In formulas (7) to (10), the result is the output vector sequence of vertex $v$ in the block, and three learnable parameters are used. Owing to the characteristics of matrix operations, the above process allows the calculation for every vertex to be carried out in parallel.
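The block-wise attention with a memory vector can be sketched as follows for a single vertex and a single attention head. Formulas (5) to (10) are not reproduced in this text, so several choices are assumptions: the sinusoidal position encoding, the scaled dot-product form with query, key and value projections `w_q`, `w_k`, `w_v`, a zero initial memory, and the rule that the output at the memory slot becomes the memory of the next block. A multi-head version would repeat the same computation with separate projections and concatenate the results.

```python
import math


def positional_encoding(pos, d):
    """Sinusoidal encoding of position `pos` (Transformer-style; an assumption)."""
    return [
        math.sin(pos / 10000 ** (i / d)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / d))
        for i in range(d)
    ]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def attend_block(memory, block, w_q, w_k, w_v):
    """Self-attention over [memory] + block; returns one output per position."""
    d = len(memory)
    seq = [memory] + block
    inputs = [
        [x + p for x, p in zip(vec, positional_encoding(i, d))]
        for i, vec in enumerate(seq)
    ]
    q = [matvec(w_q, x) for x in inputs]
    k = [matvec(w_k, x) for x in inputs]
    v = [matvec(w_v, x) for x in inputs]
    out = []
    for qi in q:
        scores = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        out.append([sum(s * vj[c] for s, vj in zip(scores, v)) for c in range(len(v[0]))])
    return out


def run_blocks(sequence, k, w_q, w_k, w_v):
    """Split one vertex's vector sequence into blocks of size k and chain the memory."""
    memory = [0.0] * len(sequence[0])  # initial memory; zeros by assumption
    outputs = []
    for start in range(0, len(sequence), k):
        out = attend_block(memory, sequence[start:start + k], w_q, w_k, w_v)
        memory = out[0]          # memory slot output carries long-term state forward
        outputs.extend(out[1:])  # one updated vector per snapshot in the block
    return outputs


# Hypothetical two-dimensional vectors of one vertex over four snapshots.
identity = [[1.0 if i == j else 0.0 for j in range(2)] for i in range(2)]
sequence = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.5], [0.4, 0.4]]
outputs = run_blocks(sequence, k=2, w_q=identity, w_k=identity, w_v=identity)
```

Attention is computed only within a block of `k + 1` positions, which keeps the cost per block fixed, while the chained memory lets information from earlier blocks reach later ones.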

As a still further scheme of the invention: after processing by the structural and contextual feature extraction module and the dynamic time-sequence feature extraction module, the vector representation of each point in the graph $G^t$ at each time is obtained. To detect abnormal edges in each graph, a scoring function $f(e)$ is defined to evaluate the degree of abnormality of each edge, as given by formula (11).

In formula (11), $W_a$ and $b$ are learnable parameters, and the two vectors are the vector expressions of the two vertices of edge $e$; the value of $f(e)$ lies in the range [0, 1]. To obtain the optimal parameters, the loss function used during training is defined by formula (12).

In formula (12), $y_e$ represents the label of edge $e$, the norm term is the L2 norm, and $\lambda$ is an adjustable hyper-parameter.

After the model is trained, abnormal edges in the dynamic graph can be detected according to formula (11).
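The scoring and training objective can be sketched in Python as below. Formulas (11) and (12) are not reproduced in this text, so this is only one plausible instance: building the edge vector by concatenating the two vertex vectors, using a sigmoid as the nonlinear activation, and using binary cross-entropy with a squared L2 penalty as the loss are all assumptions, as are the function names and the 0.5 threshold.

```python
import math


def score_edge(h_i, h_j, w_a, b):
    """Anomaly score f(e) in (0, 1) computed from the two endpoint vectors of edge e."""
    edge_vec = h_i + h_j  # list concatenation: one assumed way to build an edge vector
    z = sum(w * x for w, x in zip(w_a, edge_vec)) + b
    return 1.0 / (1.0 + math.exp(-z))


def loss(scores, labels, params, lam):
    """Binary cross-entropy plus lam * squared L2 norm of the parameters (assumed form)."""
    eps = 1e-12
    bce = -sum(
        y * math.log(s + eps) + (1 - y) * math.log(1 - s + eps)
        for s, y in zip(scores, labels)
    ) / len(scores)
    return bce + lam * sum(p * p for p in params)


def detect(edges, vectors, w_a, b, threshold=0.5):
    """Return the edges whose anomaly score exceeds the threshold."""
    return [(i, j) for i, j in edges if score_edge(vectors[i], vectors[j], w_a, b) > threshold]


# Hypothetical vertex vectors and hand-picked parameters.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, -1.0]}
w_a, b = [1.0, 1.0, 1.0, 1.0], 0.0
flagged = detect([("a", "b"), ("a", "c")], vectors, w_a, b)
```

With these hand-picked parameters only the edge between `a` and `b` exceeds the threshold; in practice `w_a` and `b` would be learned by minimising the loss over labelled edges.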

Compared with the prior art, the invention has the beneficial effects that:

The invention applies a dynamic graph abnormal edge detection algorithm based on long-term and short-term time-sequence attention. For time-sequence feature extraction on the dynamic graph, it does not adopt a sliding-window recurrent neural network structure, but instead adopts a graph embedding method based on long-term and short-term time-sequence attention. The block-wise attention structure efficiently extracts the short-term time-sequence features of the nodes in the snapshots, and each time-sequence block extracts and passes on the long-term time-sequence features of the nodes through a long-term memory state vector. This ensures that the model extracts the complete time-sequence features of the nodes, thereby improving anomaly detection performance.

Drawings

FIG. 1 is a block diagram of the algorithm framework of the method and system for detecting data anomalies in an import-export behavior dynamic graph;

FIG. 2 shows the AUC of each snapshot of each data set after injection of 1% abnormal data;

FIG. 3 is a flowchart of the algorithm of the method and system for detecting data anomalies in an import-export behavior dynamic graph.

Detailed Description

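Referring to FIGS. 1 to 3, a method for detecting data anomalies in an import-export behavior dynamic graph includes the following steps:

S1, defining the import-export behavior dynamic graph: constructing a graph sequence to define the import-export behavior dynamic graph;

S2, feature extraction of the import-export behavior dynamic graph: extracting the structural features and time features of the import-export behavior dynamic graph into the node representations of the respective modules through a structural and contextual feature extraction module and a dynamic time-sequence feature extraction module;

S3, anomaly detection of the import-export behavior dynamic graph: detecting abnormal edges in the import-export behavior dynamic graph in the anomaly detection module by using the node representations from S2.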

Preferably, in S1 the import-export behavior dynamic graph is defined as $G$:

$$G = \{G^t\}_{t=1}^{T}$$

The above formula is a graph sequence, in which $G^t$ denotes the graph at timestamp $t$. Each graph is $G^t = (V^t, E^t)$, where $V^t$ and $E^t$ are the point set and the edge set of graph $G^t$; $e = (v_i, v_j, w) \in E^t$ represents an edge between nodes $v_i$ and $v_j$ with weight $w$; and $V = \bigcup_{t=1}^{T} V^t$ and $E = \bigcup_{t=1}^{T} E^t$ are the total point set and the total edge set of the import-export behavior dynamic graph $G$. Then $n = |V|$, and $A^t \in \mathbb{R}^{n \times n}$ is the adjacency matrix of each graph.

Preferably, the structural and contextual feature extraction module in S2 uses a multi-layer graph convolutional neural network on each snapshot to extract the structural features of the import-export behavior dynamic graph and to aggregate content features among nodes.

Preferably, the dynamic time-sequence feature extraction module in S2 proposes a multi-head attention mechanism for the import-export behavior dynamic graph and extends it into a block-recurrent structure, so as to extract the two main time features of the dynamic graph, namely the long-term and short-term time-sequence features.

An import-export behavior dynamic graph data anomaly detection system comprises a structure and context feature extraction module, a dynamic time sequence feature extraction module and an anomaly detection module.

Preferably, the structural and contextual feature extraction module uses a multi-layer graph convolutional neural network to extract the structural features of the snapshots in the dynamic graph and maps the nodes of the dynamic graph into high-dimensional space vectors;

the dynamic time-sequence feature extraction module divides the dynamic graph sequence into a plurality of time-sequence blocks according to a fixed window size; a multi-head attention mechanism is introduced into each time-sequence block to better extract the time-sequence features and update their vector expression, and each time-sequence block passes the long-term time-sequence information of each node to the next block through a long-term memory state vector, so that the long-term time-sequence features are stored and extracted and abnormal edge detection performance is improved;

the anomaly detection module uses the node vectors of the dynamic graph to construct vector expressions of the edges, feeds each edge vector into a nonlinear activation function to compute an anomaly score, and finds the abnormal edge data whose anomaly score is greater than a threshold.

Preferably, the structural and contextual feature extraction module performs one multi-layer graph convolution operation on each graph $G^t$ in the graph sequence, as described by formula (1), where $H^t$ represents the vector representation of each vertex of graph $G^t$ and $d_h$ is the vector dimension. The GCN is calculated specifically as follows:

$$Z^{(0)} = X^t \tag{2}$$

The final output of this module is the vertex vector matrix $H^t$ of each graph $G^t$.

Preferably, the dynamic time-sequence feature extraction module adopts a dynamic graph multi-head attention network to extract long-term and short-term time features into the representation of each node in parallel.

Before the sequence is put into the dynamic graph multi-head attention network, a memory vector $M$ is added: the vector sequence of the current block contains only local, short-term time-sequence features, and the previous time-sequence features also need to be retained. The input sequence of a block therefore combines the memory vector $M$ with the vertex vector sequence of that block. The vector sequence of a vertex $v$ in the block carries no position information, so a position encoding $\mathrm{PE}(\cdot)$ is applied to it once to obtain the multi-head attention input, as given by formulas (5) and (6).

In formulas (5) and (6), the position code is the encoded information of the $i$-th position, and the result is the input sequence of vertex $v$ in the block.

In formulas (7) to (10), the result is the output vector sequence of vertex $v$ in the block, and three learnable parameters are used.

Preferably, after processing by the structural and contextual feature extraction module and the dynamic time-sequence feature extraction module, the vector representation of each point in the graph $G^t$ at each time is obtained. To detect abnormal edges in each graph, a scoring function $f(e)$ is defined to evaluate the degree of abnormality of each edge, as given by formula (11).

In formula (11), $W_a$ and $b$ are learnable parameters, and the two vectors are the vector expressions of the two vertices of edge $e$; the value of $f(e)$ lies in the range [0, 1]. To obtain the optimal parameters, the loss function used during training is defined by formula (12).

In formula (12), $y_e$ represents the label of edge $e$, the norm term is the L2 norm, and $\lambda$ is an adjustable hyper-parameter.

To better illustrate the technical effects of the present invention, the following tests were carried out:

The detection method is denoted LASTAN, defined as a dynamic graph abnormal edge detection algorithm based on long-term and short-term time-sequence attention.

Experiments were carried out on six real-world data sets: UCI (a social message network), Digg, Email-DNC (an e-mail network), Bitcoin-Alpha and Bitcoin-OTC (networks from two Bitcoin trading platforms), and AS-Topology (an AS-level topology generated from Route Views project data). The six data sets include dynamic graph data of types such as social networks and computer network attacks. For comparison, the following were selected as baseline methods: CM-Sketch, a frequency-counting algorithm that gains processing speed by sacrificing counting accuracy; SedanSpot, a method for detecting anomalies in edge streams; NetWalk, a dynamic network anomaly detection method; and StrGNN, a graph neural network method. For each data set, 50% of the data is used as the training set.

Because the data sets carry no abnormal data, this study follows the approach of NetWalk and injects abnormal data into each data set at ratios of 1%, 5% and 10%.
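The injection step can be sketched in Python as follows. The text only states the injection ratios, so the procedure shown here is an assumption: anomalous edges are drawn as random node pairs that are not connected in the original data, and the function name `inject_anomalies`, the fixed seed, and the ring-shaped example graph are hypothetical.

```python
import random


def inject_anomalies(edges, nodes, ratio, seed=0):
    """Return (edge, label) pairs: original edges get label 0, injected edges label 1."""
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    # Cap the target so the loop terminates even on a nearly complete graph.
    max_pairs = len(nodes) * (len(nodes) - 1) // 2 - len(existing)
    target = min(int(round(ratio * len(edges))), max_pairs)
    injected = set()
    while len(injected) < target:
        u, v = rng.sample(nodes, 2)
        pair = frozenset((u, v))
        if pair not in existing and pair not in injected:
            injected.add(pair)
    labelled = [(tuple(e), 0) for e in edges]
    labelled += [(tuple(sorted(p)), 1) for p in injected]
    rng.shuffle(labelled)
    return labelled


# Hypothetical ring graph with 20 nodes and 20 edges; inject 10% anomalies.
nodes = list(range(20))
edges = [(i, (i + 1) % 20) for i in range(20)]
labelled = inject_anomalies(edges, nodes, ratio=0.10)
```

The labels produced this way serve as ground truth when the anomaly scores are later evaluated.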

TABLE 1 comparison of AUC results of each method experiment

The overall experimental results are shown in Table 1. On all six data sets, the method of this study performs better than the other baseline methods. FIG. 2 shows the AUC scores of the different snapshots in each data set: the method of this study performs better than the other methods at every time point. In particular, on the UCI, Bitcoin-Alpha and Bitcoin-OTC data sets, the AUC scores of the other baseline methods show a pronounced downward trend as time increases, while the method of this study still maintains good performance.
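For reference, the AUC metric reported above can be computed from anomaly scores and edge labels as in the following Python sketch. It uses the pairwise-ranking definition of AUC; the function name and the example scores are illustrative and do not come from the experiments.

```python
def auc(scores, labels):
    """Probability that a random anomalous edge outscores a random normal edge.

    Ties between an anomalous and a normal score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one anomalous and one normal edge")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four edges, two of which are labelled anomalous.
example = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 1.0 means every anomalous edge is scored above every normal edge, while 0.5 corresponds to random ranking.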

The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims

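1. A method for detecting data anomalies in an import-export behavior dynamic graph, characterized by comprising the following steps:

S1, defining the import-export behavior dynamic graph: constructing a graph sequence to define the import-export behavior dynamic graph;

S2, feature extraction of the import-export behavior dynamic graph: extracting the structural features and time features of the import-export behavior dynamic graph into the node representations of the respective modules through a structural and contextual feature extraction module and a dynamic time-sequence feature extraction module;

S3, anomaly detection of the import-export behavior dynamic graph: detecting abnormal edges in the import-export behavior dynamic graph in the anomaly detection module by using the node representations from S2.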

2. The method according to claim 1, wherein, in S1, the import-export behavior dynamic graph is defined as $G$:

$$G = \{G^t\}_{t=1}^{T}$$

The above formula is a graph sequence, in which $G^t$ denotes the graph at timestamp $t$. Each graph is $G^t = (V^t, E^t)$, where $V^t$ and $E^t$ are the point set and the edge set of graph $G^t$; $e = (v_i, v_j, w) \in E^t$ represents an edge between nodes $v_i$ and $v_j$ with weight $w$; and $V = \bigcup_{t=1}^{T} V^t$ and $E = \bigcup_{t=1}^{T} E^t$ are the total point set and the total edge set of the import-export behavior dynamic graph $G$. Then $n = |V|$, and $A^t \in \mathbb{R}^{n \times n}$ is the adjacency matrix of each graph.

3. The method for detecting data anomalies in an import-export behavior dynamic graph according to claim 1, wherein the structural and contextual feature extraction module in S2 uses a multi-layer graph convolutional neural network on each snapshot to extract the structural features of the import-export behavior dynamic graph and to aggregate content features among nodes.

4. The method according to claim 1, wherein the dynamic time-sequence feature extraction module in S2 proposes a multi-head attention mechanism for the import-export behavior dynamic graph and extends it into a block-recurrent structure, so as to extract the two main time features of the dynamic graph, namely the long-term and short-term time-sequence features.

5. The system for realizing the data anomaly detection of the import-export behavior dynamic graph is characterized by comprising a structure and context feature extraction module, a dynamic time sequence feature extraction module and an anomaly detection module.

6. The system for detecting data abnormality of an import-export behavior dynamic graph according to claim 5, wherein said structural and contextual feature extraction module performs a structural feature extraction on the snapshot in the dynamic graph by using a multi-layer graph convolution neural network, and maps the nodes in the dynamic graph into high-dimensional space vectors;

7. The system according to claim 5 or 6, wherein the structural and contextual feature extraction module performs one multi-layer graph convolution operation on each graph $G^t$ in the graph sequence to extract the structural features of each graph, the GCN being calculated as follows:

$$Z^{(0)} = X^t \tag{2}$$

The final output of this module is the vertex vector matrix $H^t$ of each graph $G^t$.

8. The system for detecting data anomalies in an import-export behavior dynamic graph according to claim 5 or 6, wherein the dynamic time-sequence feature extraction module adopts a dynamic graph multi-head attention network to extract long-term and short-term time features into the representation of each node in parallel;

the graph sequence $G$ is divided into a plurality of blocks by the window size $k$, and the input sequence of a block combines the memory vector $M$ with the vertex vector sequence of that block. The vector sequence of a vertex $v$ in the block carries no position information because the multi-head attention mechanism is symmetric, so a position encoding is applied to it according to formulas (5) and (6).

In formulas (5) and (6), the position code is the encoded information of the $i$-th position, and the result is the input sequence of vertex $v$ in the block.

In formulas (7) to (10), the result is the output vector sequence of vertex $v$ in the block, and three learnable parameters are used.

9. The system for detecting data anomalies in an import-export behavior dynamic graph according to claim 5 or 6, wherein, after processing by the structural and contextual feature extraction module and the dynamic time-sequence feature extraction module, the vector representation of each point in the graph $G^t$ at each time is obtained, and the degree of abnormality of each edge is evaluated by the scoring function of formula (11).

In formula (11), $W_a$ and $b$ are learnable parameters, and the two vectors are the vector expressions of the two vertices of edge $e$.

In formula (12), $y_e$ represents the label of edge $e$, the norm term is the L2 norm, and $\lambda$ is an adjustable hyper-parameter.