WO2022088408A1

WO2022088408A1 - Graph neural network-based transaction fraud detection method and system

Info

Publication number: WO2022088408A1
Application number: PCT/CN2020/135271
Authority: WO
Inventors: 王欢; 李青山; 司华友
Original assignee: 南京博雅区块链研究院有限公司; 北京大学; 博雅正链(北京)科技有限公司
Priority date: 2020-11-02
Filing date: 2020-12-10
Publication date: 2022-05-05
Also published as: CN112396160A

Abstract

A graph neural network-based transaction fraud detection method and system. The method comprises the following steps: a transaction data preprocessing step (S100): obtaining transaction data, preprocessing the transaction data, and obtaining a transaction sample set in a panel form; a transaction behavior historical feature extraction step (S300): performing long short-term memory network processing on the transaction sample set to obtain a transaction behavior historical feature; a transaction behavior aggregation feature extraction step (S400): performing graph convolutional network processing on the transaction behavior historical feature to obtain a transaction behavior aggregation feature; and a prediction step (S500): performing fully-connected layer processing on the transaction behavior historical feature and the transaction behavior aggregation feature, and performing fraud prediction of a transaction node by means of binary classification. The method overcomes the defects that a traditional transaction fraud detection method ignores a relationship between data, and a transaction behavior is time series data, ensures the comprehensiveness of transaction fraud detection, and improves the accuracy of transaction fraud detection.

Description

Transaction Fraud Detection Method and System Based on Graph Neural Network

Technical Field

The invention relates to the field of financial technology, in particular to a transaction fraud detection method and system based on a graph neural network.

Background Art

In the era of big data, offline and online transactions are becoming increasingly frequent, and they include illegal transactions such as malicious attacks and phishing. Transactions therefore need to be checked against their characteristics before an illegal transaction takes effect, in order to prevent huge losses.

Transaction data refers to the directed transactions between many transaction accounts. Because of scams, malware, terrorist organizations, ransomware, Ponzi schemes and the like, fraudulent transactions appear in the transaction network. These data are time series data, that is, sequences of transaction behavior over a period of time, so illegal and legitimate transactions must be classified in order to detect transaction fraud.

Traditional detection methods use machine learning or time series classification. On the one hand, such methods ignore the inherent relationships in the data: transactions are nodes of a network, and a transaction between two nodes means that a relationship exists between them. On the other hand, they ignore that transaction behavior is time series data. These defects greatly reduce the comprehensiveness and accuracy of detection.

Summary of the Invention

In view of this, it is necessary to address the technical problems of poor comprehensiveness and low detection accuracy that arise because traditional transaction fraud detection methods ignore both the relationships within the data and the time-series nature of transaction behavior. To this end, a transaction fraud detection method and system based on a graph neural network are provided.

The present invention proposes an embodiment of a transaction fraud detection method based on a graph neural network, comprising the following steps:

a transaction data preprocessing step: obtaining transaction data and preprocessing the transaction data to obtain a transaction sample set in panel form;

a transaction behavior historical feature extraction step: performing long short-term memory network processing on the transaction sample set to obtain transaction behavior historical features;

a transaction behavior aggregation feature extraction step: performing graph convolutional network processing on the transaction behavior historical features to obtain transaction behavior aggregation features;

a prediction step: performing fully connected layer processing on the transaction behavior historical features and the transaction behavior aggregation features, and performing fraud prediction for transaction nodes by binary classification.

In one embodiment, in the transaction data preprocessing step, the preprocessing includes the following sub-steps:

obtaining the local transaction node features of the transaction data;

obtaining the transaction node summary features of the transaction data;

obtaining the transaction node subgraph information of the transaction data.

In one embodiment, before the step of extracting the historical features of the transaction behavior, the following steps are further included:

a spectral clustering sample labeling step: performing spectral clustering sample labeling on the transaction sample set to obtain a spectral clustering transaction sample set.

In one embodiment, in the spectral clustering sample labeling step, the spectral clustering sample labeling process includes the following sub-steps:

constructing the spectral matrix of the transaction sample set;

performing eigenvalue decomposition on the spectral matrix to obtain a feature matrix;

clustering the feature matrix.

In one embodiment, in the transaction behavior aggregation feature extraction step, the graph convolutional network processing includes the following sub-steps:

obtaining an adjacency matrix of the transaction behavior historical features;

inputting the adjacency matrix into 2 to 4 graph learning layers of the graph convolutional network for feature propagation among neighbors, with a nonlinear activation applied after each layer.

The present invention also proposes a transaction fraud detection system based on a graph neural network, which includes the following modules:

a transaction data preprocessing module, configured to acquire transaction data and preprocess the transaction data to obtain a transaction sample set in panel form;

a transaction behavior historical feature extraction module, configured to perform long short-term memory network processing on the transaction sample set to obtain transaction behavior historical features;

a transaction behavior aggregation feature extraction module, configured to perform graph convolutional network processing on the transaction behavior historical features to obtain transaction behavior aggregation features;

a prediction module, configured to perform fully connected layer processing on the transaction behavior historical features and the transaction behavior aggregation features, and to perform fraud prediction for transaction nodes by binary classification.

In one embodiment, in the transaction data preprocessing module, the preprocessing includes the following sub-steps:

obtaining the local transaction node features of the transaction data;

obtaining the transaction node summary features of the transaction data;

obtaining the transaction node subgraph information of the transaction data.

In one embodiment, the graph neural network-based transaction fraud detection system further includes a spectral clustering sample labeling module, which is configured to perform spectral clustering sample labeling on the transaction sample set to obtain a spectral clustering transaction sample set.

In one embodiment, in the spectral clustering sample labeling module, the spectral clustering sample labeling process includes the following sub-steps:

constructing the spectral matrix of the transaction sample set;

performing eigenvalue decomposition on the spectral matrix to obtain a feature matrix;

clustering the feature matrix.

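In one embodiment, in the transaction behavior aggregation feature extraction module, the graph convolutional network processing includes the same sub-steps as in the corresponding method step:

obtaining an adjacency matrix of the transaction behavior historical features;

inputting the adjacency matrix into 2 to 4 graph learning layers of the graph convolutional network for feature propagation among neighbors, with a nonlinear activation applied after each layer.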

Through transaction behavior historical feature extraction, transaction behavior aggregation feature extraction and subsequent fully connected layer processing, the above transaction fraud detection method and system based on a graph neural network overcome the defects of traditional transaction fraud detection methods, which ignore both the relationships within the data and the fact that transaction behavior is time series data. This ensures the comprehensiveness of transaction fraud detection and improves its accuracy.

Description of the Drawings

In order to illustrate the embodiments of the present invention and the technical solutions in the designed system architecture more clearly, the drawings required for describing the embodiments and the system architecture are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of the transaction fraud detection method based on a graph neural network of the present invention;

Fig. 2 is a flow chart of the transaction data preprocessing step in the transaction fraud detection method based on a graph neural network of the present invention;

Fig. 3 is the time variation curve of a local transaction node feature (transaction fee) after transaction data preprocessing in the transaction fraud detection method based on a graph neural network of the present invention;

Fig. 4 is the time variation curve of a transaction node summary feature (the maximum transaction fee among the local transaction node and its neighbor transaction nodes) after transaction data preprocessing in the transaction fraud detection method based on a graph neural network of the present invention;

Fig. 5 is the illegal transaction graph formed by the transaction data marked as illegal transactions in the Bitcoin transaction data;

Fig. 6 is the legal transaction graph formed by the transaction data marked as legal transactions in the Bitcoin transaction data;

Fig. 7 is a comparison of the spectral clustering sample labeling result and the true binary distribution in the transaction fraud detection method based on a graph neural network of the present invention;

Fig. 8 is a flow chart of the parallelized construction of the distance matrix in the transaction fraud detection method based on a graph neural network of the present invention;

Fig. 9 is a structural diagram of a transaction fraud detection system based on a graph neural network according to an embodiment of the present invention;

Fig. 10 is a data flow diagram of the transaction fraud detection system based on a graph neural network of the present invention.

Detailed Description

It should be pointed out that the following detailed description is exemplary and is intended to illustrate the present invention. Unless otherwise specified, all technical terms used herein have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention belongs.

The system architecture in the embodiments of the present invention and the solutions in the prior art are described clearly and completely below with reference to the accompanying drawings. It should be noted that the described embodiments serve only to explain and illustrate the invention and are not exhaustive. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments provided by the present invention, without creative work, fall within the protection scope of the present application.

In the following embodiments of the present invention, the method and system for detecting transaction fraud based on a graph neural network of the present invention are described in detail by taking the digital currency transaction fraud detection of Bitcoin (BTC) as an example. It should be noted that the graph neural network-based transaction fraud detection method and system of the present invention can also be used in fraud detection of other transaction data, such as digital currency transaction data, traffic data, and stock data.

Referring to Fig. 1, the transaction fraud detection method based on the graph neural network proposed by the present invention includes the following steps:

S100, a transaction data preprocessing step: acquiring transaction data and preprocessing the transaction data to obtain a transaction sample set in panel form;

S300, a transaction behavior historical feature extraction step: performing long short-term memory network processing on the transaction sample set to obtain transaction behavior historical features;

S400, a transaction behavior aggregation feature extraction step: performing graph convolutional network processing on the transaction behavior historical features to obtain transaction behavior aggregation features;

S500, a prediction step: performing fully connected layer processing on the transaction behavior historical features and the transaction behavior aggregation features, and performing fraud prediction for transaction nodes by binary classification.

Through transaction behavior historical feature extraction, transaction behavior aggregation feature extraction and subsequent fully connected layer processing, the above transaction fraud detection method based on a graph neural network overcomes the defects of traditional transaction fraud detection methods, which ignore both the relationships within the data and the fact that transaction behavior is time series data. This ensures comprehensive transaction fraud detection and improves its accuracy.
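As a rough illustration of the prediction step S500, the sketch below concatenates a historical feature vector and an aggregation feature vector, applies a single fully connected layer with a sigmoid, and thresholds the result. The function name `predict_fraud`, the single-layer design, the threshold of 0.5 and all numeric values are assumptions made for illustration; the text does not specify the layer sizes or the decision rule.

```python
import math

def predict_fraud(hist_feat, agg_feat, weights, bias, threshold=0.5):
    """Concatenate both feature vectors, apply one fully connected layer
    with a sigmoid, and threshold the result (binary classification)."""
    x = list(hist_feat) + list(agg_feat)            # feature concatenation
    z = sum(w * v for w, v in zip(weights, x)) + bias
    prob = 1.0 / (1.0 + math.exp(-z))               # P(node is fraudulent)
    return prob, prob >= threshold

# Hypothetical 2-dim historical and 2-dim aggregation features.
prob, is_fraud = predict_fraud([0.2, -0.1], [0.7, 0.4],
                               weights=[1.0, 0.5, 2.0, 1.5], bias=-1.0)
```

In a trained model the weights and bias would be learned jointly with the LSTM and graph convolution layers rather than set by hand.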

In the above embodiment, the transaction data preprocessing step collects and preprocesses the transaction data required for transaction fraud detection, so that the preprocessed transaction sample set is in panel form and contains the connections between transaction nodes. The samples constitute a dynamically changing transaction flow graph.

As an optional implementation, referring to Fig. 2, the transaction features at each time step are obtained through preprocessing, which includes the following sub-steps:

S110, acquiring local transaction node characteristics of the transaction data;

S120, acquiring transaction node summary characteristics of transaction data;

S130, acquiring transaction node sub-graph information of the transaction data.

The above transaction data preprocessing step obtains not only the local transaction node features but also the transaction node summary features and the transaction node subgraph information, which makes the result more precise.

In this implementation, the real Bitcoin transaction data used is a transaction graph collected from the Bitcoin blockchain. The transaction graph is described as follows: a node in the graph represents a transaction, and an edge can be seen as the flow of bitcoin from one transaction to another. The graph consists of 203,769 nodes and 234,355 edges. Of the nodes, 2% are marked as illegal transaction nodes, 21% are marked as legal transaction nodes, and the rest are not marked.

In the transaction data, each transaction node is associated with time information, namely the estimated time at which the Bitcoin network confirmed the transaction. Based on this time information, this embodiment divides about two years of Bitcoin transaction data into 49 time steps spaced about 2 weeks apart. Each time step contains one connected component of transactions that appeared on the blockchain within less than 3 hours of each other, and there are no edges connecting transaction nodes in different time steps. The time interval can be changed to other reasonable values. The transaction features at each time step are explained in detail below.

The local transaction node features in step S110 represent the transaction data of the local transaction node, such as the time step, the number of input transactions (node in-degree), the number of output transactions (node out-degree), the transaction fee, the output amount, and derived statistical features. The derived statistical features are averages over neighboring nodes, such as the average BTC fee received by the input transactions, the average BTC fee received by the output transactions, the average BTC fee spent by the input transactions, the average BTC fee spent by the output transactions, the average number of input/output transactions associated with the input transactions (average number of input-related transactions), and the average number of input/output transactions associated with the output transactions (average number of output-related transactions).

The transaction node summary features in step S120 are obtained from the local transaction node features of the neighbor transaction nodes one hop forward and/or backward from the local transaction node. That is, for all neighbor transaction nodes of the local transaction node, the values of the same local transaction node feature obtained in step S100 are processed, and descriptive statistics such as the maximum, minimum, median, mode, standard deviation, range and correlation coefficient are taken as the transaction node summary features.
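A minimal sketch of how such one-hop summary statistics might be computed for a single local feature is given below. The function name `summary_features`, the dictionary-based graph representation and the toy fee values are hypothetical. The sketch aggregates over the neighbors only, uses the population standard deviation, and omits the correlation coefficient, which needs a second feature; these are all assumptions rather than choices stated in the text.

```python
import statistics

def summary_features(node, neighbors, local_feature):
    """Descriptive statistics of one local feature over the one-hop
    neighbors (forward and backward) of `node`."""
    values = [local_feature[n] for n in neighbors[node]]
    return {
        "max": max(values),
        "min": min(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std": statistics.pstdev(values),
        "range": max(values) - min(values),
    }

# Hypothetical toy data: transaction fee per node and one-hop neighbor lists.
fee = {"a": 0.5, "b": 0.1, "c": 0.3, "d": 0.1}
neighbors = {"a": ["b", "c", "d"]}
feats = summary_features("a", neighbors, fee)
```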

Step S130 obtains the local topology information of a transaction node by computing the spectral information of the graph formed by all transaction nodes within an appropriate number of layers around the local transaction node. In this embodiment, the spectral information is computed for the graph formed by all transaction nodes within 2 layers of the center node: the eigenvalues of the resulting Laplacian matrix L' = D'WD' are used as an additional feature, the transaction node subgraph information, which reflects the topology of the graph in the frequency domain. The more similar the eigenvalues of two transaction nodes are, the more similar the topologies of the subgraphs in which they are located.
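The subgraph spectrum of step S130 can be sketched as follows. The sketch assumes D' = D^{-1/2}, as defined later for the spectral clustering step, and treats the transaction graph as undirected with unit edge weights; the function name `subgraph_spectrum` and the path-graph example are hypothetical.

```python
import numpy as np

def subgraph_spectrum(adj, center, hops=2):
    """Eigenvalues of L' = D' W D' (with D' = D^{-1/2}) for the subgraph
    within `hops` layers of `center`; `adj` is a symmetric 0/1 matrix."""
    adj = np.asarray(adj, dtype=float)
    frontier, seen = {center}, {center}
    for _ in range(hops):                       # breadth-first expansion
        frontier = {int(j) for i in frontier
                    for j in np.nonzero(adj[i])[0]} - seen
        seen |= frontier
    idx = sorted(seen)
    w = adj[np.ix_(idx, idx)]                   # subgraph weight matrix W
    deg = w.sum(axis=1)
    d_prime = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    l_prime = d_prime[:, None] * w * d_prime[None, :]
    return np.linalg.eigvalsh(l_prime)          # eigenvalues, ascending

# Path graph 0-1-2-3-4: the 2-layer subgraph around node 0 is {0, 1, 2}.
path = np.zeros((5, 5))
for i in range(4):
    path[i, i + 1] = path[i + 1, i] = 1.0
spec = subgraph_spectrum(path, center=0)
```

Because subgraphs of different sizes yield different numbers of eigenvalues, a practical feature vector would need padding or truncation to a fixed length, which the text does not specify.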

In this embodiment, the node features of the transaction graph are as follows. The time step is 2 weeks, with 49 steps in total. The first 93 node features are local transaction node features, which describe the local transaction node and its transaction data, including the time step, the number of input transactions (node in-degree), the number of output transactions (node out-degree), the transaction fee, the output amount and derived statistical features. The last 72 node features are the transaction node summary features: the maximum, minimum, median, mode, standard deviation, range and correlation coefficient of the same feature (one local transaction node feature) over the neighbor transaction nodes one hop forward and/or backward from the local (center) transaction node.

After the transaction data preprocessing step S100, Figs. 3 and 4 are drawn to observe how the transaction features change over time. Fig. 3 is the time curve of a local transaction node feature (such as the transaction fee), and Fig. 4 is the time curve of a transaction node summary feature (such as the maximum transaction fee among the local transaction node and its neighbor transaction nodes). The figures show how the three types of nodes change over time on these two attributes. Both attributes distinguish legal transaction nodes from illegal transaction nodes well: the attribute curve of the legal transaction nodes lies in the lower part of the plot and is relatively stable over time, whereas the curve of the illegal transaction nodes lies in the upper part and fluctuates sharply.

In the above embodiment, after the transaction data preprocessing step S100 is completed, the transaction behavior historical feature extraction of step S300 can be performed directly if enough classified transaction samples are known. However, when too few classified transaction samples are known to detect transaction fraud accurately, spectral clustering sample labeling is additionally required to label the unlabeled transaction nodes and avoid an undersized sample.

As an optional implementation manner, before the step of extracting the historical features of the transaction behavior, the following steps are further included:

S200, a spectral clustering sample labeling step: performing spectral clustering sample labeling on the transaction sample set to obtain a spectral clustering transaction sample set.

The above transaction sample set consists of 203,769 nodes and 234,355 edges. Of the transaction nodes, 2% are marked as illegal, 21% are marked as legal, and the remaining 77% are not marked. Since the class of these transaction nodes is unknown, the present invention uses spectral clustering, an unsupervised method, to classify them and learn their labels, which increases the sample size and makes them available as data for subsequent training. Optionally, because of the large number of samples to be learned, parallelized spectral clustering may be used.

In this embodiment, spectral clustering is used to label the unlabeled sample nodes. Spectral clustering overcomes the defect that K-means clustering is affected by the shape of the data, and its relaxed eigenvalue problem has a globally optimal solution. The main idea of spectral clustering is to regard the data as points in an n-dimensional space. Figs. 5 and 6 show, respectively, the illegal transaction graph formed by the transaction data marked as illegal transactions in the Bitcoin transaction data and the legal transaction graph formed by the transaction data marked as legal transactions. Points with a certain similarity are connected by edges, and clustering is achieved by cutting the graph formed by these points into multiple subgraphs such that the sum of the edge weights within each subgraph is as high as possible and the sum of the edge weights between subgraphs is as low as possible. In practice, the graph cut is linked to the eigenvalue decomposition of the Laplacian matrix through the Rayleigh quotient, so that the NP-hard discrete problem is converted into a continuous eigenvalue problem.

As an optional implementation manner, in the spectral clustering sample labeling step, the spectral clustering sample labeling process includes the following sub-steps:

constructing the spectral matrix of the transaction sample set;

performing eigenvalue decomposition on the spectral matrix to obtain a feature matrix;

clustering the feature matrix.

The method used to cluster the feature matrix can be chosen as needed; for example, the K-Means clustering method can be used.

Further, the spectral matrix of the transaction sample set can be constructed in different ways. For example, an optional spectral clustering processing method is shown in Table 1.

Table 1 Spectral clustering processing methods

In step 5), since the known classes in the Bitcoin transaction data are legal and illegal, the number of clusters is set to k = 2.

Referring to Fig. 7, the left side shows the true binary distribution of the original data, and the right side shows the labeling result produced by the spectral clustering algorithm of the present invention, with a sample size of n = 1000. The algorithm achieves good clustering results on spherical data. The KNN-based spectral clustering algorithm is affected by the scale parameter and the number of neighbors; the default values 2 and 2.5 are used here. As the figure shows, the labeling result after spectral clustering is very close to the true binary distribution, which indicates that the spectral clustering sample labeling of the present invention is highly accurate and can greatly improve the accuracy of the detection results.

Further, given the huge number of transaction nodes in practice, each step can be parallelized and the results re-aggregated. For example, the operations can be parallelized with the MapReduce programming model. In this case, the spectral clustering processing method is shown in Table 2.

Table 2. Spectral clustering processing method of large-scale transaction data

In step 6), since the known classes in the Bitcoin transaction data are legal and illegal, the number of clusters is set to k = 2. In other embodiments, the value of k may be set or calculated according to the specific classification task.

During spectral clustering, a distance matrix is first constructed for the sample set data = (x₁, x₂, …). The distance can be measured using the Euclidean distance, the shortest path or the geodesic distance, preferably the shortest path or the geodesic distance. The geodesic distance is the length of the shortest path from point A to point B along a surface in three-dimensional space, where the path is not allowed to leave the surface. The parallelized construction of the distance matrix is shown in Fig. 8. When all map() tasks have been executed, new key-value pairs have been generated; reduce() then reduces the results of all partitions, that is, it traverses and merges the values written by map() under the same new key and fills in the values of each row column by column, resulting in a complete distance matrix.
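The map/reduce construction of the distance matrix can be imitated in a few lines of sequential Python. This is only a stand-in for a real MapReduce job: the functions `map_rows` and `reduce_rows`, the row partitioning and the sample points are hypothetical, and the Euclidean distance is used for brevity even though the text prefers the shortest path or the geodesic distance.

```python
import math
from collections import defaultdict

def map_rows(row_ids, data):
    """map(): for each assigned row i, emit (i, (j, distance)) pairs."""
    for i in row_ids:
        for j, xj in enumerate(data):
            yield i, (j, math.dist(data[i], xj))

def reduce_rows(pairs, n):
    """reduce(): merge values that share a row key and fill each row
    column by column to obtain the full distance matrix."""
    grouped = defaultdict(dict)
    for i, (j, d) in pairs:
        grouped[i][j] = d
    return [[grouped[i][j] for j in range(n)] for i in range(n)]

data = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
partitions = [[0, 1], [2]]              # hypothetical row partitions
emitted = [kv for part in partitions for kv in map_rows(part, data)]
dist = reduce_rows(emitted, len(data))
```

Each partition only needs its own rows plus read access to the sample set, which is what makes the row-wise split easy to distribute.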

Next, the symmetric distance matrix is sparsified in parallel by row, and the Gaussian similarity is computed to obtain the similarity matrix W. The matrix D' = D^{-1/2}, where D is the degree matrix, is computed in parallel by row, followed by the parallelized computation of L' = D'WD'. Since these matrices are sparse, the computations parallelize well.
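A compact, non-parallel sketch of this step is shown below. It assumes k-nearest-neighbor sparsification and the Gaussian kernel exp(-d²/(2σ²)); the function name `normalized_similarity` and the default values of `sigma` and `n_neighbors` are illustrative assumptions, not values confirmed by the text.

```python
import numpy as np

def normalized_similarity(dist, sigma=2.0, n_neighbors=5):
    """Sparsify a symmetric distance matrix by k nearest neighbors,
    apply a Gaussian kernel, and return W and L' = D' W D'."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    k = min(n_neighbors, n - 1)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):                          # each row is independent
        order = np.argsort(dist[i])
        nearest = [j for j in order if j != i][:k]
        mask[i, nearest] = True
    mask |= mask.T                              # keep W symmetric
    w = np.where(mask, np.exp(-dist**2 / (2.0 * sigma**2)), 0.0)
    d_prime = 1.0 / np.sqrt(w.sum(axis=1))      # diagonal of D^{-1/2}
    l_prime = d_prime[:, None] * w * d_prime[None, :]
    return w, l_prime

points = np.array([0.0, 1.0, 2.0, 10.0])
dist = np.abs(points[:, None] - points[None, :])
w, l_prime = normalized_similarity(dist, sigma=2.0, n_neighbors=1)
```

The loop body touches only row i, so the rows could be distributed across workers in the same way as the distance matrix.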

After the above parallelized implementation, the final sparse real symmetric matrix L' is obtained. The Lanczos method is suited to iteratively approximating the eigenvalues and eigenvectors of such large sparse matrices. The idea is to convert the Laplacian matrix into a real symmetric tridiagonal matrix by an orthogonal similarity transformation.

The eigenvalues and eigenvectors obtained by decomposing T_kk are taken as the eigenvalues and eigenvectors of L'. If only the first k eigenvalues are needed, only k iterations are required, which makes the method efficient. The number of clusters k is set to 2 (legal transactions and illegal transactions), and the matrix H composed of the feature vectors h₁, h₂, … is obtained.
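The Lanczos tridiagonalization can be sketched as follows. This is a textbook variant with full reorthogonalization, chosen here for numerical stability; the function name `lanczos`, the random start vector and the small test matrix are assumptions, and the text does not say which variant is used.

```python
import numpy as np

def lanczos(a, k, seed=0):
    """Reduce symmetric `a` to a k x k tridiagonal matrix T = Q^T A Q
    (with full reorthogonalization for numerical stability)."""
    n = a.shape[0]
    q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(max(k - 1, 0))
    v = np.random.default_rng(seed).standard_normal(n)
    q[:, 0] = v / np.linalg.norm(v)
    for j in range(k):
        w = a @ q[:, j]
        alpha[j] = q[:, j] @ w
        w -= q[:, : j + 1] @ (q[:, : j + 1].T @ w)   # orthogonalize
        if j + 1 < k:
            beta[j] = np.linalg.norm(w)
            q[:, j + 1] = w / beta[j]
    t = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return t, q

rng = np.random.default_rng(1)
m = rng.standard_normal((6, 6))
a = (m + m.T) / 2.0                     # hypothetical symmetric test matrix
t, q = lanczos(a, k=6)
ritz_values = np.linalg.eigvalsh(t)     # equal eigvalsh(a) when k = n
```

For k smaller than n, the eigenvalues of T only approximate the extreme eigenvalues of the input matrix, and approximate eigenvectors are recovered as Q multiplied by the eigenvectors of T.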

Finally, the parallelized K-means method is used to cluster H, yielding the cluster partition C(c₁, c₂), which completes the labeling of the transaction nodes with unknown labels.
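For reference, a plain sequential K-means on the rows of H with k = 2 might look like the sketch below. The farthest-point initialization, the iteration cap and the toy matrix are assumptions made to keep the example deterministic; a parallelized version would distribute the distance computation across workers.

```python
import numpy as np

def kmeans(h, k=2, n_iter=50):
    """Plain (sequential) k-means on the rows of feature matrix H."""
    h = np.asarray(h, dtype=float)
    centers = [h[0]]
    for _ in range(1, k):                   # farthest-point initialization
        d = np.min([np.linalg.norm(h - c, axis=1) for c in centers], axis=0)
        centers.append(h[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(h[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([h[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

h = np.array([[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]])
labels, _ = kmeans(h, k=2)
```

The cluster indices themselves carry no meaning; mapping the two clusters to "legal" and "illegal" would rely on the nodes whose labels are already known.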

In the above embodiment, the transaction behavior historical feature extraction step performs long short-term memory network processing on the transaction sample set, that is, it learns the history of the transaction behavior to obtain the transaction behavior historical features.

In the Bitcoin transaction process, each transaction is associated with the time at which it was broadcast to the Bitcoin network, and the transaction history over this period is sufficient to influence the next-step prediction of whether the transaction is legal. In other words, a historical feature sequence with a suitable number of time steps is enough to influence the prediction made in the subsequent prediction step. Therefore, in this step, the time series of each transaction node is learned with a long short-term memory network (LSTM) to capture the historical behavior of the transaction node and extract the behavior history features.

LSTM is designed to solve the long-term dependency problem. On top of the RNN it adds three gates, namely the input gate, the forget gate and the output gate, to filter historical information effectively. The final output h_t is determined by the output gate o_t and the long-term cell state C_t,

where h_t = LSTM(x_t)

The processing of LSTM is as follows:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)

h_t = o_t * \tanh(C_t)

In this step, the transaction node time series data, taken either from the transaction sample set obtained in the transaction data preprocessing step or from the spectral clustering transaction sample set obtained in the spectral clustering sample labeling step, are input into the LSTM neural network layer, whose output is h_t = LSTM(x_t). In this embodiment, the step size is set to 10 to learn the historical features of the transaction behavior.
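The gate equations can be exercised with a small NumPy sketch. The cell-state update (candidate state and C_t) is not written out in the text, so the standard LSTM formulation is assumed here; the parameter names `W_c` and `b_c`, the random weights and the dimensions are likewise hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; `p` maps names to weights W_* and biases b_*."""
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])       # forget gate
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])       # input gate
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])       # output gate
    c_hat = np.tanh(p["W_c"] @ z + p["b_c"])     # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat             # long-term cell state
    h_t = o_t * np.tanh(c_t)                     # output / hidden state
    return h_t, c_t

def lstm(sequence, hidden, p):
    """Run the cell over a sequence and return the last hidden state."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, p)
    return h

rng = np.random.default_rng(0)
n_feat, hidden, steps = 3, 4, 10                 # step size 10 as in the text
p = {f"W_{g}": rng.standard_normal((hidden, hidden + n_feat)) * 0.1
     for g in "fioc"}
p.update({f"b_{g}": np.zeros(hidden) for g in "fioc"})
h_last = lstm(rng.standard_normal((steps, n_feat)), hidden, p)
```

Here `h_last` plays the role of the historical feature of one transaction node; in practice the weights would be trained and the cell applied to every node's sequence.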

As an optional implementation manner, in the transaction behavior aggregation feature extraction step, the graph convolutional network processing includes the following sub-steps:

obtaining the adjacency matrix of the transaction behavior historical features;

inputting the adjacency matrix into 2 to 4 graph learning layers of the graph convolutional network for feature propagation among neighbors, with a nonlinear activation applied after each layer.

Preferably, the number of layers is set to 2 to 4. With too many layers, the learned features become global features, which impairs the learning of the local features of the nodes.

More preferably, the adjacency matrix is input into 2 graph learning layers of the graph convolutional network for feature propagation among neighbors, with a nonlinear activation applied after each layer.
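A two-layer propagation of this kind can be sketched as follows. The symmetric normalization with self-loops and the ReLU activation follow a common GCN formulation and are assumptions, since the text does not name the normalization or the activation; the function name `gcn_forward`, the toy graph and the random weights are hypothetical.

```python
import numpy as np

def gcn_forward(a, h, weights):
    """Propagate features through len(weights) graph learning layers:
    H <- ReLU(A_hat @ H @ W) with A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    a_tilde = np.asarray(a, dtype=float) + np.eye(len(a))   # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    a_hat = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
    for w in weights:
        h = np.maximum(a_hat @ h @ w, 0.0)   # nonlinear activation per layer
    return h

a = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
h0 = np.eye(3)                               # stand-in for LSTM features
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 4)), rng.standard_normal((4, 2))]
h2 = gcn_forward(a, h0, weights)
```

Passing two weight matrices gives the preferred 2-layer configuration; each extra matrix adds one more hop of neighborhood aggregation.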

The adjacency matrix of the transaction behavior historical features can be computed from two parts. The first part indicates whether two nodes are connected by an edge and is set to 1 if they are. Because the data are time series, the second part can be the similarity between the feature sequences of the two nodes. The weighted sum of these two parts is the similarity between a node and its neighbor node.
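One possible reading of this weighted sum is sketched below. The cosine similarity, the mixing weight `alpha` and the restriction to connected node pairs are assumptions; the text names neither the similarity measure nor the weights. The sketch also assumes that no feature sequence is all zeros.

```python
import numpy as np

def weighted_adjacency(edges, sequences, alpha=0.5):
    """A[i, j] = alpha * edge indicator + (1 - alpha) * sequence similarity,
    computed only for connected node pairs."""
    n = len(sequences)
    seqs = np.asarray(sequences, dtype=float).reshape(n, -1)
    unit = seqs / np.linalg.norm(seqs, axis=1, keepdims=True)
    a = np.zeros((n, n))
    for i, j in edges:
        sim = float(unit[i] @ unit[j])            # cosine similarity
        a[i, j] = a[j, i] = alpha * 1.0 + (1.0 - alpha) * sim
    return a

# Hypothetical feature sequences for three nodes and two edges.
sequences = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adj = weighted_adjacency([(0, 1), (1, 2)], sequences, alpha=0.5)
```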

Further, the principle and process of the graph learning layer of the graph convolutional network are as follows:

Since Bitcoin transaction data constitutes a graph, the present invention mainly uses graph-based methods for fraud detection. Graph-based learning aims to train a prediction function $\hat{y} = f(x)$ that maps the feature space of an entity to the target label space. This is usually achieved by minimizing an objective loss function, which can be abstracted as $I = \Omega + \lambda\Phi$.

Here, $\Omega$ is the loss for the specific prediction task, which measures the error between the true value and the predicted value; $\Phi$ is the graph regularization term, which makes the predictions smooth over the graph; and $\lambda$ is a hyperparameter that balances the two. The regularization term usually implements the smoothness assumption of graph signals, that is, similar vertices tend to have similar predictions, which preserves the topological relationships of the graph. A widely used regularization term $\Phi$ is defined as follows. It is a weighted measure based on Euclidean distance, belongs to the variation measures of graph signals, and describes the overall smoothness; when $g(x_i, x_j)$ is 1, it is the Euclidean distance:

$\Phi = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} g(x_i, x_j)\left\| \frac{f(x_i)}{\sqrt{D_{ii}}} - \frac{f(x_j)}{\sqrt{D_{jj}}} \right\|^2$

where $g(x_i, x_j)$ is the similarity measure between the feature vectors of an entity pair, and $D_{ii} = \sum_{j} g(x_i, x_j)$ is the degree of vertex i. The regularizer smooths each pair of entities so that their predictions (after normalization by degree) are close to each other. The strength of the smoothing is determined by the similarity $g(x_i, x_j)$ of the feature vectors. This can be equivalently written in a more compact matrix form:

$\Phi = \operatorname{trace}(\hat{Y} L \hat{Y}^T)$

where $\hat{Y}$ is the predicted value vector, L is the Laplacian matrix of the graph, i.e. $L = D^{-1/2}(D - A)D^{-1/2}$, and A is the similarity matrix, each element of which is $g(x_i, x_j)$.
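The equivalence between the pairwise and matrix forms of the regularizer can be checked numerically. The sketch below assumes the common degree-normalized form, $\Phi = \frac{1}{2}\sum_{i,j} A_{ij}\,(y_i/\sqrt{D_{ii}} - y_j/\sqrt{D_{jj}})^2 = y^T L y$ with $L = D^{-1/2}(D - A)D^{-1/2}$, for scalar predictions; the similarity values and predictions are made up for illustration.

```python
import numpy as np


def pairwise_regularizer(A, y):
    """Sum over vertex pairs of similarity-weighted squared differences."""
    d = A.sum(axis=1)                    # vertex degrees D_ii
    u = y / np.sqrt(d)                   # degree-normalized predictions
    diff = u[:, None] - u[None, :]
    return 0.5 * np.sum(A * diff ** 2)


def matrix_regularizer(A, y):
    """Same quantity written with the normalized graph Laplacian."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = d_inv_sqrt @ (np.diag(d) - A) @ d_inv_sqrt
    return float(y @ L @ y)


# Hypothetical similarity matrix and predictions for three transaction nodes.
A = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.2],
              [0.5, 0.2, 0.0]])
y = np.array([0.9, 0.8, 0.1])

phi_pairs = pairwise_regularizer(A, y)
phi_matrix = matrix_regularizer(A, y)
```

Both functions return the same non-negative value; it shrinks as strongly connected nodes receive more similar predictions.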

Graph Convolutional Network (GCN) is a special graph-based learning method that has developed rapidly in recent years. It combines the core idea of graph-based learning with advanced convolutional neural networks (CNNs). The core idea of standard CNNs is to use convolutions (such as 3×3 filter matrices) to capture local patterns in the input data (such as oblique lines in images). Following this idea, the goal of GCN is to capture the local connection patterns on the graph through convolution. However, an intuitive solution such as applying a convolution operation directly to the adjacency matrix of the graph is not feasible: because graph nodes have no inherent order, the filtered output of the convolution may change when two rows of the adjacency matrix are swapped, even though the swapped adjacency matrix still represents the same graph structure.

The present invention adopts two methods to solve this problem. One solution is to use the K-Nearest Neighbor (KNN) algorithm to process and sort the neighbor nodes around a vertex to obtain a normalized output for the vertex, and then apply a graph convolutional network method such as LGCL (Learnable Graph Convolutional Layer). The learnable graph convolutional layer automatically selects a fixed number of neighbor nodes for each feature by ranking the feature values, so as to transform the graph-structured data into regular one-dimensional grid data, on which standard CNN operations are then applied.

The other solution is to use spectral convolution to capture local connections in the Fourier domain, applying the Graph Fourier Transform (GFT) to the graph, for example: $y = Tx = f(F, X) = U F U^T X$.

Here, f denotes the filtering operation T of the parameterized convolution, and U is the matrix whose columns are the eigenvectors of L. Reading the formula from right to left, $U^T X$ is the forward GFT, which projects X onto each eigenvector to obtain the Fourier coefficients $\alpha$ (in the spectral domain). The next step, $F\alpha$, scales the coefficients by eigenvalue: the elements of the diagonal matrix F are the eigenvalues of L, and the higher the frequency, the larger the scaling coefficient $\lambda$; in other words, L is a high-pass filter. Left-multiplying the scaled vector $F\alpha$ by U is the inverse GFT, which is equivalent to transforming the frequency-domain information back to the original vertex domain (the analogue of the time domain).
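The three stages of the spectral convolution $y = U F U^T X$ (forward GFT, eigenvalue scaling, inverse GFT) can be sketched with an explicit eigendecomposition. The sketch assumes the symmetric normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$ and a small path graph; the names `normalized_laplacian` and `spectral_filter` are hypothetical.

```python
import numpy as np


def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} (assumed normalization)."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt


def spectral_filter(A, X, response):
    """Compute U F U^T X, where F = diag(response(eigenvalues of L))."""
    L = normalized_laplacian(A)
    eigvals, U = np.linalg.eigh(L)                  # columns of U: eigenvectors of L
    alpha = U.T @ X                                 # forward GFT: Fourier coefficients
    scaled = response(eigvals)[:, None] * alpha     # F alpha: scale each frequency
    return U @ scaled                               # inverse GFT


# Path graph 0-1-2-3 with a one-dimensional signal concentrated on node 0.
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
X = np.array([[1.0], [0.0], [0.0], [0.0]])

y_identity = spectral_filter(A, X, lambda lam: np.ones_like(lam))  # F = I
y_highpass = spectral_filter(A, X, lambda lam: lam)                # F = eigenvalues of L
```

With F equal to the identity the signal is returned unchanged, and with F equal to the eigenvalues of L the result is simply `L @ X`, which is the high-pass behavior mentioned in the text.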

The above calculation involves the eigendecomposition of L, which has high complexity. Preferably, F is regarded as a function of $\Lambda$, the diagonal matrix of eigenvalues of L, so that a K-th order approximation in Chebyshev polynomials $T_k(x)$ can be used to represent F:

$F \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda})$

where $\tilde{\Lambda} = \frac{2}{\lambda_{max}}\Lambda - I$; $\lambda_{max}$ is the largest eigenvalue of L; $\theta_k$ are the coefficients of the Chebyshev polynomials; and $T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x)$, with $T_1(x) = x$ and $T_0(x) = 1$. After this approximation, the N parameters are reduced to the K + 1 coefficients $\theta_k$, so the number of parameters no longer grows with the number of nodes N (it is O(1) in N). There is no need to calculate eigenvalues and eigenvectors, but interpretability in the frequency domain is lost. The Chebyshev GCN approximation for K=1 is:

$Z = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2} X W$

where $\tilde{A} = A + I$ is the adjacency matrix with self-loops and $\tilde{D}$ is its degree matrix; X is the original vertex feature matrix, of dimension N×C; W is the parameter matrix to be learned, of dimension C×F, where F here denotes the output feature dimension. The output of a first-order graph convolution therefore has dimension N×F.
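To show how the Chebyshev recurrence avoids the eigendecomposition, the following sketch applies $\sum_k \theta_k T_k(\tilde{L}) X$ using only matrix products, with $\tilde{L} = \frac{2}{\lambda_{max}} L - I$. The choice $\lambda_{max} = 2$ (an upper bound for the normalized Laplacian), the coefficient values in `theta`, and the function names are assumptions for illustration.

```python
import numpy as np


def normalized_laplacian(A):
    d = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt


def chebyshev_filter(L, X, theta, lam_max=2.0):
    """Apply sum_k theta[k] * T_k(L_tilde) @ X via the Chebyshev recurrence."""
    n = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(n)
    t_prev, t_curr = X, L_tilde @ X            # T_0 X = X, T_1 X = L_tilde X
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        # T_k = 2 x T_{k-1} - T_{k-2}, applied directly to the signal.
        t_next = 2.0 * (L_tilde @ t_curr) - t_prev
        out = out + theta[k] * t_next
        t_prev, t_curr = t_curr, t_next
    return out


# Path graph 0-1-2-3 with two features per node.
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
L = normalized_laplacian(A)
X = np.arange(8, dtype=float).reshape(4, 2)
theta = [0.5, -0.3, 0.2]                       # hypothetical coefficients (K = 2)
y_cheb = chebyshev_filter(L, X, theta)
```

The result matches what the explicit spectral route would give for the same polynomial response, but only sparse-friendly products with `L_tilde` are needed.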

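When a second first-order graph convolution (two-hop) is applied to the above output (one-hop), the mathematical expression is as follows:

$Z = \sigma\left(\hat{A}\,\sigma(\hat{A} X W^{(0)})\,W^{(1)}\right)$

where $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$, with $\tilde{A} = A + I$ the adjacency matrix with self-loops and $\tilde{D}$ its degree matrix; $\sigma$ is the nonlinear activation applied after each layer; and $W^{(0)}$ and $W^{(1)}$ are the parameters of the first and second layers.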
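A two-layer (two-hop) graph convolution can be sketched as below. The sketch assumes the widely used renormalized propagation matrix $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$ and ReLU as the nonlinear activation; the random weights stand in for learned parameters, and all names are hypothetical.

```python
import numpy as np


def normalize_adjacency(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2} (assumed renormalization)."""
    A_tilde = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def relu(z):
    return np.maximum(z, 0.0)


def gcn_two_layer(A, X, W0, W1):
    """Two stacked first-order graph convolutions with activation after each."""
    A_hat = normalize_adjacency(A)
    H1 = relu(A_hat @ X @ W0)        # one-hop aggregation
    return relu(A_hat @ H1 @ W1)     # two-hop aggregation


rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0, 0.0, 0.0],   # path graph 0-1-2-3
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
X = rng.normal(size=(4, 3))           # N x C input features
W0 = rng.normal(size=(3, 5))          # C x hidden
W1 = rng.normal(size=(5, 2))          # hidden x F
Z = gcn_two_layer(A, X, W0, W1)       # N x F aggregated features

# Changing node 3 cannot affect node 0, which is three hops away.
X_changed = X.copy()
X_changed[3] += 10.0
Z_changed = gcn_two_layer(A, X_changed, W0, W1)
```

Comparing `Z[0]` with `Z_changed[0]` illustrates why a small number of layers keeps the learned features local: each layer extends the receptive field by one hop.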

In the above embodiment, the prediction step performs fully-connected layer processing on the transaction behavior historical features and the transaction behavior aggregation features, and performs fraud prediction of transaction nodes through binary classification.

Considering that traditional machine learning also performs well in fraud detection, it is further preferred that the transaction behavior historical features, the transaction behavior aggregation features, and the output results of a traditional machine learning model are processed through the fully-connected layer and then subjected to binary classification, to obtain the final prediction of whether the transaction node is an illegal transaction (that is, to predict whether the label of the transaction node under test is a legal transaction or an illegal transaction).
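The prediction step can be sketched as a single fully-connected layer over the concatenated features, followed by a sigmoid and a threshold. The feature dimensions, the 0.5 threshold, the random weights, and the name `predict_fraud` are assumptions; the text does not specify the size of the fully-connected layer or which traditional model supplies `ml_scores`.

```python
import numpy as np


def predict_fraud(h_hist, z_agg, ml_scores, W, b):
    """Concatenate the three inputs, apply one dense layer, and classify."""
    features = np.concatenate([h_hist, z_agg, ml_scores], axis=1)
    prob = 1.0 / (1.0 + np.exp(-(features @ W + b)))   # probability of "illegal"
    labels = (prob >= 0.5).astype(int)                 # 1 = illegal, 0 = legal
    return prob, labels


rng = np.random.default_rng(0)
h_hist = rng.normal(size=(5, 8))       # LSTM historical features for 5 nodes
z_agg = rng.normal(size=(5, 6))        # GCN aggregation features
ml_scores = rng.uniform(size=(5, 1))   # output of a traditional model (hypothetical)
W = rng.normal(scale=0.1, size=(15,))  # untrained weights, for illustration only
prob, labels = predict_fraud(h_hist, z_agg, ml_scores, W, b=0.0)
```

In practice `W` and `b` would be trained jointly with the LSTM and GCN layers on the labeled transaction nodes.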

Based on the same inventive concept, please refer to FIG. 9 and FIG. 10 , the present invention also proposes a transaction fraud detection system based on a graph neural network. The transaction fraud detection system based on the graph neural network includes the following modules:

The transaction data preprocessing module is used to obtain transaction data and preprocess the transaction data to obtain a transaction sample set in panel form;

The transaction behavior historical feature extraction module is used to perform long short-term memory network processing on the transaction sample set to obtain transaction behavior historical features;

The transaction behavior aggregation feature extraction module is used to perform graph convolutional network processing on the transaction behavior historical features to obtain transaction behavior aggregation features;

The prediction module is used to perform fully-connected layer processing on the transaction behavior historical features and the transaction behavior aggregation features, and to perform fraud prediction of transaction nodes through binary classification.

Through the extraction of transaction behavior historical features and transaction behavior aggregation features, followed by fully-connected layer processing, the above transaction fraud detection system based on a graph neural network overcomes the defects of traditional transaction fraud detection methods, which ignore both the relationships among the data and the fact that transaction behavior is time series data. It thereby ensures the comprehensiveness of transaction fraud detection and improves its accuracy.

In the above embodiment, the transaction data preprocessing module is mainly used to collect and preprocess the transaction data required for transaction fraud detection, so that the preprocessed transaction sample set is in panel form and captures the relationships among transaction nodes; the samples constitute a dynamically changing transaction flow graph.

As an optional implementation manner, the transaction features at each time step are obtained through preprocessing, and the preprocessing includes the following sub-steps:

Obtain the local transaction node features of the transaction data;

Obtain the transaction node summary features of the transaction data;

Obtain the transaction node sub-graph information of the transaction data.

In the above transaction data preprocessing module, not only the local transaction node features but also the transaction node summary features and the transaction node sub-graph information are obtained, so that the result is more accurate.

In this implementation, the real Bitcoin transaction data used is a transaction graph collected from the Bitcoin blockchain. The transaction graph is described as follows: a node in the graph represents a transaction, and an edge can be regarded as a flow of bitcoin from one transaction to another. The graph consists of 203769 nodes and 234355 edges. Of the nodes, 2% are labeled as illegal transaction nodes, 21% are labeled as legal transaction nodes, and the rest are unlabeled.

In the transaction data, each transaction node is associated with time information, which is the estimated time at which the Bitcoin network confirmed the transaction. In this embodiment, the data are divided into 49 time steps spaced about 2 weeks apart, covering about two years of Bitcoin transaction data. Each time step contains one connected component of transactions that appeared on the blockchain within less than 3 hours of one another, and there are no edges connecting transaction nodes in different time steps. The time interval here can be changed to other reasonable values. The transaction features of each time step are explained in detail below.

The above local transaction node features represent the transaction data of the local transaction node itself, such as the time step, the number of input transactions (node in-degree), the number of output transactions (node out-degree), the transaction fee, the output volume, and derived statistics. The derived statistical features are averages over neighboring nodes, such as the average BTC fee received per input transaction, the average BTC fee received per output transaction, the average BTC fee spent per input transaction, the average BTC fee spent per output transaction, the average number of input/output transactions associated with the input transactions (average number of input-related transactions), and the average number of input/output transactions associated with the output transactions (average number of output-related transactions).

The above transaction node summary features are obtained from the local transaction node features of the neighbor transaction nodes one hop forward and/or one hop backward from the local transaction node. That is, the local transaction node feature data of all neighbor transaction nodes of the local transaction node, obtained in step S100, are processed, and descriptive statistics such as the maximum, minimum, median, mode, standard deviation, range, and correlation coefficient are taken as the transaction node summary features.
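The summary-feature computation can be sketched with the Python standard library. The node identifiers, the fee values, and the function name `summary_features` are hypothetical; the correlation coefficient is omitted because it needs a second feature series, and the population standard deviation is used as one possible choice.

```python
import statistics


def summary_features(node, neighbors, fee):
    """Descriptive statistics of one local feature over the one-hop neighbors."""
    values = [fee[n] for n in neighbors[node]]
    return {
        "max": max(values),
        "min": min(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std": statistics.pstdev(values),   # population standard deviation
        "range": max(values) - min(values),
    }


# One-hop forward and backward neighbors of transaction "tx0" (made-up data).
neighbors = {"tx0": ["tx1", "tx2", "tx3"]}
fee = {"tx1": 1.0, "tx2": 3.0, "tx3": 3.0}
summary = summary_features("tx0", neighbors, fee)
```

Repeating this for every local feature and every node yields the summary part of the feature vector at each time step.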

The transaction node sub-graph information of the above transaction data captures the local topology around a transaction node. In this example, it is obtained by computing the spectral information of the graph formed by all transaction nodes radiating outward from the local transaction node as the center; that is, the eigenvalues of the resulting Laplacian matrix $L' = D' - W$ are used as an additional feature, the transaction node sub-graph information, which reflects the topology of the graph in the frequency domain. If the eigenvalues of two transaction nodes are similar, the topological structures of the sub-graphs in which they are located are similar.
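A minimal sketch of the sub-graph spectral feature follows. It assumes the sub-graph is the set of nodes within a fixed number of hops of the center (the radius `hops` is an assumption, since the text does not state how far the sub-graph extends) and uses the unnormalized Laplacian $L' = D' - W$; the function name `subgraph_spectrum` is hypothetical.

```python
import numpy as np


def subgraph_spectrum(W, center, hops=1):
    """Eigenvalues of L' = D' - W for the sub-graph around `center`."""
    n = len(W)
    reached = {center}
    frontier = {center}
    for _ in range(hops):
        # Expand to every node adjacent to the current frontier.
        frontier = {j for i in frontier for j in range(n) if W[i, j] > 0} - reached
        reached |= frontier
    idx = sorted(reached)
    W_sub = W[np.ix_(idx, idx)]                       # sub-graph weight matrix
    L_sub = np.diag(W_sub.sum(axis=1)) - W_sub        # L' = D' - W
    return np.linalg.eigvalsh(L_sub)                  # ascending eigenvalues


# Node 0 is connected to nodes 1, 2, 3; node 4 hangs off node 1.
W = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (0, 3), (1, 4)]:
    W[i, j] = W[j, i] = 1.0

spectrum = subgraph_spectrum(W, center=0, hops=1)
```

For the one-hop star around node 0 the spectrum is 0, 1, 1, 4; nodes whose surroundings have a similar shape produce similar spectra.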

After the transaction data preprocessing step S100, Figure 3 and Figure 4 are drawn to observe how the transaction features change over time after preprocessing. Figure 3 is the time curve of a local transaction node feature (such as the transaction fee), and Figure 4 is the time curve of a transaction node summary feature (such as the maximum transaction fee among the local transaction node and its neighbor transaction nodes).

The figures show how three types of nodes change over time on two different attributes (a local transaction node feature and a transaction node summary feature). These two attributes distinguish legal transaction nodes from illegal transaction nodes well: the attribute curve of the legal transaction nodes, in the lower part of the figure, is relatively stable over time, while the curve of the illegal transaction nodes, in the upper part of the figure, is more tortuous and changes more steeply over time.

In the above embodiment, after the transaction data have been preprocessed, the transaction behavior historical feature extraction module can be executed directly when the number of transaction samples with known labels is sufficient. However, when the number of transaction samples with known labels is small and transaction fraud cannot be detected accurately, the spectral clustering sample labeling module needs to be executed first to label the unlabeled transaction nodes, so as to avoid the situation in which the labeled sample size is too small.

As an optional embodiment, the graph neural network-based transaction fraud detection system further includes a spectral clustering sample labeling module, which is configured to perform spectral clustering sample labeling processing on the transaction sample set to obtain a spectral clustering transaction sample set.

As an optional implementation manner, in the spectral clustering sample labeling module, the spectral clustering sample labeling process includes the following sub-steps:

Construct the spectral matrix of the transaction sample set;

Perform eigenvalue decomposition on the spectral matrix to obtain a feature matrix;

Cluster the feature matrix.

Table 1 Spectral clustering processing methods

Table 2. Spectral clustering processing method of large-scale transaction data

In the spectral clustering process, first, the sample set data = ($x_1$, $x_2$, ...) is input. The distance here can be measured using the Euclidean distance, the shortest path, or the geodesic distance, preferably the shortest path or the geodesic distance. The geodesic distance is defined as follows: on a surface (in three-dimensional space), there is only one shortest path from point A to point B that does not leave the surface, and the length of this shortest path is the geodesic distance. The parallelized construction of the distance matrix is shown in FIG. 8. When all the above map() tasks have been executed, new key-value pairs are generated; reduce() then reduces the results of all partitions, that is, it traverses and merges the values written by map() under the same new key and fills in the values of each row column by column, resulting in the complete distance matrix.
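The map/reduce construction of the distance matrix can be imitated in plain Python. This is a single-process sketch, not a distributed implementation: `map_row` and `reduce_rows` are hypothetical names, the row index is used as the key, and the Euclidean distance stands in for the shortest-path or geodesic distance that the text prefers.

```python
import math
from collections import defaultdict


def map_row(i, samples):
    """map(): emit (row key, (column, distance)) pairs for sample i."""
    for j, other in enumerate(samples):
        yield i, (j, math.dist(samples[i], other))


def reduce_rows(pairs, n):
    """reduce(): merge values with the same row key and fill each row by column."""
    rows = defaultdict(dict)
    for i, (j, d) in pairs:
        rows[i][j] = d
    return [[rows[i][j] for j in range(n)] for i in range(n)]


samples = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
pairs = [kv for i in range(len(samples)) for kv in map_row(i, samples)]
distance_matrix = reduce_rows(pairs, len(samples))
```

Because each `map_row` call touches only one row, the rows can be computed on separate partitions and merged afterwards.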

The eigenvalues and eigenvectors obtained by decomposing $T_{kk}$ are the eigenvalues and eigenvectors of L'. If only the first k eigenvalues are calculated, the calculation can be completed in only k iterations, which is more efficient. The number of clusters k is set to 2 (legal transactions and illegal transactions), and the matrix composed of the eigenvectors $h_1$, $h_2$, ...
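The overall spectral clustering procedure with k = 2 can be sketched as follows. For brevity the sketch uses a full eigendecomposition (`numpy.linalg.eigh`) rather than the iterative scheme implied above, assumes the unnormalized Laplacian, and clusters the rows of the eigenvector matrix with a small 2-means loop; the similarity matrix and all function names are made up for illustration.

```python
import numpy as np


def two_means(H, iters=20):
    """Tiny 2-means on the rows of H, seeded from the extremes of column 1."""
    centers = H[[np.argmin(H[:, 1]), np.argmax(H[:, 1])]].copy()
    labels = np.zeros(len(H), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(H[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = H[labels == c].mean(axis=0)
    return labels


def spectral_bipartition(S):
    """Cluster nodes into two groups from a symmetric similarity matrix S."""
    L = np.diag(S.sum(axis=1)) - S          # spectral (Laplacian) matrix
    _, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    H = eigvecs[:, :2]                      # feature matrix of the first k = 2 eigenvectors
    return two_means(H)


# Two tightly connected groups of three nodes joined by one weak link.
S = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                S[i, j] = 1.0
S[2, 3] = S[3, 2] = 0.1

labels = spectral_bipartition(S)
```

The two groups receive different labels; deciding which cluster corresponds to legal and which to illegal transactions would rely on the nodes whose labels are already known.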

In the above embodiment, the transaction behavior historical feature extraction module is used to perform long short-term memory network processing on the transaction sample set to obtain the transaction behavior historical features. That is, the transaction behavior historical features are obtained by learning from the history of transaction behavior.

In the Bitcoin transaction process, each transaction takes some time to be broadcast to the Bitcoin network, and the transaction history during this period is enough to influence the prediction, in the next step, of whether the transaction is a legal transaction. In other words, selecting a historical feature sequence with an appropriate time step is enough to affect the subsequent prediction of whether the transaction is legal. Therefore, in this step, the time series of each transaction node is fed to a long short-term memory network (LSTM) to learn the historical behavior of the transaction node and extract the transaction behavior historical features.

where $h_t = \mathrm{LSTM}(x_t)$.

The processing of LSTM is as follows:

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

$h_t = o_t * \tanh(C_t)$

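As an optional embodiment, in the transaction behavior aggregation feature extraction module, the graph convolutional network processing includes the following sub-steps:

Obtain the adjacency matrix of the transaction behavior historical features;

Input the adjacency matrix into 2 to 4 graph convolutional network graph learning layers for feature propagation among neighbors, with a nonlinear activation applied to the output of each layer.

As in the method described above, the graph regularization term is written in matrix form using the predicted value vector and the Laplacian matrix L of the graph, where A is the similarity matrix and each element is $g(x_i, x_j)$.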

Graph Convolutional Network (GCN) is a special graph-based learning method that has developed rapidly in recent years. It combines the core idea of graph-based learning with advanced convolutional neural networks (CNNs). The core idea of standard CNNs is to use convolutions (such as 3×3 filter matrices) to capture local patterns in the input data (such as oblique lines in images). Following this idea, the goal of GCN is to capture the local connection patterns on the graph through convolution. However, an intuitive solution such as applying a convolution operation directly to the adjacency matrix of the graph is not feasible: because graph nodes have no inherent order, the filtered output of the convolution may change when two rows of the adjacency matrix are swapped, even though the swapped adjacency matrix still represents the same graph structure.

The present invention adopts two methods to solve this problem. One solution is to use the K-Nearest Neighbor (KNN) algorithm to process and sort the neighbor nodes around a vertex to obtain a normalized output for the vertex, and then apply a graph convolutional network method such as LGCL (Learnable Graph Convolutional Layer). The learnable graph convolutional layer automatically selects a fixed number of neighbor nodes for each feature by ranking the feature values, so as to transform the graph-structured data into regular one-dimensional grid data, on which standard CNN operations are then applied.

The other solution is to use spectral convolution to capture local connections in the Fourier domain, applying the Graph Fourier Transform (GFT) to the graph, for example: $y = Tx = f(F, X) = U F U^T X$.


In the above embodiment, the prediction module is used to perform fully-connected layer processing on the transaction behavior historical features and the transaction behavior aggregation features, and to perform fraud prediction of transaction nodes through binary classification.

Considering that traditional machine learning also performs well in fraud detection, it is further preferred to process the transaction behavior historical features, the transaction behavior aggregation features, and the output results of a traditional machine learning model through the fully-connected layer and then perform binary classification, to obtain the final prediction of whether the transaction node is an illegal transaction (that is, to predict whether the label of the transaction node under test is a legal transaction or an illegal transaction).

The above description of the disclosed embodiments is provided to enable those skilled in the art to make or use the present invention. Modifications of the above embodiments, or the extension of their application to other e-commerce platforms, will be obvious to those skilled in the art, and the general principles defined in this invention may be applied in other embodiments without departing from the spirit or scope of this invention. Therefore, the present invention will not be limited to the above-described embodiments, but will conform to the widest scope consistent with the technical principles and novel features disclosed in the present invention.

Claims

1. A transaction fraud detection method based on a graph neural network, characterized in that the method comprises the following steps:

a transaction data preprocessing step: obtaining transaction data and preprocessing the transaction data to obtain a transaction sample set in panel form;

a transaction behavior historical feature extraction step: performing long short-term memory network processing on the transaction sample set to obtain transaction behavior historical features;

a transaction behavior aggregation feature extraction step: performing graph convolutional network processing on the transaction behavior historical features to obtain transaction behavior aggregation features;

a prediction step: performing fully-connected layer processing on the transaction behavior historical features and the transaction behavior aggregation features, and performing fraud prediction of transaction nodes through binary classification.

2. The transaction fraud detection method based on a graph neural network according to claim 1, wherein, in the transaction data preprocessing step, the preprocessing includes the following sub-steps:

obtaining the local transaction node features of the transaction data;

obtaining the transaction node summary features of the transaction data;

obtaining the transaction node sub-graph information of the transaction data.

3. The transaction fraud detection method based on a graph neural network according to claim 1, characterized in that, before the transaction behavior historical feature extraction step, the method further comprises the following step:

a spectral clustering sample labeling step: performing spectral clustering sample labeling processing on the transaction sample set to obtain a spectral clustering transaction sample set.

4. The transaction fraud detection method based on a graph neural network according to claim 3, wherein, in the spectral clustering sample labeling step, the spectral clustering sample labeling processing comprises the following sub-steps:

constructing the spectral matrix of the transaction sample set;

performing eigenvalue decomposition on the spectral matrix to obtain a feature matrix;

clustering the feature matrix.

5. The transaction fraud detection method based on a graph neural network according to any one of claims 1 to 4, characterized in that, in the transaction behavior aggregation feature extraction step, the graph convolutional network processing includes the following sub-steps:

obtaining an adjacency matrix of the transaction behavior historical features;

inputting the adjacency matrix into 2 to 4 graph convolutional network graph learning layers for feature propagation among neighbors, with a nonlinear activation applied to the output of each layer.

6. A transaction fraud detection system based on a graph neural network, characterized in that the system comprises the following modules:

a transaction data preprocessing module, which is used to obtain transaction data and preprocess the transaction data to obtain a transaction sample set in panel form;

a transaction behavior historical feature extraction module, which is used to perform long short-term memory network processing on the transaction sample set to obtain transaction behavior historical features;

a transaction behavior aggregation feature extraction module, which is used to perform graph convolutional network processing on the transaction behavior historical features to obtain transaction behavior aggregation features;

a prediction module, which is used to perform fully-connected layer processing on the transaction behavior historical features and the transaction behavior aggregation features, and to perform fraud prediction of transaction nodes through binary classification.

7. The transaction fraud detection system based on a graph neural network according to claim 6, wherein, in the transaction data preprocessing module, the preprocessing comprises the following sub-steps:

obtaining the local transaction node features of the transaction data;

obtaining the transaction node summary features of the transaction data;

obtaining the transaction node sub-graph information of the transaction data.

8. The transaction fraud detection system based on a graph neural network according to claim 6, wherein the system further comprises a spectral clustering sample labeling module, and the spectral clustering sample labeling module is used to perform spectral clustering sample labeling processing on the transaction sample set to obtain a spectral clustering transaction sample set.

9. The transaction fraud detection system based on a graph neural network according to claim 8, wherein, in the spectral clustering sample labeling module, the spectral clustering sample labeling processing comprises the following sub-steps:

constructing the spectral matrix of the transaction sample set;

performing eigenvalue decomposition on the spectral matrix to obtain a feature matrix;

clustering the feature matrix.

10. The transaction fraud detection system based on a graph neural network according to any one of claims 6 to 9, wherein, in the transaction behavior aggregation feature extraction module, the graph convolutional network processing includes the following sub-steps:

obtaining an adjacency matrix of the transaction behavior historical features;

inputting the adjacency matrix into 2 to 4 graph convolutional network graph learning layers for feature propagation among neighbors, with a nonlinear activation applied to the output of each layer.