CN115438751A - Blockchain phishing fraud identification method based on graph neural network

Info

Publication number: CN115438751A
Application number: CN202211275203.2A
Authority: CN
Inventors: 卞静; 卓绍烜; 李焱
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2022-10-18
Filing date: 2022-10-18
Publication date: 2022-12-06

Abstract

The invention provides a blockchain phishing fraud identification method based on a graph neural network, comprising: preprocessing transaction data into a transaction network graph; clustering the transaction network graph to obtain a global-view graph; sampling the transaction network graph to obtain a local-view graph; constructing and training a graph neural network; inputting the global-view graph into the trained graph neural network to obtain node embeddings of the global transaction view, the node embeddings containing the structure and edge-view information of the transaction network; inputting the local-view graph into the trained graph neural network to obtain node embeddings of the local transaction view; and concatenating the node embeddings of the global transaction view with those of the local transaction view and inputting the concatenated embeddings into a multilayer perceptron to classify and identify phishing addresses. By mining data from multiple transaction views, the invention extracts more effective information from the transaction network and thereby improves the performance of Ethereum phishing fraud identification.

Description

Blockchain phishing fraud identification method based on graph neural network

Technical Field

The invention relates to the technical field of phishing address identification in blockchain transactions, and in particular to a blockchain phishing fraud identification method based on a graph neural network.

Background

Currently popular cryptocurrencies such as Bitcoin and Ethereum use blockchain technology as their key supporting technology. A blockchain is a novel distributed ledger technology that enables trusted transactions without a trusted intermediary in an environment of mutual distrust. Compared with traditional database technology, a blockchain is resistant to forgery and tampering, supports smart contracts, and is regarded as a technology that could bring about social change. To realize trusted transactions in a distributed environment, blockchain technology makes extensive use of cryptography to hide user information, while all transaction information is verified and stored jointly by a distributed network. Public chains such as Bitcoin and Ethereum have attracted a large number of users and accumulated a large amount of transaction data. The large user base and active user transactions make blockchain-based data analysis an important and valuable research problem.

As blockchain technology develops, it is being adopted as an underlying technology in various industries, and a large amount of data now exists in the form of blockchain data, so research on blockchain-based data analysis has important theoretical and practical significance. Payment systems built on blockchains are anonymous and decentralized, and criminals exploit this anonymity to commit fraud. Widespread fraud in blockchain systems hinders users' acceptance and use of blockchain technology and thus hinders the progress of the technology. Identifying abnormal user behavior in blockchain transaction networks has therefore become an urgent and critical issue in the blockchain ecosystem. Transaction phishing fraud is a new type of cybercrime that has emerged with the development of blockchains. It exploits the anonymity of the blockchain, and it is becoming increasingly rampant because legal measures lag behind and data analysis methods are still developing.

Existing phishing fraud identification methods can be broadly classified into three categories.

1) Feature-engineering-based methods. The topological characteristics of the blockchain transaction network are analyzed manually, and statistical features are constructed as the input of a machine learning classifier for phishing fraud identification. For example, 219-dimensional features based on first-order and second-order wallet nodes are extracted from a transaction graph constructed from transaction records and used as the input of a LightGBM classifier to identify phishing transaction addresses.

2) Graph representation learning methods based on random walks. Algorithms such as DeepWalk and node2vec are used to capture the structural information of a graph. node2vec has been used to obtain node representations of the Ethereum transaction network, which are then classified with a one-class SVM. trans2vec extends this approach by introducing biased transaction sampling. However, these encoders based on random walk algorithms cannot use the feature information of the nodes, which limits their performance.

3) Graph-neural-network-based methods. E-GCN is designed on the basis of a graph autoencoder. EdgeProp uses the information of transaction edge features. MCGC uses multiple feature extraction channels of a graph neural network to extract features of the transaction pattern of the target address. TTAGN uses temporal edge representations and an edge2node module to identify phishing fraud. However, existing graph-neural-network-based methods use only the node features; they do not fully utilize the structural information of the graph, and the edge features of blockchain transactions are not fully utilized either.

Disclosure of Invention

Existing techniques for identifying abnormal transaction nodes, which rely on heuristic shallow embeddings or graph neural networks, cannot fully utilize the structural information and edge information of a transaction network. To overcome this problem, the invention provides a blockchain phishing fraud identification method based on a graph neural network. By mining data from multiple transaction views, the method extracts more effective information from the transaction network and thereby improves phishing fraud identification performance.

In order to solve the technical problems, the technical scheme of the invention is as follows:

A blockchain phishing fraud identification method based on a graph neural network, the method comprising the following steps:

preprocessing transaction data into a transaction network graph;

clustering the transaction network graph to obtain a global-view graph;

sampling the transaction network graph to obtain a local-view graph;

constructing and training a multi-transaction-view graph attention neural network;

inputting the global-view graph into the trained multi-transaction-view graph attention neural network to obtain node embeddings of the global transaction view, the node embeddings containing the structure and edge-view information of the transaction network;

inputting the local-view graph into the trained multi-transaction-view graph attention neural network to obtain node embeddings of the local transaction view;

and concatenating the node embeddings of the global transaction view with the node embeddings of the local transaction view, and inputting the concatenated embeddings into a multilayer perceptron to classify and identify phishing addresses.

Preferably, the transaction network graph is clustered by a clustering function to obtain the global-view graph, which is expressed as follows:

$$\{\mathcal{G}_1, \mathcal{G}_2, \dots, \mathcal{G}_c\} = p(\mathcal{G}, c), \qquad \mathcal{G}_i = (\mathcal{V}_i, \mathcal{E}_i)$$

where $p$ denotes the graph clustering function, $c$ denotes the number of clusters, $\mathcal{G}$ denotes the transaction network graph, $\mathcal{G}_i$ denotes the i-th subgraph generated by clustering, $\mathcal{V}_i$ denotes the set of nodes in $\mathcal{G}_i$, and $\mathcal{E}_i$ denotes the set of edges in $\mathcal{G}_i$.
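The clustering step can be illustrated with a minimal sketch. The text does not say which clustering function p is used, so the `cluster_graph` function below is a hypothetical stand-in that orders nodes breadth-first and cuts the ordering into at most c chunks. A real system would more likely use a dedicated graph partitioner. The function names and the toy graph are assumptions made for illustration only.

```python
from collections import defaultdict, deque


def bfs_order(nodes, edges):
    """Return the nodes in breadth-first order so that nearby nodes end up adjacent."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, order = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return order


def cluster_graph(nodes, edges, c):
    """Hypothetical stand-in for p(G, c): split G into at most c induced subgraphs."""
    order = bfs_order(nodes, edges)
    size = -(-len(order) // c)  # ceiling division: nodes per cluster
    parts = [order[k:k + size] for k in range(0, len(order), size)]
    subgraphs = []
    for part in parts:
        members = set(part)
        # Keep only the edges whose two endpoints fall in the same cluster.
        kept = [(u, v) for u, v in edges if u in members and v in members]
        subgraphs.append((part, kept))
    return subgraphs


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f"), ("c", "d")]
subgraphs = cluster_graph(nodes, edges, c=2)
```

In this toy run the edge between "c" and "d" crosses the two clusters and is dropped, which shows the trade-off a larger c makes between computational cost and preserved structure.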

Preferably, the transaction network graph is sampled by a neighbor sampling function to obtain the local-view graph:

$$\mathcal{N}_K(i) = \{\, j \mid j \in K\text{-hop}(i) \,\}$$

wherein $\mathcal{N}_K(i)$ denotes the set of K-order neighbor nodes of node i, j denotes a node in the graph, and K-hop denotes a K-hop search, i.e. finding the K-order neighbors.
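A K-hop neighbor search of this kind can be sketched with a breadth-first traversal that stops after K steps. This is a minimal illustration under the assumption that the graph is undirected and given as an edge list. The helper names `k_hop_neighbors` and `local_view` are hypothetical and do not come from the patent.

```python
from collections import defaultdict, deque


def k_hop_neighbors(edges, i, K):
    """Return the set of nodes j reachable from node i within K hops (excluding i)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == K:
            continue  # do not expand beyond the K-th hop
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {j for j in dist if j != i}


def local_view(edges, i, K):
    """Induced subgraph on node i and its K-hop neighbors."""
    keep = k_hop_neighbors(edges, i, K) | {i}
    return sorted(keep), [(u, v) for u, v in edges if u in keep and v in keep]


edges = [(1, 2), (2, 3), (3, 4), (2, 5)]
one_hop = k_hop_neighbors(edges, 2, K=1)
two_hop = k_hop_neighbors(edges, 1, K=2)
view_nodes, view_edges = local_view(edges, 1, K=1)
```

A larger K keeps more of the neighborhood structure around node i but grows the sampled subgraph quickly, which is the trade-off that governs the choice of K.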

Preferably, the multi-transaction-view graph attention neural network captures the edge-view information of the transaction network through edge-view coefficients formed from the edge features and the attention coefficients, and captures the structural information of the transaction graph by aggregating the features of the address nodes.

Further, the multi-transaction-view graph attention neural network MTvGAT is composed of a plurality of MTvConv blocks; the input of an MTvConv block is the set of input features of the nodes

$$\mathbf{h} = \{h_1, h_2, \dots, h_N\}, \qquad h_i \in \mathbb{R}^{F}$$

where $N$ denotes the number of nodes, $F$ denotes the dimension of the input feature of each node, and $h_i$ denotes the input feature of the i-th node;

after training, each MTvConv block of the MTvGAT outputs the node embeddings

$$\mathbf{h}' = \{h'_1, h'_2, \dots, h'_N\}, \qquad h'_i \in \mathbb{R}^{F'}$$

where $F'$ denotes the dimension of the output embedding and $h'_i$ denotes the output embedding of the i-th node;

the calculation formula of the multi-transaction-view graph attention neural network MTvGAT is as follows:

$$z = \mathrm{MTvGAT}(\mathcal{G}, A)$$

wherein $\mathcal{G}$ denotes the input transaction network graph, $A$ denotes the adjacency matrix of the input transaction network graph, and $z$ denotes the embedding of the target nodes learned by the last MTvConv block of the MTvGAT.

Still further, the attention coefficient $\alpha_{ij}$ is calculated as follows:

$$\alpha_{ij} = \mathrm{softmax}_j\big(a(W h_i \,\|\, W h_j)\big)$$

wherein $W$ denotes a learnable weight matrix that transforms the input features into high-dimensional features, $a$ denotes a shared attention mechanism, and $\|$ denotes the feature concatenation operation.
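An attention coefficient of this kind can be sketched in plain Python. The sketch assumes the standard GAT form, in which the shared attention mechanism is a single weight vector applied to the concatenated transformed features, followed by a LeakyReLU and a softmax over the neighbors of node i. The exact formula used by the invention is not recoverable from this text, so the LeakyReLU, the function names and all numeric values are assumptions.

```python
import math


def matvec(W, h):
    """Multiply weight matrix W (list of rows) by feature vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_coefficients(h, neighbors, W, a):
    """Return alpha[i][j] for each j in neighbors[i], normalised by a softmax over j."""
    Wh = {n: matvec(W, feat) for n, feat in h.items()}
    alpha = {}
    for i, nbrs in neighbors.items():
        scores = {}
        for j in nbrs:
            concat = Wh[i] + Wh[j]  # feature concatenation (W h_i || W h_j)
            scores[j] = leaky_relu(sum(w * x for w, x in zip(a, concat)))
        top = max(scores.values())  # subtract the maximum for numerical stability
        exp = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exp.values())
        alpha[i] = {j: e / total for j, e in exp.items()}
    return alpha


h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2]}
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps F = 2 input features to F' = 3
a = [0.1, 0.1, 0.1, 0.5, 0.5, 0.5]        # attention vector of length 2 * F'
alpha = attention_coefficients(h, neighbors, W, a)
```

The coefficients of each node sum to one, so they can be read as the relative weight node i places on each neighbor before the edge features are taken into account.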

Still further, the edge-view coefficient $\delta_{i,j}$ is formed by concatenating the edge feature and the attention coefficient:

$$\delta_{i,j} = (e_{i,j} \,\|\, \alpha_{i,j})$$

each MTvConv block takes the node features and the edge features as input; the forward propagation mechanism is expressed as follows:

$$h_i^{(l+1)} = \phi\Big(h_i^{(l)},\ \bigoplus_{j \in \mathcal{N}(i)} \psi\big(h_i^{(l)}, \delta_{i,j}, h_j^{(l)}\big)\Big)$$

wherein $\phi$ and $\psi$ are multilayer perceptrons that compute the output node embedding from their concatenated inputs; $\bigoplus$ denotes a combination of multiple aggregators and scalers, in which each aggregator aggregates information from the neighbors and each scaler scales the aggregated information differently;

the forward propagation mechanism aggregates the information of the neighbor nodes of each node in the multi-transaction-view graph attention neural network to generate a new feature vector, namely $h_i^{(l+1)}$;

$l$ denotes the l-th layer of the neural network.
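The concatenation that forms the edge-view coefficient, and the aggregation of neighbor messages by several aggregators, can be illustrated as follows. This is a deliberately simplified sketch: it uses only mean and max aggregators, omits the scalers and the two multilayer perceptrons, and treats the attention coefficients as given. The function names and toy values are assumptions, not the patent's implementation.

```python
def edge_view_coefficient(e_ij, alpha_ij):
    """delta_ij = (e_ij || alpha_ij): edge features followed by the attention scalar."""
    return list(e_ij) + [alpha_ij]


def aggregate(messages):
    """Combine two aggregators by concatenating the mean and the max over messages."""
    dims = range(len(messages[0]))
    mean = [sum(m[d] for m in messages) / len(messages) for d in dims]
    peak = [max(m[d] for m in messages) for d in dims]
    return mean + peak


def neighbor_summary(h, edge_feats, alpha, i, neighbors):
    """Build one message per neighbor j from (h_j || delta_ij), then aggregate them."""
    messages = []
    for j in neighbors:
        delta = edge_view_coefficient(edge_feats[(i, j)], alpha[(i, j)])
        messages.append(list(h[j]) + delta)
    return aggregate(messages)


h = {1: [1.0], 2: [3.0]}                               # neighbor node features
edge_feats = {(0, 1): [2.0, 10.0], (0, 2): [4.0, 20.0]}  # e.g. amount, timestamp
alpha = {(0, 1): 0.25, (0, 2): 0.75}                   # attention coefficients
summary = neighbor_summary(h, edge_feats, alpha, 0, [1, 2])
```

In a full layer the summary would be concatenated with the current feature of node 0 and passed through a multilayer perceptron to produce the next-layer embedding.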

Further, the multi-transaction-view graph attention neural network is trained by neural network backpropagation.

Furthermore, a loss function is used during training to measure the similarity between the input transaction network graph and the target nodes; the reconstruction loss function is defined as follows:

wherein $\|\cdot\|_2$ denotes the L2-norm of a vector and $\sigma$ denotes the sigmoid function.

In each training iteration the network is optimized by minimizing the reconstruction loss, so that the output node embeddings learn the structure and edge-view information of the transaction network.

Further, the node embeddings of the global transaction view and the node embeddings of the local transaction view are concatenated and input into a multilayer perceptron to classify and identify phishing addresses, expressed as follows:

$$P_n = \mathrm{MLP}\big(z_n^{\mathrm{global}} \,\|\, z_n^{\mathrm{local}}\big)$$

where $\|$ denotes the concatenation operation, $P_n$ denotes the probability that node n is a phishing address, $z_n^{\mathrm{global}}$ denotes the global embedding feature of the node, and $z_n^{\mathrm{local}}$ denotes the local embedding feature of the node.
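The final classification step can be sketched as a small multilayer perceptron applied to the concatenated global and local embeddings. The layer sizes, the ReLU hidden activation, the sigmoid output and every weight value below are assumptions chosen so the arithmetic is easy to follow. In practice the weights would be learned.

```python
import math


def linear(W, b, x):
    """Affine layer: one output per row of W."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def phishing_probability(z_global, z_local, W1, b1, W2, b2):
    """P_n for one node: MLP applied to (z_global || z_local)."""
    z = list(z_global) + list(z_local)            # concatenation of the two views
    hidden = [max(0.0, v) for v in linear(W1, b1, z)]  # ReLU hidden layer
    logit = linear(W2, b2, hidden)[0]
    return 1.0 / (1.0 + math.exp(-logit))         # sigmoid gives a probability


z_global = [1.0, -1.0]
z_local = [0.5]
W1 = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, 1.0]]
b2 = [-2.0]
p = phishing_probability(z_global, z_local, W1, b1, W2, b2)
```

With these toy weights the hidden layer evaluates to [2.0, 0.0] and the logit to 0, so the node is scored at exactly 0.5. A threshold on this probability would then separate phishing from normal addresses.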

Compared with the prior art, the technical solution of the invention has the following beneficial effects:

The large-scale blockchain transaction network is processed into a global-view graph and a local-view graph, which are used as inputs of the multi-transaction-view graph attention neural network to mine multi-level Ethereum transaction network information and to output node embeddings containing structure and edge information.

By mining more effective blockchain transaction network information with the multi-transaction-view graph attention neural network, the invention improves the performance of Ethereum phishing fraud identification.

The invention combines the edge features and the edge-view coefficients to fuse the topological structure with the transaction information and generate the final node embeddings.

Drawings

FIG. 1 is a flowchart of the blockchain phishing fraud identification method based on a graph neural network of the present invention.

FIG. 2 is a block diagram of the blockchain phishing fraud identification system based on a graph neural network of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and are used for illustration only, and should not be construed as limiting the patent. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.

The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.

Example 1

As shown in FIG. 1, a blockchain phishing fraud identification method based on a graph neural network comprises the following steps:

clustering the transaction network graph to obtain a global-view graph;

sampling the transaction network graph to obtain a local-view graph;

constructing and training a multi-transaction-view graph attention neural network;

inputting the global-view graph into the trained multi-transaction-view graph attention neural network to obtain node embeddings of the global transaction view, the node embeddings containing the structure and edge-view information of the transaction network;

This embodiment processes the transaction data into a graph structure

$$\mathcal{G} = (\mathcal{V}, \varepsilon)$$

wherein $\mathcal{V}$ denotes the set of transaction account nodes in the transaction graph $\mathcal{G}$, and $\varepsilon$ denotes the transaction relationships that occur between the transaction account nodes in $\mathcal{G}$.

For node features, each transaction account node i has a corresponding feature vector $x_i$, and the features of all nodes are represented by a matrix $X_{N \times D}$, where N is the number of transaction account nodes and D is the number of features per node. The initial node features comprise the in-degree of the node, the out-degree of the node, the number of transactions involving the node, the Ether value of all income of the node, the Ether value of all expenditure of the node, the sum of the Ether value of all income and expenditure of the node, the number of neighbor nodes, and the reciprocal of the transaction frequency.

For edge features, each transaction edge j has a feature vector $e_j$, and the features of all edges are represented by a matrix $E_{N' \times D'}$, where N' is the number of transaction edges and D' is the number of features per edge. The initial edge features comprise the amount of the transaction on the edge and the timestamp at which the transaction occurred.

For the transaction network graph as a whole, the structural information is represented by an adjacency matrix A. Since the relationship between neighboring nodes in the transaction network graph should be undirected, the original directed graph is converted into an undirected graph.
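The conversion from a directed transaction graph to an undirected adjacency matrix can be sketched in a few lines. The sketch assumes nodes are numbered from 0 and that multiple transactions between the same pair collapse into a single entry. The function name is hypothetical.

```python
def undirected_adjacency(num_nodes, directed_edges):
    """Build a symmetric 0/1 adjacency matrix A from directed (sender, receiver) pairs."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst in directed_edges:
        A[src][dst] = 1
        A[dst][src] = 1  # mirror the edge so that direction is ignored
    return A


# Node 0 pays node 1 twice, and node 2 pays node 1 once.
A = undirected_adjacency(3, [(0, 1), (0, 1), (2, 1)])
```

Direction is not lost entirely by this step, since quantities such as in-degree and out-degree remain available as node features.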

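In a specific embodiment, the transaction network graph is clustered by a clustering function to obtain the global-view graph, which is expressed as follows:

$$\{\mathcal{G}_1, \mathcal{G}_2, \dots, \mathcal{G}_c\} = p(\mathcal{G}, c), \qquad \mathcal{G}_i = (\mathcal{V}_i, \mathcal{E}_i)$$

where $p$ denotes the graph clustering function, $\mathcal{G}$ denotes the transaction network graph, $\mathcal{G}_i$ denotes the i-th subgraph generated by clustering, $\mathcal{V}_i$ denotes the set of nodes in $\mathcal{G}_i$, and $\mathcal{E}_i$ denotes the set of edges in $\mathcal{G}_i$. The parameter $c$, the number of clusters, determines the computational complexity.

In a specific embodiment, the transaction network graph is sampled by a neighbor sampling function to obtain the local-view graph:

$$\mathcal{N}_K(i) = \{\, j \mid j \in K\text{-hop}(i) \,\}$$

wherein $\mathcal{N}_K(i)$ denotes the set of K-order neighbor nodes of node i, and j denotes a node in the graph.

The choice of the parameter K should take into account both the computational complexity and the integrity of the node neighbor structure. In this embodiment, all first-order neighbor nodes are selected, and one node is selected from all first-order neighbor nodes.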

The multi-transaction-view graph attention neural network is obtained by improving the attention mechanism of the graph attention network (GAT). The function of the multi-transaction-view graph attention neural network MTvGAT is to map the input node features into node embeddings carrying richer information, which can be used as the input of a multilayer perceptron to classify and identify phishing nodes.

In a specific embodiment, the multi-transaction-view graph attention neural network captures the edge-view information of the transaction network through edge-view coefficients formed from the edge features and the attention coefficients, and captures the structural information of the transaction graph by aggregating the features of the address nodes.

Further, the multi-transaction-view graph attention neural network MTvGAT is composed of a plurality of MTvConv blocks;

the input of an MTvConv block is the set of input features of the nodes

$$\mathbf{h} = \{h_1, h_2, \dots, h_N\}, \qquad h_i \in \mathbb{R}^{F}$$

where $N$ denotes the number of nodes, $F$ denotes the dimension of the input feature of each node, and $h_i$ denotes the input feature of the i-th node;

after training, each MTvConv block of the MTvGAT outputs the node embeddings

$$\mathbf{h}' = \{h'_1, h'_2, \dots, h'_N\}, \qquad h'_i \in \mathbb{R}^{F'}$$

where $F'$ denotes the dimension of the output embedding and $h'_i$ denotes the output embedding of the i-th node;

in this embodiment, multiple MTvConv blocks are connected to construct the complete MTvGAT network structure. The calculation formula of the multi-transaction-view graph attention neural network MTvGAT is as follows:

$$z = \mathrm{MTvGAT}(\mathcal{G}, A)$$

wherein $\mathcal{G}$ denotes the input transaction network graph, $A$ denotes the adjacency matrix of the input transaction network graph, and $z$ denotes the embedding of the target nodes learned by the last MTvConv block of the MTvGAT.

In a specific embodiment, the attention coefficient $\alpha_{ij}$ is calculated as follows:

$$\alpha_{ij} = \mathrm{softmax}_j\big(a(W h_i \,\|\, W h_j)\big)$$

wherein $W$ denotes a learnable weight matrix, $a$ denotes a shared attention mechanism, and $\|$ denotes the feature concatenation operation.

In one embodiment, the edge-view coefficient $\delta_{i,j}$ is formed by concatenating the edge feature and the attention coefficient:

$$\delta_{i,j} = (e_{i,j} \,\|\, \alpha_{i,j})$$

each MTvConv block takes the node features and the edge features as input; the forward propagation mechanism is expressed as follows:

$$h_i^{(l+1)} = \phi\Big(h_i^{(l)},\ \bigoplus_{j \in \mathcal{N}(i)} \psi\big(h_i^{(l)}, \delta_{i,j}, h_j^{(l)}\big)\Big)$$

wherein $\phi$ and $\psi$ are multilayer perceptrons that compute the output node embedding from their concatenated inputs; $\bigoplus$ denotes a combination of multiple aggregators and scalers; each aggregator aggregates information from the neighbors, and each scaler scales the aggregated information differently;

$l$ denotes the l-th layer of the neural network.

In one embodiment, neural network backpropagation is used to train the multi-transaction-view graph attention neural network.

In a specific embodiment, a loss function is used during training to measure the similarity between the input transaction network graph and the target nodes; the reconstruction loss function is defined as follows:

wherein $\|\cdot\|_2$ denotes the L2-norm of a vector and $\sigma$ denotes the sigmoid function.

In a specific embodiment, the node embeddings of the global transaction view and the node embeddings of the local transaction view are concatenated and input into a multilayer perceptron to classify and identify phishing addresses, expressed as follows:

$$P_n = \mathrm{MLP}\big(z_n^{\mathrm{global}} \,\|\, z_n^{\mathrm{local}}\big)$$

where $\|$ denotes the concatenation operation, $P_n$ denotes the probability that node n is a phishing address, $z_n^{\mathrm{global}}$ denotes the global embedding feature of the node, and $z_n^{\mathrm{local}}$ denotes the local embedding feature of the node.

Example 2

Based on the blockchain phishing fraud identification method based on a graph neural network described in Example 1, this example presents a concrete implementation for identifying phishing nodes on the Ethereum blockchain platform.

Using the Ethereum transaction data obtained from XBlock and the phishing address labels from Etherscan, a graph data structure of the Ethereum transaction data is constructed with the Python-based DGL library. The initial features of a node are: in-degree: the in-degree of the node; out-degree: the out-degree of the node; Total_Tx: the number of transactions associated with the node; in-value: the Ether value of all income of the node; out-value: the Ether value of all payouts of the node; Total-value: the sum of all income and payouts of the node; min_TS: the minimum timestamp among the transactions related to the node; max_TS: the maximum timestamp among the transactions related to the node; num_neighbor: the number of all neighbors of the node; tx_Freq: the frequency of transactions occurring at the node. The initial features of an edge are: amount: the amount of the transaction on the edge; timestamp: the timestamp at which the transaction occurred.
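How such initial node features might be derived from raw transaction records is sketched below in plain Python, without DGL. Several definitions are assumptions made for illustration: in-degree and out-degree count distinct counterparties, and tx_Freq is taken as the number of transactions divided by the time span between the first and last transaction. The record layout `(sender, receiver, amount, timestamp)` and the function name are also hypothetical.

```python
from collections import defaultdict


def node_features(transactions):
    """Compute initial features per address from (sender, receiver, amount, timestamp)."""
    senders = defaultdict(set)    # address -> distinct addresses that paid it
    receivers = defaultdict(set)  # address -> distinct addresses it paid
    count = defaultdict(int)
    in_value = defaultdict(float)
    out_value = defaultdict(float)
    stamps = defaultdict(list)
    for src, dst, amount, ts in transactions:
        receivers[src].add(dst)
        senders[dst].add(src)
        out_value[src] += amount
        in_value[dst] += amount
        for addr in (src, dst):
            count[addr] += 1
            stamps[addr].append(ts)
    features = {}
    for addr in count:
        first, last = min(stamps[addr]), max(stamps[addr])
        span = last - first
        features[addr] = {
            "in-degree": len(senders[addr]),
            "out-degree": len(receivers[addr]),
            "Total_Tx": count[addr],
            "in-value": in_value[addr],
            "out-value": out_value[addr],
            "Total-value": in_value[addr] + out_value[addr],
            "min_TS": first,
            "max_TS": last,
            "num_neighbor": len(senders[addr] | receivers[addr]),
            # Assumed definition: transactions per unit of time; 0.0 if no time span.
            "tx_Freq": count[addr] / span if span > 0 else 0.0,
        }
    return features


transactions = [("A", "B", 1.0, 100), ("A", "B", 2.0, 200), ("C", "A", 4.0, 300)]
features = node_features(transactions)
```

The resulting dictionaries can be stacked row by row into the node feature matrix, one row per address and one column per feature.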

Definition of the Ethereum transaction graph:

$$\mathcal{G} = (\mathcal{V}, \varepsilon)$$

wherein $\mathcal{V}$ is the set of transaction account nodes in the transaction graph $\mathcal{G}$, and $\varepsilon$ is the set of transaction relationships that occur between the transaction account nodes in $\mathcal{G}$. For node features, each transaction account node i has a feature vector $x_i$, and the features of all nodes are represented by a matrix $X_{N \times D}$, where N is the number of transaction account nodes and D is the number of features per node. The initial node features comprise the in-degree of the node, the out-degree of the node, the number of transactions involving the node, the Ether value of all income of the node, the Ether value of all expenditure of the node, the sum of the Ether value of all income and expenditure of the node, the number of neighbor nodes, and the reciprocal of the transaction frequency. For edge features, each transaction edge j has a feature vector $e_j$, and the features of all edges are represented by a matrix $E_{N' \times D'}$, where N' is the number of transaction edges and D' is the number of features per edge. The initial edge features comprise the amount of the transaction on the edge and the timestamp at which the transaction occurred. For the transaction graph as a whole, the structural information is represented by the adjacency matrix A. The transaction graph $\mathcal{G}$ is taken as input to learn the feature representation of the graph, and the node-level output is $Z_{N \times F}$, where F is the number of features output for each node.

The pseudo code of the embodiment for inputting the transaction diagram into the multi-transaction perspective attention diagram neural network is as follows:

Expressions (1) and (2) in the pseudo code concatenate the embedding feature of node v, the embedding feature of a neighbor node u, and the edge-view coefficient between the two nodes to obtain the embedding feature of each neighbor node; the neighbor embedding features are then concatenated with the (k-1)-th layer feature of the target node v to obtain the k-th layer embedding feature of node v.

Expressions (3) and (4) in the pseudo code differ from expressions (1) and (2) in that node u is selected from the nodes of the global cluster graph, so that the global embedding feature of node v is obtained.

The sampling function in the pseudo code randomly samples a certain number of target nodes; embedding features are learned for these target nodes, and anomaly detection is then carried out.

To further verify the effect of the blockchain phishing fraud identification method based on a graph neural network described in this example, the experimental results are as follows:

The experimental results show that the graph neural network models achieve better performance than DeepWalk and the feature-only model. Among the graph neural network models, GAT performs worst, achieving performance similar to DeepWalk. This shows that relying solely on structural information and node features does not significantly improve overall performance. MTvGAT achieves the best performance on all metrics, which demonstrates the effectiveness of exploiting the multi-transaction-view structural features and the edge features.

Analyzing data from multiple transaction views yields more comprehensive structural information and relationships between nodes in a large-scale Ethereum transaction network. On the local-view dataset, GCN achieves the highest accuracy, while MTvGAT achieves the highest performance on the other metrics, because MTvGAT can obtain more information through the edge features in the local view. On the global-view dataset, GraphSAGE shows higher performance because that model was proposed for inductive representation learning on large graphs. On the multi-transaction-view dataset, MTvGAT achieves the highest performance on all metrics and clearly outperforms the other models. This also verifies the effectiveness of the proposed method in identifying phishing address nodes.

Example 3

Based on the blockchain phishing fraud identification method based on a graph neural network described in Example 1, this example further provides a blockchain phishing fraud identification system based on a graph neural network. As shown in FIG. 2, the system comprises a multi-mode input module, a multi-transaction-view graph attention neural network module, and a phishing fraud detection module;

the multi-mode input module is used for preprocessing transaction data into a transaction network graph, for clustering the transaction network graph to obtain a global-view graph, and for sampling the transaction network graph to obtain a local-view graph;

the multi-transaction-view graph attention neural network module is used for constructing and training the multi-transaction-view graph attention neural network, and for inputting the global-view graph and the local-view graph separately into the trained network to obtain node embeddings of the global transaction view and node embeddings of the local transaction view;

the phishing fraud detection module is used for concatenating the node embeddings of the global transaction view with the node embeddings of the local transaction view and inputting the concatenated embeddings into a multilayer perceptron to classify and identify phishing addresses.

It should be understood that the above embodiments are merely examples for clearly illustrating the present invention and are not intended to limit its implementation. Other variations and modifications will be apparent to those skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims

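1. A blockchain phishing fraud identification method based on a graph neural network, characterized in that the method comprises the following steps:

preprocessing transaction data into a transaction network graph;

clustering the transaction network graph to obtain a global-view graph;

sampling the transaction network graph to obtain a local-view graph;

constructing and training a multi-transaction-view graph attention neural network;

inputting the global-view graph into the trained multi-transaction-view graph attention neural network to obtain node embeddings of the global transaction view, the node embeddings containing the structure and edge-view information of the transaction network;

inputting the local-view graph into the trained multi-transaction-view graph attention neural network to obtain node embeddings of the local transaction view;

and concatenating the node embeddings of the global transaction view with the node embeddings of the local transaction view, and inputting the concatenated embeddings into a multilayer perceptron to classify and identify phishing addresses.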

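2. The blockchain phishing fraud identification method based on a graph neural network according to claim 1, characterized in that: the transaction network graph is clustered by a clustering function to obtain the global-view graph, which is expressed as follows:

$$\{\mathcal{G}_1, \mathcal{G}_2, \dots, \mathcal{G}_c\} = p(\mathcal{G}, c), \qquad \mathcal{G}_i = (\mathcal{V}_i, \mathcal{E}_i)$$

where $p$ denotes the graph clustering function, $c$ denotes the number of clusters, $\mathcal{G}$ denotes the transaction network graph, $\mathcal{G}_i$ denotes the i-th subgraph generated by clustering, $\mathcal{V}_i$ denotes the set of nodes in $\mathcal{G}_i$, and $\mathcal{E}_i$ denotes the set of edges in $\mathcal{G}_i$.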

3. The blockchain phishing fraud identification method based on a graph neural network according to claim 1, characterized in that: the transaction network graph is sampled by a neighbor sampling function to obtain the local-view graph:

$$\mathcal{N}_K(i) = \{\, j \mid j \in K\text{-hop}(i) \,\}$$

wherein $\mathcal{N}_K(i)$ denotes the set of K-order neighbor nodes of node i, j denotes a node in the graph, and K-hop denotes a K-hop search, i.e. finding the K-order neighbors.

4. The blockchain phishing fraud identification method based on a graph neural network according to claim 1, characterized in that: the multi-transaction-view graph attention neural network captures the edge-view information of the transaction network through edge-view coefficients formed from the edge features and the attention coefficients, and captures the structural information of the transaction graph by aggregating the features of the address nodes.

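5. The blockchain phishing fraud identification method based on a graph neural network according to claim 4, characterized in that: the multi-transaction-view graph attention neural network is composed of a plurality of MTvConv blocks;

the input of an MTvConv block is the set of input features of the nodes

$$\mathbf{h} = \{h_1, h_2, \dots, h_N\}, \qquad h_i \in \mathbb{R}^{F}$$

where $N$ denotes the number of nodes, $F$ denotes the dimension of the input feature of each node, and $h_i$ denotes the input feature of the i-th node;

after training, each MTvConv block outputs the node embeddings

$$\mathbf{h}' = \{h'_1, h'_2, \dots, h'_N\}, \qquad h'_i \in \mathbb{R}^{F'}$$

where $F'$ denotes the dimension of the output embedding and $h'_i$ denotes the embedded feature of the i-th node;

the calculation formula of the multi-transaction-view graph attention neural network MTvGAT is as follows:

$$z = \mathrm{MTvGAT}(\mathcal{G}, A)$$

wherein $\mathcal{G}$ denotes the input transaction network graph, $A$ denotes the adjacency matrix of the input transaction network graph, and $z$ denotes the embedding of the target nodes learned by the last MTvConv block of the MTvGAT.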

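6. The blockchain phishing fraud identification method based on a graph neural network according to claim 5, characterized in that: the attention coefficient $\alpha_{ij}$ is calculated as follows:

$$\alpha_{ij} = \mathrm{softmax}_j\big(a(W h_i \,\|\, W h_j)\big)$$

wherein $W$ denotes a learnable weight matrix that transforms the input features into high-dimensional features, $a$ denotes a shared attention mechanism, and $\|$ denotes the feature concatenation operation.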

7. The blockchain phishing fraud identification method based on a graph neural network according to claim 6, characterized in that: the edge-view coefficient $\delta_{i,j}$ is formed by concatenating the edge feature and the attention coefficient:

$$\delta_{i,j} = (e_{i,j} \,\|\, \alpha_{i,j})$$

each MTvConv block takes the node features and the edge features as input; the forward propagation mechanism is expressed as follows:

$$h_i^{(l+1)} = \phi\Big(h_i^{(l)},\ \bigoplus_{j \in \mathcal{N}(i)} \psi\big(h_i^{(l)}, \delta_{i,j}, h_j^{(l)}\big)\Big)$$

wherein $\phi$ and $\psi$ are multilayer perceptrons that compute the output node embedding from their concatenated inputs; $\bigoplus$ denotes a combination of multiple aggregators and scalers; each aggregator aggregates information from the neighbors, and each scaler scales the aggregated information differently;

and $l$ denotes the l-th layer of the neural network.

8. The blockchain phishing fraud identification method based on a graph neural network according to claim 7, characterized in that: the multi-transaction-view graph attention neural network is trained by neural network backpropagation.

9. The blockchain phishing fraud identification method based on a graph neural network according to claim 8, characterized in that: a loss function is used during training to measure the similarity between the input transaction network graph and the target nodes; the reconstruction loss function is defined as follows:

wherein $\|\cdot\|_2$ denotes the L2-norm of a vector and $\sigma$ denotes the sigmoid function.

In each training iteration the network is optimized by minimizing the reconstruction loss, so that the output node embeddings learn the structure and edge-view information of the transaction network.

10. The blockchain phishing fraud identification method based on a graph neural network according to claim 8, characterized in that: the node embeddings of the global transaction view and the node embeddings of the local transaction view are concatenated and input into a multilayer perceptron to classify and identify phishing addresses, expressed as follows:

$$P_n = \mathrm{MLP}\big(z_n^{\mathrm{global}} \,\|\, z_n^{\mathrm{local}}\big)$$

where $\|$ denotes the concatenation operation, $P_n$ denotes the probability that node n is a phishing address, $z_n^{\mathrm{global}}$ denotes the global embedding feature of the node, and $z_n^{\mathrm{local}}$ denotes the local embedding feature of the node.