CN115438751A - Blockchain phishing fraud identification method based on graph neural network

Publication number
CN115438751A
Authority
CN
China
Prior art keywords
transaction
graph
node
neural network
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211275203.2A
Other languages
Chinese (zh)
Inventor
卞静
卓绍烜
李焱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202211275203.2A
Publication of CN115438751A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a blockchain phishing fraud identification method based on a graph neural network. The method comprises: preprocessing transaction data into a transaction network graph; clustering the transaction network graph to obtain a global-view graph; sampling the transaction network graph to obtain a local-view graph; constructing and training a graph neural network; inputting the global-view graph into the trained graph neural network to obtain node embeddings of the global transaction view, the node embeddings containing the structure and edge-view information of the transaction network; inputting the local-view graph into the trained graph neural network to obtain node embeddings of the local transaction view; and concatenating the node embeddings of the global and local transaction views and feeding them together into a multilayer perceptron to classify and identify phishing addresses. By mining data from multiple transaction views, the invention extracts more useful information from the transaction network and improves the performance of Ethereum phishing fraud identification.

Description

Blockchain phishing fraud identification method based on graph neural network
Technical Field
The invention relates to the technical field of phishing address identification in blockchain transactions, and in particular to a blockchain phishing fraud identification method based on a graph neural network.
Background
Popular cryptocurrencies such as Bitcoin and Ethereum use blockchain as their key supporting technology. Blockchain is a novel distributed ledger technology that enables trusted transactions without a trusted intermediary in an environment of mutual distrust. Compared with traditional database technology, blockchain resists counterfeiting and tampering and supports smart contracts, and it has been praised as a technology that may bring about social change. To achieve trusted transactions in a distributed environment, blockchain makes extensive use of cryptography to hide user information, while all transaction information is jointly verified and stored by a distributed network. Public chains such as Bitcoin and Ethereum have attracted a large number of users and accumulated a large amount of transaction data. This broad participation and active trading make blockchain-based data analysis an important and valuable research problem.
As blockchain technology develops, it is being introduced as an underlying technology in various industries, and a large amount of data now exists in the form of blockchain data, so research on blockchain-based data analysis has important theoretical and practical significance. Blockchain-based payment systems are anonymous and decentralized, and criminals exploit this anonymity to commit fraud. Widespread fraud in blockchain systems discourages users from accepting and using the technology and thereby hinders its progress. Identifying abnormal user behavior in blockchain transaction networks has therefore become an urgent and critical issue for the blockchain ecosystem. Transaction phishing fraud is a new form of cybercrime that has emerged with the development of blockchain. It exploits the anonymity of blockchain, and it has become increasingly rampant because legal measures lag behind and data analysis methods are still developing.
Existing phishing fraud identification methods can be broadly classified into three categories.
1) Feature-engineering-based methods. The topological characteristics of the blockchain transaction network are analyzed manually, and statistical features are constructed as input to a machine learning classifier for phishing fraud identification. For example, 219-dimensional features based on first-order and second-order wallet nodes are extracted from a transaction graph built from transaction records and used as input to a LightGBM classifier to identify phishing transaction addresses.
2) Random-walk-based graph representation learning methods. Algorithms such as DeepWalk and node2vec are designed to capture the structural information of a graph. node2vec has been used to obtain node representations of Ethereum transactions, which are then classified with a one-class SVM. trans2vec extends this approach by introducing biased transaction sampling. However, these encoders based on random-walk algorithms cannot use the feature information of the nodes, which limits their performance.
3) Graph-neural-network-based methods. E-GCN is designed on the basis of a graph autoencoder. EdgeProp uses the information in transaction edge features. MCGC uses multiple feature-extraction channels of a graph neural network to extract features of the transaction pattern of a target address. TTAGN uses temporal edge representations and an edge2node module to identify phishing fraud efficiently. However, existing graph-neural-network-based methods use only node features; they cannot fully exploit the structural information of the graph, and the edge features of blockchain transactions are not fully utilized.
Disclosure of Invention
Existing abnormal-transaction-node identification techniques, whether heuristic shallow embeddings or graph neural networks, cannot fully exploit the structural information and edge information of the transaction network. To overcome this problem, the invention provides a blockchain phishing fraud identification method based on a graph neural network. By mining data from multiple transaction views, the method extracts more useful information from the transaction network and improves phishing fraud identification performance.
To solve the above technical problem, the technical solution of the invention is as follows:
A blockchain phishing fraud identification method based on a graph neural network, the method comprising the following steps:
preprocessing transaction data into a transaction network graph;
clustering the transaction network graph to obtain a global-view graph;
sampling the transaction network graph to obtain a local-view graph;
constructing and training a multi-transaction-view graph attention network (MTvGAT);
inputting the global-view graph into the trained MTvGAT to obtain node embeddings of the global transaction view, the node embeddings containing the structure and edge-view information of the transaction network;
inputting the local-view graph into the trained MTvGAT to obtain node embeddings of the local transaction view;
and concatenating the node embeddings of the global transaction view with those of the local transaction view and feeding them together into a multilayer perceptron to classify and identify phishing addresses.
Preferably, the transaction network graph is clustered by a clustering function to obtain the global-view graph, which is expressed as:
$$\{\mathcal{G}_1, \ldots, \mathcal{G}_c\} = p(\mathcal{G}, c), \qquad \mathcal{G}_i = (\mathcal{V}_i, \mathcal{E}_i)$$
where $p$ denotes the graph clustering function, $c$ the number of clusters, $\mathcal{G}$ the transaction network graph, $\mathcal{G}_i$ the $i$-th subgraph generated by clustering, $\mathcal{V}_i$ the set of nodes in $\mathcal{G}_i$, and $\mathcal{E}_i$ the set of edges in $\mathcal{G}_i$.
Preferably, the transaction network graph is sampled by a neighbor sampling function to obtain the local-view graph:
[formula image not reproduced in the source text]
where the sampled set contains the $K$-order neighbors of node $i$, $j$ denotes a node in the graph, and K-hop denotes $K$ hops, i.e., finding the $K$-order neighbors.
Preferably, the multi-transaction-view graph attention network captures the edge-view coefficient information of the transaction network through edge features and attention coefficients, and obtains the structural information of the transaction graph by aggregating the features of the address nodes.
Further, the multi-transaction-view graph attention network MTvGAT is composed of a plurality of MTvConv blocks. The input of an MTvConv block is a set of node input features
$$\mathbf{h} = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}, \qquad \vec{h}_i \in \mathbb{R}^{F}$$
where $N$ denotes the number of nodes, $F$ the dimension of the MTvGAT input features of each node, and $\vec{h}_i$ the input feature of the $i$-th node;
after training, each MTvConv block of the MTvGAT outputs the node embeddings
$$\mathbf{h}' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_N\}, \qquad \vec{h}'_i \in \mathbb{R}^{F'}$$
where $F'$ denotes the dimension of the output embedding and $\vec{h}'_i$ the output embedding of the $i$-th node;
the multi-transaction-view graph attention network MTvGAT is computed as:
$$z = \mathrm{MTvGAT}(\mathcal{G}, A)$$
where $\mathcal{G}$ denotes the input transaction network graph, $A$ the adjacency matrix of the input transaction network graph, and $z$ the embedding of the target node learned from the last MTvConv block of the MTvGAT.
Still further, the attention coefficient $\alpha_{ij}$ is computed as follows:
[formula image not reproduced in the source text]
where $W$ denotes a learnable weight matrix that transforms the input features into high-dimensional features, $a$ denotes the shared attention mechanism, and $\|$ denotes the feature concatenation operation.
Still further, the edge-view coefficient $\delta_{i,j}$ is formed by concatenating the edge feature and the attention coefficient:
$$\delta_{i,j} = (e_{i,j} \,\|\, \alpha_{i,j})$$
Each MTvConv block takes the node and edge features as input; its forward information propagation mechanism is expressed as follows:
[formula image not reproduced in the source text]
where $\phi$ and the second mapping in the expression are multilayer perceptrons that compute the output node embedding from the concatenated inputs; the aggregation operator denotes a combination of multiple aggregators and scalers; the aggregators aggregate information from neighbors, and the scalers apply different scalings to the aggregated information;
the forward information propagation mechanism aggregates, at each node of the MTvGAT, the information of its neighbor nodes to generate a new feature vector for the node at layer $l$,
[formula image not reproduced in the source text]
where $l$ denotes the $l$-th layer of the neural network.
Still further, the MTvGAT is trained by neural network backpropagation.
Still further, a loss function is used during training to measure the similarity between the input transaction network graph and the target node; the reconstruction loss function is defined as follows:
[formula images not reproduced in the source text]
where $\|\cdot\|_2$ denotes the L2-norm of a vector and $\sigma$ denotes the sigmoid function;
the reconstruction loss is minimized in each training iteration, so that the output node embeddings learn the structure and edge-view information of the transaction network.
Still further, the node embeddings of the global transaction view and of the local transaction view are concatenated and fed together into a multilayer perceptron to classify and identify phishing addresses, expressed as:
$$P_n = \mathrm{MLP}\left(z_n^{\mathrm{global}} \,\|\, z_n^{\mathrm{local}}\right)$$
where $\|$ denotes the concatenation operation, $P_n$ the probability that node $n$ is a phishing address, $z_n^{\mathrm{global}}$ the global embedding feature of the node, and $z_n^{\mathrm{local}}$ the local embedding feature of the node.
Compared with the prior art, the technical solution of the invention has the following beneficial effects:
The large-scale blockchain transaction network is processed into a global-view graph and a local-view graph, which are used as input to the multi-transaction-view graph attention network to mine multi-level Ethereum transaction network information and to output node embeddings containing structure and edge information.
By mining more useful blockchain transaction network information with the multi-transaction-view graph attention network, the invention improves the performance of Ethereum phishing fraud identification.
The invention combines edge features and edge-view coefficients to fuse the topological structure with the transaction information and generate the final node embeddings.
Drawings
FIG. 1 is a flowchart of the blockchain phishing fraud identification method based on a graph neural network according to the invention.
FIG. 2 is a block diagram of the blockchain phishing fraud identification system based on a graph neural network according to the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention; they are for illustration only and should not be construed as limiting this patent. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in FIG. 1, a blockchain phishing fraud identification method based on a graph neural network comprises the following steps:
preprocessing transaction data into a transaction network graph;
clustering the transaction network graph to obtain a global-view graph;
sampling the transaction network graph to obtain a local-view graph;
constructing and training a multi-transaction-view graph attention network (MTvGAT);
inputting the global-view graph into the trained MTvGAT to obtain node embeddings of the global transaction view, the node embeddings containing the structure and edge-view information of the transaction network;
inputting the local-view graph into the trained MTvGAT to obtain node embeddings of the local transaction view;
and concatenating the node embeddings of the global transaction view with those of the local transaction view and feeding them together into a multilayer perceptron to classify and identify phishing addresses.
This embodiment processes the transaction data into a graph structure $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ denotes the transaction account nodes in the transaction graph $\mathcal{G}$, and $\mathcal{E}$ denotes the transaction relationships between the transaction account nodes in $\mathcal{G}$.
For node features, each transaction account node $i$ has a corresponding feature vector $x_i$, and the features are represented by a matrix $X \in \mathbb{R}^{N \times D}$, where $N$ is the number of transaction account nodes and $D$ is the number of features per node. The initial node features comprise the in-degree of the node, the out-degree of the node, the number of transactions involving the node, the total Ether value received by the node, the total Ether value spent by the node, the sum of the Ether values received and spent, the number of neighbor nodes, and the reciprocal of the transaction frequency.
For edge features, each transaction edge $j$ has a feature vector $e_j$, and the features are represented by a matrix $E \in \mathbb{R}^{N' \times D'}$, where $N'$ is the number of transaction edges and $D'$ is the number of features per edge. The edge features comprise the amount on the transaction edge and the timestamp at which the transaction occurred.
The structure of the whole transaction network graph is represented by an adjacency matrix $A$. Since the relationship between neighboring nodes in the transaction network graph should be undirected, the original directed graph needs to be converted into an undirected graph.
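The preprocessing described above can be illustrated with a minimal sketch in plain Python. It assumes each transaction is a `(sender, receiver, amount, timestamp)` tuple and that in-degree and out-degree count incoming and outgoing transactions; the function name `build_node_features` and the dictionary keys are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

def build_node_features(transactions):
    """Derive per-node statistics and an undirected neighbor list.

    transactions: iterable of (sender, receiver, amount, timestamp) tuples
    (an assumed input layout).
    """
    stats = defaultdict(lambda: {"in_degree": 0, "out_degree": 0,
                                 "in_value": 0.0, "out_value": 0.0})
    neighbors = defaultdict(set)
    for sender, receiver, amount, _timestamp in transactions:
        stats[sender]["out_degree"] += 1
        stats[sender]["out_value"] += amount
        stats[receiver]["in_degree"] += 1
        stats[receiver]["in_value"] += amount
        # Directed-to-undirected conversion: store the edge in both directions.
        neighbors[sender].add(receiver)
        neighbors[receiver].add(sender)
    for node, s in stats.items():
        s["total_tx"] = s["in_degree"] + s["out_degree"]
        s["total_value"] = s["in_value"] + s["out_value"]
        s["num_neighbor"] = len(neighbors[node])
    return dict(stats), {n: sorted(v) for n, v in neighbors.items()}

txs = [("a", "b", 1.0, 100), ("b", "c", 2.0, 110),
       ("a", "c", 0.5, 120), ("c", "a", 0.25, 130)]
features, adjacency = build_node_features(txs)
```

The edge features (amount and timestamp) are left on the transaction tuples here; a real pipeline would store them in an edge feature matrix alongside the adjacency structure.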
In a specific embodiment, the transaction network graph is clustered by a clustering function to obtain the global-view graph, which is expressed as:
$$\{\mathcal{G}_1, \ldots, \mathcal{G}_c\} = p(\mathcal{G}, c), \qquad \mathcal{G}_i = (\mathcal{V}_i, \mathcal{E}_i)$$
where $p$ denotes the graph clustering function, $c$ the number of clusters, $\mathcal{G}$ the transaction network graph, $\mathcal{G}_i$ the $i$-th subgraph generated by clustering, $\mathcal{V}_i$ the set of nodes in $\mathcal{G}_i$, and $\mathcal{E}_i$ the set of edges in $\mathcal{G}_i$. The parameter $c$ determines the computational complexity.
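The text does not say which clustering function is used. As a sketch only, the following stand-in orders the nodes by breadth-first search and cuts the ordering into at most `c` chunks, keeping the edges whose endpoints fall in the same chunk. The name `cluster_graph` and the chunking strategy are assumptions made for illustration; a production system would more likely use a dedicated graph partitioner.

```python
from collections import deque

def cluster_graph(adjacency, c):
    """Split an undirected graph into at most c induced subgraphs.

    This is a simple stand-in for the clustering function p, not the
    patent's own algorithm.
    """
    order, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search keeps nearby nodes adjacent in `order`
            node = queue.popleft()
            order.append(node)
            for nb in sorted(adjacency[node]):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    if not order:
        return []
    size = -(-len(order) // c)  # ceiling division: nodes per subgraph
    subgraphs = []
    for k in range(0, len(order), size):
        nodes = set(order[k:k + size])
        # Keep only edges with both endpoints inside the subgraph.
        edges = {(u, v) for u in nodes for v in adjacency[u]
                 if v in nodes and u < v}
        subgraphs.append((nodes, edges))
    return subgraphs

adjacency = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
             4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
subgraphs = cluster_graph(adjacency, c=2)
```

In this toy graph the two triangles end up in separate subgraphs and the bridging edge between nodes 3 and 4 is dropped, which illustrates how a larger `c` yields smaller subgraphs at the cost of more cut edges.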
In a specific embodiment, the transaction network graph is sampled by a neighbor sampling function to obtain the local-view graph:
[formula image not reproduced in the source text]
where the sampled set contains the $K$-order neighbors of node $i$, $j$ denotes a node in the graph, and K-hop denotes $K$ hops, i.e., finding the $K$-order neighbors.
The choice of the parameter $K$ should balance the computational complexity against the completeness of the neighbor structure of the node. In this embodiment, all first-order neighbor nodes are selected, and one node is selected from all first-order neighbor nodes.
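The K-hop neighbor search can be sketched with a bounded breadth-first search. The helper names `k_hop_neighbors` and `local_view` are hypothetical, and the sketch returns every node within `k` hops rather than reproducing the exact sampling rule of the embodiment.

```python
from collections import deque

def k_hop_neighbors(adjacency, target, k):
    """Return {node: hop distance} for every node within k hops of target."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if dist[node] == k:  # do not expand beyond k hops
            continue
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    del dist[target]
    return dist

def local_view(adjacency, target, k):
    """Induced subgraph on the target node and its k-hop neighbors."""
    nodes = set(k_hop_neighbors(adjacency, target, k)) | {target}
    edges = {(u, v) for u in nodes for v in adjacency[u]
             if v in nodes and u < v}
    return nodes, edges

adjacency = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"],
             "d": ["b", "e"], "e": ["d"]}
one_hop = k_hop_neighbors(adjacency, "a", 1)
two_hop = k_hop_neighbors(adjacency, "a", 2)
```

Comparing `one_hop` and `two_hop` shows the trade-off mentioned above: each extra hop enlarges the local view and the amount of computation per target node.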
The multi-transaction-view graph attention network MTvGAT is obtained by improving the attention mechanism of the graph attention network (GAT). Its function is to map the input node features into more informative node embeddings, which can be used as input to a multilayer perceptron to classify and identify phishing nodes.
In a specific embodiment, the multi-transaction-view graph attention network captures the edge-view coefficient information of the transaction network through edge features and attention coefficients, and obtains the structural information of the transaction graph by aggregating the features of the address nodes.
Further, the multi-transaction-view graph attention network MTvGAT is composed of a plurality of MTvConv blocks;
the input of an MTvConv block is a set of node input features
$$\mathbf{h} = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}, \qquad \vec{h}_i \in \mathbb{R}^{F}$$
where $N$ denotes the number of nodes, $F$ the dimension of the MTvGAT input features of each node, and $\vec{h}_i$ the input feature of the $i$-th node;
after training, each MTvConv block of the MTvGAT outputs the node embeddings
$$\mathbf{h}' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_N\}, \qquad \vec{h}'_i \in \mathbb{R}^{F'}$$
where $F'$ denotes the dimension of the output embedding and $\vec{h}'_i$ the output embedding of the $i$-th node;
in this embodiment, multiple MTvConv blocks are connected to construct the complete MTvGAT network structure. The MTvGAT is computed as:
$$z = \mathrm{MTvGAT}(\mathcal{G}, A)$$
where $\mathcal{G}$ denotes the input transaction network graph, $A$ the adjacency matrix of the input transaction network graph, and $z$ the embedding of the target node learned from the last MTvConv block of the MTvGAT.
In a specific embodiment, the attention coefficient $\alpha_{ij}$ is computed as follows:
[formula image not reproduced in the source text]
where $W$ denotes a learnable weight matrix that transforms the input features into high-dimensional features, $a$ denotes the shared attention mechanism, and $\|$ denotes the feature concatenation operation.
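Because MTvGAT is described as building on GAT with a weight matrix, a shared attention mechanism and concatenation, the attention coefficient can be sketched under the assumption that it follows the standard GAT form, $\alpha_{ij} = \mathrm{softmax}_j(\mathrm{LeakyReLU}(a^{\top}[W h_i \,\|\, W h_j]))$. The LeakyReLU slope, the toy weights and the function names below are illustrative assumptions, not values from the patent.

```python
import math

def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_coefficients(h, neighbors, i, W, a):
    """alpha_ij for every neighbor j of node i (assumed standard GAT form)."""
    Wh_i = matvec(W, h[i])
    scores = {}
    for j in neighbors[i]:
        concat = Wh_i + matvec(W, h[j])  # [W h_i || W h_j]
        scores[j] = leaky_relu(sum(p * q for p, q in zip(a, concat)))
    top = max(scores.values())  # subtract the max for a stable softmax
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}

h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2]}
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps F = 2 features to F' = 3
a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]        # attention vector of length 2 * F'
alpha = attention_coefficients(h, neighbors, 0, W, a)
```

The coefficients over the neighbors of a node sum to one, so they can be read as the relative weight each neighbor receives during aggregation.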
In one embodiment, the edge-view coefficient $\delta_{i,j}$ is formed by concatenating the edge feature and the attention coefficient:
$$\delta_{i,j} = (e_{i,j} \,\|\, \alpha_{i,j})$$
Each MTvConv block takes the node and edge features as input; its forward information propagation mechanism is expressed as follows:
[formula image not reproduced in the source text]
where $\phi$ and the second mapping in the expression are multilayer perceptrons that compute the output node embedding from the concatenated inputs; the aggregation operator denotes a combination of multiple aggregators and scalers; the aggregators aggregate information from neighbors, and the scalers apply different scalings to the aggregated information;
the forward information propagation mechanism aggregates, at each node of the MTvGAT, the information of its neighbor nodes to generate a new feature vector for the node at layer $l$,
[formula image not reproduced in the source text]
where $l$ denotes the $l$-th layer of the neural network.
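The propagation step can be sketched as follows, with several simplifications that are assumptions of this example: the two multilayer perceptrons are replaced by identity maps, the aggregators are fixed to mean and max, and no scalers are applied. The function names are hypothetical.

```python
def edge_view_coefficient(edge_feature, alpha):
    """delta_ij = (e_ij || alpha_ij): edge features with the attention appended."""
    return list(edge_feature) + [alpha]

def aggregate(messages):
    """Combine two aggregators, mean and max (an assumed choice)."""
    dims = range(len(messages[0]))
    mean = [sum(m[d] for m in messages) / len(messages) for d in dims]
    peak = [max(m[d] for m in messages) for d in dims]
    return mean + peak

def propagate(h, edges, alphas, i, neighbors):
    """One message-passing step for node i, with identity maps in place of the MLPs."""
    messages = []
    for j in neighbors:
        delta = edge_view_coefficient(edges[(i, j)], alphas[(i, j)])
        messages.append(h[i] + h[j] + delta)  # (h_i || h_j || delta_ij)
    return h[i] + aggregate(messages)         # (h_i || aggregated messages)

h = {0: [1.0], 1: [2.0], 2: [4.0]}
edges = {(0, 1): [0.5, 100.0], (0, 2): [1.5, 200.0]}  # (amount, timestamp)
alphas = {(0, 1): 0.25, (0, 2): 0.75}
new_h0 = propagate(h, edges, alphas, 0, [1, 2])
```

Because the edge-view coefficient travels inside every message, the amount and timestamp of each transaction influence the updated node feature together with the attention weight.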
In one embodiment, the multi-transaction-view graph attention network is trained by neural network backpropagation.
In a specific embodiment, a loss function is used during training to measure the similarity between the input transaction network graph and the target node; the reconstruction loss function is defined as follows:
[formula images not reproduced in the source text]
where $\|\cdot\|_2$ denotes the L2-norm of a vector and $\sigma$ denotes the sigmoid function;
the reconstruction loss is minimized in each training iteration, so that the output node embeddings learn the structure and edge-view information of the transaction network.
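The exact loss is not recoverable from this text. As an illustration only, the sketch below assumes a form commonly used with graph autoencoders: the adjacency matrix is reconstructed as $\sigma(Z Z^{\top})$ and compared with $A$ through a squared L2 distance averaged over the nodes. Both the form and the name `reconstruction_loss` are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reconstruction_loss(Z, A):
    """Squared L2 distance between A and sigmoid(Z Z^T), averaged over nodes.

    Assumed graph-autoencoder-style loss, not the patent's own formula.
    """
    n = len(Z)
    total = 0.0
    for i in range(n):
        for j in range(n):
            a_hat = sigmoid(sum(p * q for p, q in zip(Z[i], Z[j])))
            total += (A[i][j] - a_hat) ** 2
    return total / n

A = [[0, 1], [1, 0]]                  # two nodes joined by one edge
Z_aligned = [[1.0, 0.0], [1.0, 0.0]]  # similar embeddings for linked nodes
Z_opposed = [[1.0, 0.0], [-1.0, 0.0]]
loss_aligned = reconstruction_loss(Z_aligned, A)
loss_opposed = reconstruction_loss(Z_opposed, A)
```

Under this assumed loss, embeddings that place connected nodes close together score lower than embeddings that push them apart, which is the behavior a training loop would exploit when minimizing the loss by backpropagation.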
In a specific embodiment, the node embeddings of the global transaction view and of the local transaction view are concatenated and fed together into a multilayer perceptron to classify and identify phishing addresses, expressed as:
$$P_n = \mathrm{MLP}\left(z_n^{\mathrm{global}} \,\|\, z_n^{\mathrm{local}}\right)$$
where $\|$ denotes the concatenation operation, $P_n$ the probability that node $n$ is a phishing address, $z_n^{\mathrm{global}}$ the global embedding feature of the node, and $z_n^{\mathrm{local}}$ the local embedding feature of the node.
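The final classification step can be sketched as a small multilayer perceptron applied to the concatenated embeddings. The single hidden layer with ReLU, the sigmoid output and the toy weights are assumptions of this example; the text only states that a multilayer perceptron produces the phishing probability.

```python
import math

def phishing_probability(z_global, z_local, W1, b1, w2, b2):
    """P_n = MLP(z_global || z_local) with one hidden ReLU layer (assumed shape)."""
    x = list(z_global) + list(z_local)  # concatenation of the two views
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    logit = sum(w * v for w, v in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid gives a probability

z_global = [1.0, 2.0]
z_local = [0.5]
W1 = [[1.0, 0.0, 0.0], [0.0, 1.0, -2.0]]  # toy weights, not trained values
b1 = [0.0, 0.0]
w2 = [1.0, -1.0]
p_neutral = phishing_probability(z_global, z_local, W1, b1, w2, 0.0)
p_shifted = phishing_probability(z_global, z_local, W1, b1, w2, 2.0)
```

A node would be flagged as a phishing address when its probability exceeds a chosen threshold, for example 0.5.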
Example 2
Based on the blockchain phishing fraud identification method described in Embodiment 1, this embodiment presents a specific implementation for identifying phishing nodes on the Ethereum blockchain platform.
Using Ethereum transaction data obtained from XBlock and phishing-address labels from Etherscan, a graph data structure of the Ethereum transaction data is constructed with the Python-based DGL library. The initial node features are: in_degree: the in-degree of the node; out_degree: the out-degree of the node; Total_Tx: the number of transactions involving the node; in_value: the total Ether value received by the node; out_value: the total Ether value spent by the node; Total_value: the sum of the Ether values received and spent by the node; min_TS: the minimum timestamp among the transactions involving the node; max_TS: the maximum timestamp among the transactions involving the node; num_neighbor: the number of neighbors of the node; tx_Freq: the frequency of transactions at the node. The initial edge features are: amount: the amount on the transaction edge; timestamp: the time at which the transaction occurred.
The Ethereum transaction graph is defined as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ denotes the transaction account nodes in the transaction graph $\mathcal{G}$, and $\mathcal{E}$ denotes the transaction relationships between the transaction account nodes in $\mathcal{G}$. For node features, each transaction account node $i$ has a feature vector $x_i$, represented by a matrix $X \in \mathbb{R}^{N \times D}$, where $N$ is the number of transaction account nodes and $D$ is the number of features per node. The initial node features comprise the in-degree of the node, the out-degree of the node, the number of transactions involving the node, the total Ether value received by the node, the total Ether value spent by the node, the sum of the Ether values received and spent, the number of neighbor nodes, and the reciprocal of the transaction frequency. For edge features, each transaction edge $j$ has a feature vector $e_j$, represented by a matrix $E \in \mathbb{R}^{N' \times D'}$, where $N'$ is the number of transaction edges and $D'$ is the number of features per edge; the initial edge features are the amount on the transaction edge and the timestamp at which the transaction occurred. The structure of the whole transaction graph is represented by the adjacency matrix $A$. The transaction graph $\mathcal{G}$ is taken as input to learn the feature representation of the graph, and the node-level output is $Z \in \mathbb{R}^{N \times F}$, where $F$ is the number of features output for each node.
The pseudocode for feeding the transaction graph into the multi-transaction-view graph attention network in this embodiment is as follows:
[pseudocode image not reproduced in the source text]
Expressions (1) and (2) in the pseudocode concatenate the embedding feature of node v, the embedding feature of neighbor node u, and the edge-view coefficient between the two nodes to obtain the embedding feature of each neighbor node; the result is then concatenated with the layer-(k-1) feature of the target node v to obtain the layer-k embedding feature of node v.
Expressions (3) and (4) differ from expressions (1) and (2) in that node u is selected from the nodes of the global cluster graph, so that the global embedding feature of node v is obtained.
The sampling function in the pseudocode randomly samples a certain number of target nodes; embedding features are learned for these target nodes, and anomaly detection is then performed.
To further verify the effect of the method for block chain phishing fraud identification based on graph neural network described in the present embodiment, the experimental result data is as follows:
Figure BDA0003896669260000102
it can be seen from the experimental results that the graphical neural network model achieves better performance than the deep walk and feature-only model. In the graph neural network model, GAT is the worst, achieving similar performance to DeepWalk. This illustrates that relying solely on structural information and node characteristics does not significantly improve overall performance. The MTvGAT achieves the best performance on all metrics, which demonstrates the effectiveness of exploiting the multi-transaction perspective structural features and edge features.
Figure BDA0003896669260000103
Figure BDA0003896669260000111
Analyzing data from multiple transaction perspectives can yield more comprehensive structural information and relationships between nodes in a large-scale Ethereum transaction network. On the local perspective dataset, GCN achieves the highest accuracy, while MTvGAT achieves the highest performance on the other metrics. The reason is that MTvGAT can obtain more information through edge features in the local view. On the global perspective dataset, GraphSAGE shows higher performance because that model was proposed for inductive representation learning on large graphs. On the multi-transaction perspective dataset, MTvGAT achieves the highest performance on all metrics and is clearly separated from the other models. The validity of the proposed method in identifying phishing address nodes is thus also verified.
Embodiment 3
Based on the blockchain phishing fraud identification method based on a graph neural network described in Embodiment 1, this embodiment further provides a blockchain phishing fraud identification system based on a graph neural network. As shown in Fig. 2, the system comprises a multi-modal input module, a multi-transaction perspective attention graph neural network module, and a phishing fraud detection module;
the multi-modal input module is used for preprocessing the transaction data and processing it into a transaction network graph; for clustering the transaction network graph to obtain a global perspective graph; and for sampling the transaction network graph to obtain a local perspective graph;
the multi-transaction perspective attention graph neural network module is used for constructing and training the multi-transaction perspective attention graph neural network, and for inputting the global perspective graph and the local perspective graph respectively into the trained network to obtain the node embedding of the global transaction perspective and the node embedding of the local transaction perspective;
the phishing fraud detection module is used for concatenating the node embedding of the global transaction perspective with the node embedding of the local transaction perspective and inputting the result into a multilayer perceptron, so as to classify and identify phishing addresses.
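The detection step described above, concatenating the global and local node embeddings and passing them through a multilayer perceptron, can be sketched as follows. This is a minimal illustration under stated assumptions: a single hidden layer with ReLU, a sigmoid output, and hand-picked weights are all hypothetical choices, since the source does not specify the perceptron's architecture or parameters.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine map: one output per weight row."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def relu(x: Vector) -> Vector:
    return [max(0.0, xi) for xi in x]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def phishing_probability(
    z_global: Vector,
    z_local: Vector,
    w1: Matrix,
    b1: Vector,
    w2: Matrix,
    b2: Vector,
) -> float:
    """Concatenate both embeddings and score them with a small MLP.

    The one-hidden-layer shape and the sigmoid output are assumptions
    made for this sketch, not details taken from the patent.
    """
    z = z_global + z_local            # list "+" is the concatenation
    hidden = relu(linear(z, w1, b1))
    logit = linear(hidden, w2, b2)[0]
    return sigmoid(logit)


# Hypothetical one-dimensional embeddings and hand-picked weights.
p_neutral = phishing_probability([1.0], [2.0], [[1.0, 1.0]], [0.0], [[0.0]], [0.0])
p_positive = phishing_probability([1.0], [2.0], [[1.0, 1.0]], [0.0], [[1.0]], [0.0])
```

In practice the weights would be learned jointly with, or after, the embedding network; the fixed values here only make the data flow visible.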
It should be understood that the above embodiments of the present invention are merely examples given to illustrate the invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to those skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A blockchain phishing fraud identification method based on a graph neural network, characterized in that the method comprises the following steps:
preprocessing the transaction data and processing it into a transaction network graph;
clustering the transaction network graph to obtain a global perspective graph;
sampling the transaction network graph to obtain a local perspective graph;
constructing and training a multi-transaction perspective attention graph neural network;
inputting the global perspective graph into the trained multi-transaction perspective attention graph neural network to obtain the node embedding of the global transaction perspective, wherein the node embedding comprises the structure and edge view information of the transaction network;
inputting the local perspective graph into the trained multi-transaction perspective attention graph neural network to obtain the node embedding of the local transaction perspective;
and concatenating the node embedding of the global transaction perspective with the node embedding of the local transaction perspective and inputting the result into a multilayer perceptron to classify and identify phishing addresses.
2. The blockchain phishing fraud identification method based on a graph neural network according to claim 1, characterized in that: the transaction network graph is clustered by a clustering function to obtain the global perspective graph, which is expressed as follows:
Figure FDA0003896669250000011
where p represents the graph clustering function, c represents the number of clusters, [Figure FDA0003896669250000012] represents the transaction network graph, [Figure FDA0003896669250000013] represents the i-th subgraph generated after clustering, [Figure FDA0003896669250000014] represents the set of nodes in [Figure FDA0003896669250000015], and [Figure FDA0003896669250000016] represents the set of edges in [Figure FDA0003896669250000017].
3. The blockchain phishing fraud identification method based on a graph neural network according to claim 1, characterized in that: the transaction network graph is sampled by a neighbor sampling function to obtain the local perspective graph:
Figure FDA0003896669250000018
wherein [Figure FDA0003896669250000019] represents the K-order neighbor nodes of node i, j represents a node in the graph, and K-hop denotes K hops, that is, a K-order neighbor search.
4. The blockchain phishing fraud identification method based on a graph neural network according to claim 1, characterized in that: the multi-transaction perspective attention graph neural network captures the view coefficient information of the transaction network through edge features and attention coefficients, and obtains the structural information of the transaction graph by aggregating the features of address nodes.
5. The blockchain phishing fraud identification method based on a graph neural network according to claim 4, characterized in that: the multi-transaction perspective attention graph neural network is composed of a plurality of MTvConv blocks;
the input of an MTvConv block is the input features of a set of nodes, [Figure FDA0003896669250000021] [Figure FDA0003896669250000022], where N represents the number of nodes, F represents the dimension of the input features of each node, and [Figure FDA0003896669250000023] represents the input feature of the i-th dimension;
after the graph neural network MTvGAT is trained, each MTvConv block outputs the node embeddings [Figure FDA0003896669250000024] [Figure FDA0003896669250000025], where F' represents the dimension of the output embedding and [Figure FDA0003896669250000026] represents the embedded feature of the i-th dimension;
the calculation formula of the multi-transaction perspective attention graph neural network MTvGAT is as follows:
Figure FDA0003896669250000027
wherein [Figure FDA0003896669250000028] represents the input transaction network graph, A represents the adjacency matrix of the input transaction network graph, and z represents the embedding of the target node learned from the last MTvConv block of MTvGAT.
6. The blockchain phishing fraud identification method based on a graph neural network according to claim 5, characterized in that: the attention coefficient α_{ij} is calculated as follows:
Figure FDA0003896669250000029
wherein [Figure FDA00038966692500000210] represents a learnable weight matrix that transforms the input features into high-dimensional features; [Figure FDA00038966692500000211] denotes the shared attention mechanism, and || denotes the feature concatenation operation.
7. The blockchain phishing fraud identification method based on a graph neural network according to claim 6, characterized in that: the edge view coefficient δ_{i,j} is formed by concatenating the edge features and the attention coefficient:
δ_{i,j} = (e_{i,j} || α_{i,j})
each MTvConv block takes the node features and edge features as input, and its forward propagation mechanism is expressed as follows:
Figure FDA00038966692500000212
wherein φ and [Figure FDA00038966692500000213] are multilayer perceptrons that compute the output node embedding from the concatenated input; the aggregation symbol in the formula indicates a combination of multiple aggregators and a combination of scalers; the aggregators aggregate information from neighbors, and the scalers scale the aggregated information in different ways;
through the forward propagation mechanism, each node of the multi-transaction perspective attention graph neural network aggregates the information of its neighbor nodes to generate a new feature vector, namely [Figure FDA00038966692500000214], where l denotes the l-th neural network layer.
8. The blockchain phishing fraud identification method based on a graph neural network according to claim 7, characterized in that: the multi-transaction perspective attention graph neural network is trained by neural network backpropagation.
9. The blockchain phishing fraud identification method based on a graph neural network according to claim 8, characterized in that: during training, a loss function is used to measure the similarity between the input transaction network graph and the target node; the reconstruction loss function is defined as follows:
Figure FDA0003896669250000031
Figure FDA0003896669250000032
wherein [Figure FDA0003896669250000033] represents the L2 norm of a vector, σ represents the sigmoid function, [Figure FDA0003896669250000034] n;
the reconstruction loss is minimized in each training iteration, so that the output node embeddings learn the structure and edge view information of the transaction network.
10. The blockchain phishing fraud identification method based on a graph neural network according to claim 8, characterized in that: the node embedding of the global transaction perspective and the node embedding of the local transaction perspective are concatenated and then input into the multilayer perceptron to classify and identify phishing addresses, which is expressed as follows:
Figure FDA0003896669250000035
where || denotes the concatenation operation, P_n represents the probability that the node is a phishing address, [Figure FDA0003896669250000036] represents the global embedding feature of the node, and [Figure FDA0003896669250000037] represents the local embedding feature of the node.
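For illustration only, and not as part of the claims, the K-order neighbor search that claim 3 uses to build the local perspective graph can be sketched in Python as a breadth-first search limited to K hops. The function names, the adjacency-dictionary representation, and the choice to return the induced subgraph are assumptions of this sketch; the claimed sampling formula itself is only available as an image.

```python
from collections import deque
from typing import Dict, List, Set


def k_hop_neighbors(adjacency: Dict[str, List[str]], start: str, k: int) -> Set[str]:
    """Return all nodes reachable from start within k hops, excluding start."""
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond the K-th hop
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    visited.discard(start)
    return visited


def local_view(adjacency: Dict[str, List[str]], start: str, k: int) -> Dict[str, List[str]]:
    """Induced subgraph on start and its k-hop neighbors (assumed local view)."""
    nodes = k_hop_neighbors(adjacency, start, k) | {start}
    return {
        node: [m for m in adjacency.get(node, []) if m in nodes]
        for node in sorted(nodes)
    }


# Hypothetical chain of four addresses: a - b - c - d.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
two_hop = k_hop_neighbors(chain, "a", 2)
one_hop_view = local_view(chain, "a", 1)
```

A full-neighborhood search is used here for clarity; a sampling function that caps the number of neighbors per hop would follow the same traversal with a random subset taken at each expansion.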
CN202211275203.2A 2022-10-18 2022-10-18 Block chain phishing fraud identification method based on graph neural network Pending CN115438751A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211275203.2A CN115438751A (en) 2022-10-18 2022-10-18 Block chain phishing fraud identification method based on graph neural network

Publications (1)

Publication Number Publication Date
CN115438751A true CN115438751A (en) 2022-12-06

Family

ID=84250944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211275203.2A Pending CN115438751A (en) 2022-10-18 2022-10-18 Block chain phishing fraud identification method based on graph neural network

Country Status (1)

Country Link
CN (1) CN115438751A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116527313A (en) * 2023-03-23 2023-08-01 中国科学院信息工程研究所 Block chain fishing behavior detection method and device
CN116527313B (en) * 2023-03-23 2024-04-19 中国科学院信息工程研究所 Block chain fishing behavior detection method and device

Similar Documents

Publication Publication Date Title
CN104794192B (en) Multistage method for detecting abnormality based on exponential smoothing, integrated study model
Mubalaike et al. Deep learning approach for intelligent financial fraud detection system
Aldegheishem et al. Towards sustainable energy efficiency with intelligent electricity theft detection in smart grids emphasising enhanced neural networks
Majhi et al. Fuzzy clustering using salp swarm algorithm for automobile insurance fraud detection
CN109410036A (en) A kind of fraud detection model training method and device and fraud detection method and device
Ojugo et al. Forging a user-trust hybrid memetic modular neural network card fraud detection ensemble: A Pilot study
CN111652732A (en) Bit currency abnormal transaction entity identification method based on transaction graph matching
CN113283902B (en) Multichannel blockchain phishing node detection method based on graphic neural network
CN114240659A (en) Block chain abnormal node identification method based on dynamic graph convolutional neural network
CN114818999A (en) Account identification method and system based on self-encoder and generation countermeasure network
CN115375480A (en) Abnormal virtual coin wallet address detection method based on graph neural network
CN114782161A (en) Method, device, storage medium and electronic device for identifying risky users
CN114782051A (en) Ether phishing account detection device and method based on multi-feature learning
CN115953163A (en) Fraud risk detection method, apparatus, device and medium
CN115438751A (en) Block chain phishing fraud identification method based on graph neural network
Kaur et al. Analysis on Credit Card Fraud Detection and Prevention using Data Mining and Machine Learning Techniques
Pandey et al. A review of credit card fraud detection techniques
CN113538126A (en) Fraud risk prediction method and device based on GCN
Raj et al. Enhancing Security for Online Transactions through Supervised Machine Learning and Block Chain Technology in Credit Card Fraud Detection
CN114912927A (en) Block chain anti-fraud analysis method and system
CN112950222A (en) Resource processing abnormity detection method and device, electronic equipment and storage medium
Sulaiman et al. Credit Card Fraud Detection Challenges and Solutions: A Review.
Karim et al. Catch me if you can: Semi-supervised graph learning for spotting money laundering
Sinčák Machine Learning Methods in Payment Card Fraud Detection
Kalhotra et al. Data mining and machine learning techniques for credit card fraud detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination