CN116032670A - Ethereum phishing fraud detection method based on self-supervised deep graph learning


Info

Publication number
CN116032670A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310328325.1A
Other languages
Chinese (zh)
Other versions
CN116032670B (en
Inventor
许封元
吴昊
李书城
王润川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University


Abstract

The invention relates to an Ethereum phishing fraud detection method based on self-supervised deep graph learning, and belongs to the technical field of self-supervised deep graph learning. Data mapping: information is automatically extracted from the acquired Ethereum data and merged into nodes that originally have no usable attributes, yielding a transaction graph with node features. Model preparation: a spatial self-supervised pretext task is set, and a model and training task are constructed to mine and represent the node attribute information and topological structure information in the graph. Model training: the training scale and convergence condition are set to obtain an optimized trained model for detection. The model can perform detection on new transaction graphs on Ethereum, and addresses the continuously changing scale of Ethereum, the continuous evolution of the transaction graph, and the shortage of node labels.

Description

Ethereum phishing fraud detection method based on self-supervised deep graph learning
Technical Field
The invention relates to an Ethereum phishing fraud detection method based on self-supervised deep graph learning, and belongs to the technical field of self-supervised deep graph learning.
Background
As one of the most popular scalable blockchains in the world today, Ethereum has fully explored the potential of smart contracts and has given rise to a large number of decentralized finance (DeFi) applications built on them, attracting wide attention and funding. FIG. 1 illustrates the process of creating a smart contract; such a contract automatically manages and approves the transaction process without any centralized entity, guaranteeing trust and transparency while eliminating the delays and costs of the original transaction process. It is estimated that Ethereum has currently reached a total market value of two billion dollars, but its explosive growth also exposes millions of users to the risk of malicious attacks, such as phishing fraud and extortion of user funds. These attacks are difficult to defend against with conventional methods such as software security or smart contract analysis, so intelligent analysis at the level of Ethereum transactions is of great importance.
Existing phishing fraud detection methods based on intelligent analysis fall mainly into two types. The first type adopts shallow learning models, such as traditional machine learning methods that rely on feature engineering, or random-walk-based network embedding methods such as DeepWalk and node2vec. The second type consists of graph-based deep learning methods, such as graph convolutional neural networks. Deep learning has achieved great success in computer vision, speech recognition, natural language processing, and other fields thanks to its powerful representation learning capability. In recent years, applying deep learning to non-Euclidean data such as graphs has received increasing attention, for example in social networks, protein interface prediction, and knowledge graph embedding; it has also contributed to many tasks in computer vision and natural language processing, such as object detection, action recognition, machine translation, and semantic parsing. Drawing on experience from these fields and on the characteristics of Ethereum's own transaction activity, namely that all transaction activity on Ethereum can be regarded as one large-scale transaction graph, fully exploiting the advantages of deep graph learning on Ethereum could greatly improve the effectiveness of security analysis.
The basic workflow of the deep graph learning method is shown in FIG. 2 and comprises the following steps: 1) data collection: acquiring a dataset for deep graph learning; 2) data mapping: constructing a graph for the deep learning method from the dataset; 3) model preparation: designing a training model and creating a training method; 4) model training: feeding the data subgraphs used for training into the training model until the training result converges; 5) model application: after training, deploying the trained model and feeding it the data subgraphs used for evaluation to obtain the application result.
At present, applying deep graph learning to Ethereum phishing fraud detection faces two main problems. On the one hand, the Ethereum transaction data involved is large in scale. To date there are more than 1.8 billion transactions on the chain, and thousands of new transactions per second are continuously added to it, so the constructed Ethereum transaction graph is an evolving graph that keeps changing dynamically. On the other hand, the constructed Ethereum transaction graph has few node labels, and labels of phishing nodes in particular are too scarce for training phishing fraud detection; moreover, transaction graphs generated from new transaction data carry even fewer labels, which further aggravates label imbalance. Existing deep graph learning methods do not solve these problems well. Most of them are transductive training methods that can only perform detection within a single fixed graph and must be retrained whenever new nodes or subgraphs appear, which makes them unsuitable for Ethereum, where new subgraphs appear continuously. Meanwhile, to cope with the shortage of labels, they adopt biased sampling when generating the transaction graph: labelled nodes are selected first, then expansion sampling is performed around them with algorithms such as random walk, and finally only the labelled nodes and their neighbors are kept. This changes the node distribution of the original Ethereum graph to some extent and is therefore unsuitable for real scenarios.
In summary, existing deep graph learning only limits the scale of the Ethereum transaction graph under study by means of transductive training, biased sampling, and the like, changing the original structure of the graph in the process, and cannot fully solve the problems that the scale of Ethereum data keeps changing, the transaction graph keeps evolving, and node labels are insufficient.
Disclosure of Invention
Purpose of the invention: in view of the above problems and shortcomings, the invention aims to provide an Ethereum phishing fraud detection method based on self-supervised deep graph learning. It proposes an effective self-supervised learning pretext task, performs self-supervised model training on existing Ethereum transaction data, applies the converged model to new subgraphs that need to be checked, and finds the nodes involved in phishing fraud in those subgraphs.
In terms of workflow, a user of the method only needs to train a converged phishing fraud detection model on existing Ethereum transaction data and then construct the latest transaction data to be checked into a transaction graph as input; the model returns the corresponding normal nodes and the nodes suspected of phishing fraud, thereby realizing Ethereum phishing fraud detection.
In terms of characteristics, the method trains a model on existing Ethereum data and can then perform detection on new data subgraphs, addressing the continuously changing scale of Ethereum, the continuous evolution of the transaction graph, and the shortage of node labels.
The technical scheme is as follows: in order to achieve the above purpose, the present invention adopts the following technical scheme:
an Ethereum phishing fraud detection method based on self-supervised deep graph learning comprises the following steps:
step 1: data mapping: information is automatically extracted from the acquired Ethereum data and merged into nodes that originally have no usable attributes, yielding a transaction graph with node features;
step 2: model preparation: a spatial self-supervised pretext task is set, and a model and training task are constructed to mine and represent the node attribute information and topological structure information in the graph;
step 3: model training: the training scale and convergence condition are set to obtain an optimized trained model for detection.
Further, the specific steps of step 1 are as follows:
step 1.1: collect the Ethereum transaction data used for training, and filter out failed transactions and transactions whose value is 0;
step 1.2: divide the transaction data obtained in step 1.1 into S parts by block number or timestamp, and further divide each part into S slices containing similar numbers of transactions;
step 1.3: compute the node feature vectors in each slice of transaction data according to the node attributes of the transaction graph, and construct the corresponding one-to-one Ethereum transaction graph.
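Steps 1.1 and 1.2 can be illustrated with a minimal Python sketch. The `Tx` record, its field names, and the choice to cut both parts and slices by equal transaction count are assumptions made for illustration; the patent only states that the data is divided by block number or timestamp into S parts and then into S slices of similar size.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tx:
    """Hypothetical transaction record; field names are illustrative only."""
    block: int      # block number (a timestamp would work the same way)
    sender: str
    receiver: str
    value: int      # transferred amount
    failed: bool    # True if the transaction did not succeed


def filter_transactions(txs: List[Tx]) -> List[Tx]:
    """Step 1.1: drop failed transactions and zero-value transactions."""
    return [t for t in txs if not t.failed and t.value > 0]


def split_evenly(items: list, parts: int) -> List[list]:
    """Cut a list into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def partition(txs: List[Tx], s: int = 5) -> List[List[List[Tx]]]:
    """Step 1.2: order by block number, cut into s parts, then each part into s slices."""
    ordered = sorted(txs, key=lambda t: t.block)
    return [split_evenly(part, s) for part in split_evenly(ordered, s)]
```

With S = 5 this yields 25 slices in chronological order, each of which would then be turned into one transaction graph in step 1.3.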
Further, the specific steps of step 2 are as follows:
step 2.1: set the model parameters, and select transductive or inductive learning;
step 2.2: if transductive learning is selected, the transaction graphs built from the first S-2 slices of each part of transaction data are input in sequence for model training, and the training result is evaluated on the (S-1)-th transaction graph; if inductive learning is selected, the first S-1 transaction graphs are input in sequence for model training, and the training result is evaluated on the S-th transaction graph.
Further, the specific step of step 3 is as follows: step 2.2 is repeated on each of the S parts of transaction data to obtain the final converged model.
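The slice selection in step 2.2 amounts to a simple index rule. The sketch below is illustrative only; the function name and the use of plain Python sequences to stand for transaction graphs are assumptions.

```python
from typing import List, Sequence, Tuple


def train_eval_split(slices: Sequence, mode: str) -> Tuple[List, object]:
    """Pick training slices and the evaluation slice for one part of the data.

    transductive: train on the first S-2 slices, evaluate on slice S-1.
    inductive:    train on the first S-1 slices, evaluate on slice S.
    """
    s = len(slices)
    if s < 3:
        raise ValueError("need at least three slices")
    if mode == "transductive":
        return list(slices[: s - 2]), slices[s - 2]
    if mode == "inductive":
        return list(slices[: s - 1]), slices[s - 1]
    raise ValueError(f"unknown mode: {mode!r}")
```

For S = 5 the transductive setting trains on slices 1 to 3 and evaluates on slice 4, while the inductive setting trains on slices 1 to 4 and evaluates on slice 5.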
Further, S is 5.
Further, the specific steps of step 2 are as follows. The transaction graph G obtained in step 1 is divided, in order of block number or timestamp, into N subgraphs {G_1, G_2, ..., G_N}. Each subgraph G_i consists of a set of nodes, a set of edges, and a feature matrix X_i constructed from the 17 features of the subgraph's nodes. A_i denotes the adjacency matrix of the subgraph: if there is an edge between any two nodes m, n in the subgraph, then A_i[m, n] = 1; otherwise A_i[m, n] = 0.
The self-supervised training method is then applied to these unlabeled transaction subgraphs with node features, and the loss value of the pretext task is calculated by the following formula:
[equation image SMS_14; not recoverable from the extracted text]
where g is the encoder of the graph neural network, i.e. the feature extractor; the formula ranges over the set of transaction graph nodes used in the pretext task, each of which has a label for pretext-task training; and an error measure compares a node's feature embedding vector with its true feature value. After the current round of pretext-task training, the encoder g is retained for the next round of training, and the node representation Z_i of the subgraph is generated; this node representation is fed into the final classifier for the downstream phishing node detection task.
First, the feature extractor of the graph neural network computes the feature embedding vector representation Z_i of all nodes in the given subgraph G_i. Second, for two nodes u and v of subgraph G_i, if v lies in the set of all nodes reachable from u in no more than k hops (i.e. along a path of k hops or fewer, regardless of edge direction), that is, v is a k-hop neighbor of u, then the feature embedding vectors of the two nodes after model training should be as similar as possible, and vice versa. The general expression of the task is as follows:
[equation image SMS_30; not recoverable from the extracted text]
where the two pair sets in the formula denote, over all nodes of subgraph G_i, the set of node pairs that are k-hop neighbors and the set of node pairs that are not k-hop neighbors, respectively; the similarity judgment function is defined in the method by the expression in image SMS_38, in which a linear transformation layer is applied and each node's embedded vector representation is taken from Z_i.
Beneficial effects: compared with the prior art, the invention has the following advantages. The main problems it solves come from three aspects:
usable node attributes cannot be found in the original Ethereum transaction data, and multiple transactions may exist between two nodes, producing a transaction graph with multiple edges; the structure of the Ethereum transaction graph therefore has to be designed carefully, and representative node and edge attributes have to be defined;
the Ethereum transaction graph has little label data, and transaction graphs generated from new transactions have even less, which further aggravates label imbalance;
the volume of Ethereum data is large, and training on the entire generated transaction graph would be very costly in time and resources.
To address these problems, this patent provides an Ethereum phishing fraud detection method based on self-supervised deep graph learning, which assigns the three problems to three modules of the method: Ethereum data mapping, preparation of the phishing fraud detection model, and training of the phishing fraud detection model.
Ethereum data mapping. Unlike Bitcoin transactions, which may have multiple inputs and outputs, Ethereum transactions are one-to-one; however, the nodes of the original transaction graph have no usable attributes, and multiple edges may occur between two nodes. The method does not simply merge the multiple edges of the transaction graph into one edge. Instead, it extracts relatively useful information from the original transaction data and merges this information into the node attributes, obtaining a directed graph with single edges and node attributes, and then selects the largest weakly connected component (WCC) of that graph as the final input graph for model training.
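The data mapping idea can be sketched in plain Python as follows. The four node attributes used here (sent and received counts and values) are hypothetical stand-ins, since the 17 features of Table 1 are not reproduced in this text, and union-find is just one common way to obtain the largest weakly connected component.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Transfer = Tuple[str, str, float]  # (sender, receiver, value)


def build_simple_digraph(txs: Iterable[Transfer]):
    """Collapse parallel transactions into single directed edges and
    move per-transaction information into (hypothetical) node attributes."""
    edges: Set[Tuple[str, str]] = set()
    feats: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"sent_count": 0, "recv_count": 0,
                 "sent_value": 0.0, "recv_value": 0.0})
    for u, v, value in txs:
        edges.add((u, v))                 # multi-edges become one edge
        feats[u]["sent_count"] += 1
        feats[u]["sent_value"] += value
        feats[v]["recv_count"] += 1
        feats[v]["recv_value"] += value
    return edges, dict(feats)


def largest_wcc(edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Largest weakly connected component, via union-find on the undirected view."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    groups: Dict[str, Set[str]] = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return max(groups.values(), key=len) if groups else set()
```

Keeping only the nodes returned by `largest_wcc`, together with their attribute vectors, gives a single-edge directed graph with node features of the kind the paragraph above describes.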
Preparation of the phishing fraud detection model. To make full use of large-scale Ethereum transaction data when node labels are scarce, this patent adopts self-supervised learning, which mines more information directly from the data through self-generated supervision signals without depending on labels or annotations. The effectiveness of self-supervised learning has been demonstrated by many successful cases in fields such as natural language processing and computer vision. However, the topology of the Ethereum transaction graph means that the nodes of the graph are not independent of one another, which makes it almost impossible to apply existing frameworks to the graph directly. For this reason, the technique described in this patent focuses on the design of the pretext task and proposes an effective spatial self-supervised pretext task.
Training of the phishing fraud detection model. The huge volume of Ethereum data puts heavy resource pressure on model training. The method divides the transaction data into five parts by Ethereum block number (or timestamp) and reduces variability through repeated experiments, making the results more stable. Each part of transaction data is further divided into five slices, and under limited resources the detection model is trained on these slices in sequence, which addresses the strong growth and large scale of Ethereum data.
Based on the above analysis, this patent designs the Ethereum phishing fraud detection method based on self-supervised deep graph learning as three modules. The data mapping module automatically extracts relatively useful information from the original Ethereum transaction data and merges it into the node attributes, obtaining a directed graph with single edges and node attributes; the model preparation module designs an effective spatial pretext task based on self-supervised learning, used to mine and represent the rich node attribute information and topological structure information in the graph; the model training module divides the transaction data and transaction graphs by Ethereum block number (or timestamp), sets convergence conditions to optimize the feature extractor continuously, and ensures the stability of the results through repeated experiments.
Drawings
FIG. 1 is a flow chart of smart contract creation;
FIG. 2 is a basic workflow diagram of deep graph learning;
FIG. 3 shows the spatial pretext task of the present invention;
FIG. 4 is a visualization result for the original features, one of the methods compared with the method described in this patent for detecting Ethereum phishing fraud in an embodiment of the present invention;
FIG. 5 is a visualization result for an untrained graph convolutional neural network, one of the methods compared with the method described in this patent for detecting Ethereum phishing fraud in an embodiment of the present invention;
FIG. 6 is a visualization result for the method described in this patent, compared with the other methods, for detecting Ethereum phishing fraud in an embodiment of the present invention;
FIG. 7 is a visualization result of detecting Ethereum phishing fraud with the method described in this patent after 1 round of training iterations in an embodiment of the present invention;
FIG. 8 is a visualization result of detecting Ethereum phishing fraud with the method described in this patent after 10 rounds of training iterations in an embodiment of the present invention;
FIG. 9 is a visualization result of detecting Ethereum phishing fraud with the method described in this patent after 50 rounds of training iterations in an embodiment of the present invention.
Detailed Description
The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments merely illustrate the invention and do not limit its scope; after reading the invention, modifications of equivalent form made by those skilled in the art all fall within the scope defined by the appended claims.
The visualization results of the Ethereum phishing fraud detection method based on self-supervised deep graph learning, compared with other methods, are shown in FIGS. 4-9.
In the data mapping part, the method automatically extracts information from the original Ethereum transaction data and merges it into nodes that originally have no usable attributes, yielding a transaction graph with node features.
In the model preparation part, the method proposes an effective spatial self-supervised pretext task, which addresses label scarcity, and obtains a converged model for detection by minimizing the accumulated error.
In the model training part, the method trains on the transaction graphs slice by slice in order of block number (or timestamp), optimizing the model continuously on each transaction graph; this keeps resource usage under control while adapting to the latest changes in the transaction graph.
Examples
The genesis block of Ethereum dates from July 2015. Without loss of generality, the method collects all transaction data from January 2018 to May 2020 (including external transaction data originating from user addresses and internal transaction data originating from smart contract addresses) for model training. The original data has a complex structure and contains much noise that is meaningless for phishing fraud detection, such as failed transactions and transactions whose value is 0; after filtering by the method, 75,382,756 transactions remain.
In the data mapping part, the method automatically extracts 17 general features from the original Ethereum transaction data as node attributes of the transaction graph. These features currently give good extraction results; the 17 basic features used as transaction graph node attributes are listed in Table 1 below.
TABLE 1
Figure SMS_41
In the model preparation part, before self-supervised training, it is assumed that the original Ethereum transaction graph G has been divided, in order of block number (or timestamp), into N subgraphs {G_1, G_2, ..., G_N}. Each subgraph G_i consists of a set of nodes, a set of edges, and a feature matrix X_i constructed from the 17 features of the subgraph's nodes. A_i denotes the adjacency matrix of the subgraph: if there is an edge between any two nodes m, n in the subgraph, then A_i[m, n] = 1; otherwise A_i[m, n] = 0. The self-supervised training method is applied to these unlabeled transaction subgraphs with node features, and the training objective of self-supervised learning can be summarized as minimizing the loss value of the pretext task, as shown in Formula 1:
Formula 1: [equation image SMS_54; not recoverable from the extracted text]
where g is the encoder of the graph neural network, i.e. the feature extractor; G_i is one of the subgraphs into which the original transaction graph is divided by block number (or timestamp); the formula ranges over the set of transaction graph nodes used in the pretext task, each of which has a label for pretext-task training; and an error measure compares a node's feature embedding vector with its true feature value. After the current round of pretext-task training, the encoder g is retained for the next round of training, and the node representation Z_i of the subgraph is generated; this node representation is fed into the final classifier for the downstream phishing node detection task.
Intuitively, for the problem of phishing fraud detection, a qualified pretext task design should satisfy the following conditions: 1) the extracted data labels reflect the characteristics of the data; 2) because Ethereum transaction data is large in scale, the extracted data labels can be obtained with low time complexity, otherwise the time cost of training is unacceptable; 3) the pretext task may incorporate some domain knowledge but must not be too specific, otherwise its applicability will be limited and it will be vulnerable to adversarial attacks. For nodes on the Ethereum transaction graph, the method assumes that neighboring nodes that transact frequently should have similar node representations. To capture the spatial relationship between neighboring nodes in the Ethereum transaction graph effectively, the method described in this patent designs the spatial pretext task shown in FIG. 3.
First, the feature extractor of the graph neural network computes the feature embedding vector representation Z_i of all nodes in the given subgraph G_i. Second, for two nodes u and v of subgraph G_i, if v lies in the set of all nodes reachable from u in no more than k hops (i.e. along a path of k hops or fewer, regardless of edge direction), that is, v is a k-hop neighbor of u, then the feature embedding vectors of the two nodes after model training should be as similar as possible, and vice versa. The general expression of this task is shown in Formula 2:
Formula 2: [equation image SMS_71; not recoverable from the extracted text]
where the two pair sets in the formula denote, over all nodes of subgraph G_i, the set of node pairs that are k-hop neighbors and the set of node pairs that are not k-hop neighbors, respectively; the similarity judgment function is defined in the method by the expression in image SMS_78, in which a linear transformation layer is applied and each node's embedded vector representation is taken from Z_i.
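The self-supervised labels of the spatial pretext task, i.e. which node pairs count as k-hop neighbors and which do not, can be derived from the graph structure alone. The following sketch does this with a breadth-first search on the undirected view of the graph; the function names are assumptions, and enumerating all pairs is only practical for small graphs, so a real implementation would presumably sample pairs instead.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def undirected_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Ignore edge direction, as the pretext task does."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def k_hop_neighbors(adj: Dict[str, Set[str]], source: str, k: int) -> Set[str]:
    """Nodes reachable from `source` in at most k hops (source excluded)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(source)
    return seen


def pretext_pairs(edges: Iterable[Tuple[str, str]], k: int = 2):
    """Split all unordered node pairs into k-hop neighbors and non-neighbors."""
    adj = undirected_adjacency(edges)
    nodes = sorted(adj)
    positives, negatives = set(), set()
    for u in nodes:
        near = k_hop_neighbors(adj, u, k)
        for v in nodes:
            if u < v:
                (positives if v in near else negatives).add((u, v))
    return positives, negatives
```

Pairs in the first set would be trained toward high embedding similarity and pairs in the second toward low similarity; the similarity function itself (a linear layer over node embeddings) is not reproduced here because its exact form is not recoverable from the text.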
The Ethereum market is far more volatile than traditional trading platforms, its transaction data is huge in scale, and the transaction graph keeps evolving as the data keeps growing, so it is impractical to feed all acquired data into training at once. Therefore, as described above, the method divides the transaction data by block number (or timestamp), inputs only part of the transaction subgraphs in each training run, and carries the trained model over to training on the new subgraph. This alleviates the problem of continuously growing Ethereum data and helps the model adapt to the distribution of new Ethereum transaction data.
The final algorithm of the model is as follows:
(1) The original Ethereum transaction graph G has been divided, in order of block number (or timestamp), into N subgraphs {G_1, G_2, ..., G_N}, where G_i = (A_i, X_i).
(2) The current graph neural network encoder g_i computes the node feature embedding vector representation of subgraph G_i, Z_i = g_i(A_i, X_i).
(3) The current graph neural network encoder g_i and the node feature embedding vectors Z_i of subgraph G_i are input to the spatial pretext task; the task loss is minimized according to Formula 1 until model training converges, yielding the updated graph neural network encoder g_i'.
(4) The updated graph neural network encoder g_i' is used as the encoder for the next subgraph G_(i+1), and steps (2)-(4) are repeated until all training-set subgraphs have been trained.
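The control flow of steps (1) to (4) is a loop that carries one encoder through the subgraphs in chronological order. The sketch below shows only that loop; `train_until_converged` is a hypothetical callback standing in for the pretext-task training of step (3), and `toy_step` is a trivial numeric stand-in used purely to make the example runnable.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

E = TypeVar("E")  # encoder type
S = TypeVar("S")  # subgraph type


def train_incrementally(
    encoder: E,
    subgraphs: Iterable[S],
    train_until_converged: Callable[[E, S], Tuple[E, float]],
) -> Tuple[E, List[float]]:
    """Train on each subgraph in turn, reusing the updated encoder each time."""
    losses: List[float] = []
    for graph in subgraphs:
        # Steps (2)-(3): embed the subgraph and minimize the pretext loss.
        encoder, loss = train_until_converged(encoder, graph)
        # Step (4): the updated encoder is the starting point for the next subgraph.
        losses.append(loss)
    return encoder, losses


def toy_step(weight: float, target: float) -> Tuple[float, float]:
    """Stand-in for real training: move the 'encoder' halfway toward a target."""
    new_weight = (weight + target) / 2
    return new_weight, abs(target - new_weight)
```

For example, `train_incrementally(0.0, [4.0, 8.0], toy_step)` returns `(5.0, [2.0, 3.0])`. In a real setting the encoder would be a graph neural network and the callback would run an optimizer until the convergence condition is met.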
Whereas most existing deep graph learning methods are transductive, the model training part of the method described in this patent supports two training modes, transductive and inductive. Transductive learning means that both training and test data are used when the model is trained, whereas inductive learning uses only training data, not test data, during training. When training the model by transductive learning, the method uses the first three slices (of five) of each part of transaction data (five parts in total), divided by Ethereum block number (or timestamp), for model training, and evaluates the model on the fourth slice. When training the model by inductive learning, the method uses the first four slices of transaction data for model training and evaluates the model on the fifth slice. Taking the first three slices of transaction data as an example, the statistics of these slices used for training are shown in Table 2 below:
TABLE 2
Figure SMS_89
The method uses a 2-layer GraphSAGE with a mean-pooling aggregator as the backbone of the node classification model, and a logistic regression model as the binary classifier. The Ethereum phishing node labels used for node classification come mainly from Etherscan.io and from blacklists published by some companies, 6588 nodes in total; the Ethereum normal node labels are obtained by randomly drawing non-phishing nodes from each transaction graph, about 3 times as many as the phishing nodes in that graph. These labelled nodes are assigned to the training set, evaluation set, and test set in a ratio of [first term garbled in the source]:2:3. The method uses Adam as the optimizer, searches the learning rate over {0.01, 0.001, 0.0001}, sets dropout to 0.5, searches the number of hidden-layer nodes over {32, 64, 128, 256}, and sets the batch size to 512. In the spatial pretext task of the model training part, k is set to 2.
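The hyperparameter search described above is a small grid over two values with everything else fixed. A minimal way to enumerate it, assuming a plain exhaustive grid (the patent does not say how the traversal is scheduled) and using illustrative key names:

```python
from itertools import product
from typing import Dict, List

LEARNING_RATES = (0.01, 0.001, 0.0001)
HIDDEN_SIZES = (32, 64, 128, 256)

# Values stated as fixed in the text; the key names are illustrative.
FIXED = {
    "optimizer": "Adam",
    "dropout": 0.5,
    "batch_size": 512,
    "num_layers": 2,   # 2-layer GraphSAGE
    "k_hops": 2,       # k in the spatial pretext task
}


def hyperparameter_grid() -> List[Dict[str, object]]:
    """Return every (learning rate, hidden size) combination with the fixed settings."""
    return [
        dict(FIXED, learning_rate=lr, hidden_size=hidden)
        for lr, hidden in product(LEARNING_RATES, HIDDEN_SIZES)
    ]
```

This gives 3 x 4 = 12 configurations, each of which would be trained and compared on the evaluation set.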
The main advantage of the method is that it preserves the original structure of the Ethereum transaction graph while, through self-supervised deep graph learning, addressing the continuously changing scale of Ethereum data and the scarcity of data labels. The beneficial results achieved by the method are described next.
In brief:
the method has been implemented and comprehensively evaluated, and extensive experiments on large-scale Ethereum transaction graphs show that it outperforms the baseline models, with an F1-score about 4%-16% higher than the baselines.
Specifically:
comparing the direct push learning of the method described in this patent with six baseline methods, comprising: 1) Original features; 2) The deep walk algorithm; 3) Combining method of deep node feature embedding and original feature; 4) The GraphSage algorithm is a flexible inductive graph neural network algorithm; 5) DGI, an unsupervised inductive graph neural network algorithm for representing interaction information by maximizing local batch representation and high-dimensional graph representation; 6) An untrained graph convolution neural network algorithm. For inductive learning, the node feature embedding vector generated by the deep walk algorithm will rotate relative to the original embedding space and therefore not compare to the baseline. Based on five transaction data divided according to the block number (or time stamp) of the ethernet, the prediction results of the direct push and the inductive are shown in tables 3 and 4, respectively, and include four indexes of accuracy, prediction rate, regression rate and F1-score.
Table 3: Prediction results of transductive learning compared with six baseline methods
[Table image: Figure SMS_90]
Table 4: Prediction results of inductive learning compared with six baseline methods
[Table image: Figure SMS_91]
The results in Tables 3 and 4 show that the method described in this patent outperforms the baseline models on these metrics.
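For reference, the four metrics reported in Tables 3 and 4 (accuracy, precision, recall, and F1-score) follow the standard binary-classification definitions. The sketch below assumes labels encoded as 1 for a phishing node and 0 for a normal node; the function name `binary_metrics` is an illustrative choice and not taken from the patent.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = phishing)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, passed
    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


scores = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(scores)
```

Because normal nodes outnumber phishing nodes by roughly 3 to 1 in the labeled data, F1-score is the more informative of the four when comparing methods.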
A user applies the method of this patent as follows:
a. collect Ethereum transaction data for training, and filter out failed transactions and transactions whose value is 0;
b. divide the collected transaction data into five parts by block number (or timestamp), and further divide each part into five pieces with similar numbers of transactions;
c. according to the 17 transaction graph node attributes set out in Table 1, manually compute the node feature vectors for each piece of transaction data, and construct the corresponding Ethereum transaction graphs, one graph per piece;
d. set the model parameters of the method, and select either transductive learning or inductive learning;
e. taking the first part of the transaction data as an example: if transductive learning is selected, sequentially input the first three transaction graphs for model training, and evaluate the training result on the fourth transaction graph; if inductive learning is selected, sequentially input the first four transaction graphs for model training, and evaluate the training result on the fifth transaction graph;
f. repeat step e on each of the five parts of transaction data to obtain the final converged trained model.
For detection, a corresponding Ethereum transaction graph is constructed as in step c and input into the deployed model, which generates the detection result.
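The data preparation and training schedule in steps a, b, and e can be sketched as follows. This is a hedged illustration, not the patent's implementation: the record fields `block`, `value`, and `success` are hypothetical names, and cutting both levels into contiguous near-equal chunks after sorting by block number is an assumption, since the text does not state how the part boundaries are chosen.

```python
def filter_transactions(transactions):
    """Step a: drop failed transactions and transactions whose value is 0."""
    return [tx for tx in transactions if tx["success"] and tx["value"] > 0]


def split_contiguous(items, parts):
    """Split an ordered list into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def partition(transactions, parts=5, pieces=5):
    """Step b: order by block number, cut into parts, then each part into pieces."""
    ordered = sorted(transactions, key=lambda tx: tx["block"])
    return [split_contiguous(part, pieces)
            for part in split_contiguous(ordered, parts)]


def plan_round(pieces, mode):
    """Step e: pick the training pieces and the evaluation piece of one part.

    With five pieces, transductive learning trains on pieces 1-3 and evaluates
    on piece 4; inductive learning trains on pieces 1-4 and evaluates on piece 5.
    """
    n = len(pieces)
    if mode == "transductive":
        return pieces[:n - 2], pieces[n - 2]
    if mode == "inductive":
        return pieces[:n - 1], pieces[n - 1]
    raise ValueError("mode must be 'transductive' or 'inductive'")


# Synthetic demo data: 100 valid transactions plus two that step a removes.
demo = [{"block": b, "value": 1.0, "success": True} for b in range(100, 0, -1)]
demo += [{"block": 0, "value": 0.0, "success": True},
         {"block": 1, "value": 2.0, "success": False}]
parts = partition(filter_transactions(demo))
train, evaluate = plan_round(parts[0], "inductive")
print(len(parts), len(train), len(evaluate))
```

Each piece would then be turned into a transaction graph (step c) before being fed to the model; that step depends on the 17 node attributes of Table 1 and is not sketched here.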
The Ethereum phishing fraud detection results are visualized, and the visualizations are shown in Figs. 4-9. To keep each visualization clear, the number of nodes shown is limited to 500 or fewer. Normal nodes are sampled randomly so that the ratio of normal nodes to phishing fraud nodes is about 4:1. In Figs. 4-9, round nodes represent normal nodes, and five-pointed star nodes represent phishing fraud nodes.
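One way to draw such a visualization sample is sketched below, under stated assumptions: the function name, the fixed seed, and the choice to sample phishing nodes at random as well are illustrative, since the text only specifies the 500-node cap and the approximate 4:1 ratio.

```python
import random


def sample_for_visualization(phishing_nodes, normal_nodes, max_nodes=500,
                             normal_per_phishing=4, seed=0):
    """Sample node lists with a fixed normal:phishing ratio under a size cap."""
    rng = random.Random(seed)  # fixed seed so the figure is reproducible
    # Largest phishing count that respects the cap and both available pools.
    n_phishing = min(len(phishing_nodes),
                     max_nodes // (normal_per_phishing + 1),
                     len(normal_nodes) // normal_per_phishing)
    n_normal = n_phishing * normal_per_phishing
    return (rng.sample(phishing_nodes, n_phishing),
            rng.sample(normal_nodes, n_normal))


phishing, normal = sample_for_visualization(list(range(300)),
                                            list(range(1000, 6000)))
print(len(phishing), len(normal))
```

Both inputs are expected to be lists of node identifiers; the sampled nodes would then be projected to two dimensions and plotted with the marker shapes described above.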
Figs. 4-6 show the node feature visualizations of three methods on the same data. Compared with the raw features and the untrained graph convolutional neural network, the method described in this patent separates the two classes of points better and clusters each class more tightly, which further demonstrates its effectiveness. The method thus achieves effects that the prior art cannot.
Figs. 7-9 show the visualizations of the method after different numbers of training iterations on the same data. As the number of training iterations increases, the features of nodes of the same class become more concentrated and, within a certain range, normal nodes become easier to separate from phishing fraud nodes.

Claims (6)

1. An Ethereum phishing fraud detection method based on self-supervised deep graph learning, characterized in that the method comprises the following steps:
step 1: data mapping (graph construction): based on the collected Ethereum data, automatically extracting information and merging it into nodes that originally have no usable attributes, to obtain a transaction graph with node features;
step 2: model preparation: setting a spatial self-supervised pre-task, constructing the model and the training task, and mining and representing the node attribute information and topology information in the graph;
step 3: model training: setting the training scale and the convergence condition to obtain an optimized trained model for detection.
2. The Ethereum phishing fraud detection method based on self-supervised deep graph learning as claimed in claim 1, characterized in that the specific steps of step 1 are as follows:
step 1.1: collecting Ethereum transaction data for training, and filtering out failed transactions and transactions whose value is 0;
step 1.2: dividing the transaction data obtained in step 1.1 into S parts by block number or timestamp, and further dividing each part into S pieces with similar numbers of transactions;
step 1.3: computing the node feature vectors for each piece of transaction data according to the transaction graph node attributes, and constructing the corresponding Ethereum transaction graphs, one graph per piece.
3. The Ethereum phishing fraud detection method based on self-supervised deep graph learning as claimed in claim 2, characterized in that the specific steps of step 2 are as follows:
step 2.1: setting the model parameters, and selecting transductive or inductive learning;
step 2.2: if transductive learning is selected, for each part of the transaction data, sequentially inputting the transaction graphs built from its first S-2 pieces for model training, and evaluating the training result on the (S-1)-th transaction graph; if inductive learning is selected, sequentially inputting the first S-1 transaction graphs for model training, and evaluating the training result on the S-th transaction graph.
4. The Ethereum phishing fraud detection method based on self-supervised deep graph learning as claimed in claim 3, characterized in that the specific step of step 3 is as follows: repeating step 2.2 on each of the S parts of transaction data to obtain the final converged trained model.
5. The Ethereum phishing fraud detection method based on self-supervised deep graph learning as claimed in claim 2, characterized in that S is 5.
6. The Ethereum phishing fraud detection method based on self-supervised deep graph learning as claimed in claim 5, characterized in that the specific steps of step 2 are as follows: the transaction graph obtained in step 1 is denoted [Figure QLYQS_4] and is divided, in order of block number or timestamp, into [Figure QLYQS_8], [Figure QLYQS_10] sub-graphs in total, wherein each sub-graph is [Figure QLYQS_1] [Figure QLYQS_7], i.e. [Figure QLYQS_9] is the set of [Figure QLYQS_12] nodes, [Figure QLYQS_2] is the set of edges in the sub-graph, [Figure QLYQS_6] is the feature matrix constructed from the 17 features of the sub-graph nodes, and [Figure QLYQS_11] denotes the adjacency matrix of the sub-graph, i.e. if there is an edge between any two nodes m, n in the sub-graph, then [Figure QLYQS_13], otherwise [Figure QLYQS_3];
then, a self-supervised training method is applied to the transaction sub-graph, and the loss value [Figure QLYQS_5] is computed:
[Figure QLYQS_14]
where g is the encoder of the neural network, i.e. the feature extractor; [Figure QLYQS_17] is the set of transaction graph nodes in the pre-task; [Figure QLYQS_19] is the transaction graph node used for pre-task training; [Figure QLYQS_25] is [Figure QLYQS_18], the evaluation model used to measure the error between the feature embedding vector of node [Figure QLYQS_22] and the true feature value [Figure QLYQS_28];
after this round of pre-task training, the encoder g of the graph neural network is retained for the next round of training, and the sub-graph node representations [Figure QLYQS_29] are generated at the same time; these node representations are fed into the final classifier for the downstream phishing node detection task;
first, the feature extractor of the graph neural network computes, for the given sub-graph [Figure QLYQS_16], the feature embedding vector representations [Figure QLYQS_20] of all its nodes; second, for a node [Figure QLYQS_27] of sub-graph [Figure QLYQS_24], if [Figure QLYQS_15] lies in the set of all nodes reachable from [Figure QLYQS_21] within no more than k hops (i.e. via a path of k hops or fewer), regardless of edge direction, i.e. [Figure QLYQS_23] is [Figure QLYQS_26], then the feature embedding vectors of the two nodes after model training exhibit high similarity; the general expression of the task is:
[Figure QLYQS_30]
wherein [Figure QLYQS_32] and [Figure QLYQS_34] respectively denote, for all nodes [Figure QLYQS_33] of sub-graph [Figure QLYQS_37], the set of k-hop-neighbor node pairs and the set of non-k-hop-neighbor node pairs; [Figure QLYQS_36] is the similarity function, which the method sets to [Figure QLYQS_39]; [Figure QLYQS_40] denotes a linear transformation layer; and [Figure QLYQS_31] is the feature embedding vector representation of node [Figure QLYQS_38] in [Figure QLYQS_35].
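The k-hop neighbor relation that the spatial pre-task relies on can be illustrated with a breadth-first search over the undirected view of a sub-graph. This is a minimal sketch under stated assumptions: all function names are hypothetical, node identifiers are assumed to be sortable, and the sketch only builds the two node-pair sets; it does not reproduce the loss or the similarity function, whose formulas are not recoverable from this text.

```python
from collections import deque


def undirected_adjacency(edges):
    """Build neighbor sets from directed edges, ignoring edge direction."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_hop_neighbors(adj, source, k):
    """Nodes reachable from `source` via a path of at most k hops."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(source)
    return seen


def pair_sets(edges, k):
    """Split all unordered node pairs into k-hop and non-k-hop pairs."""
    adj = undirected_adjacency(edges)
    nodes = sorted(adj)
    positive, negative = set(), set()
    for u in nodes:
        near = k_hop_neighbors(adj, u, k)
        for v in nodes:
            if u < v:
                (positive if v in near else negative).add((u, v))
    return positive, negative


# A five-node chain whose second edge points "backwards" (c -> b).
edges = [("a", "b"), ("c", "b"), ("c", "d"), ("d", "e")]
pos, neg = pair_sets(edges, k=2)
print(sorted(neg))
```

With k = 2, the value used in the description, pairs in the first set would be trained toward similar embeddings and pairs in the second set toward dissimilar ones.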
CN202310328325.1A 2023-03-30 2023-03-30 Ethernet phishing fraud detection method based on self-supervision depth map learning Active CN116032670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310328325.1A CN116032670B (en) 2023-03-30 2023-03-30 Ethernet phishing fraud detection method based on self-supervision depth map learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310328325.1A CN116032670B (en) 2023-03-30 2023-03-30 Ethernet phishing fraud detection method based on self-supervision depth map learning

Publications (2)

Publication Number Publication Date
CN116032670A true CN116032670A (en) 2023-04-28
CN116032670B CN116032670B (en) 2023-07-18

Family

ID=86089796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310328325.1A Active CN116032670B (en) 2023-03-30 2023-03-30 Ethernet phishing fraud detection method based on self-supervision depth map learning

Country Status (1)

Country Link
CN (1) CN116032670B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113364748A (en) * 2021-05-25 2021-09-07 浙江工业大学 Ether house phishing node detection method and system based on transaction subgraph network
WO2022088408A1 (en) * 2020-11-02 2022-05-05 南京博雅区块链研究院有限公司 Graph neural network-based transaction fraud detection method and system
US20220253856A1 (en) * 2021-02-11 2022-08-11 The Toronto-Dominion Bank System and method for machine learning based detection of fraud
CN115378629A (en) * 2022-05-13 2022-11-22 北京邮电大学 Ether mill network anomaly detection method and system based on graph neural network and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YUAN ZHANG ETA.: "Towards Thwarting Template Side-Channel Attacks in Secure Cloud Deduplications", IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, vol. 18, no. 3, XP011854385, DOI: 10.1109/TDSC.2019.2911502 *
孙权;汤韬;郑建宾;潘婧;赵金涛;: "金融交易数据驱动的图谱网络智能化欺诈侦测", 应用科学学报, no. 05 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant