CN116032670A - Ethereum phishing fraud detection method based on self-supervised deep graph learning
- Publication number
- CN116032670A (application CN202310328325.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention relates to an Ethereum phishing fraud detection method based on self-supervised deep graph learning, and belongs to the technical field of self-supervised deep graph learning. Data mapping: based on the acquired Ethereum data, information is extracted automatically and merged into nodes that originally have no usable attributes, yielding a transaction graph with node features. Model preparation: a spatial self-supervised pretext task is set, a model and a training task are constructed, and the node attribute information and topological structure information in the graph are mined and represented. Model training: the training scale and convergence condition are set to obtain an optimized trained model for detection. The model can run detection on new Ethereum transaction graphs, and addresses the continuous change of Ethereum's scale, the continuous evolution of the transaction graph, and the shortage of node labels.
Description
Technical Field
The invention relates to an Ethereum phishing fraud detection method based on self-supervised deep graph learning, and belongs to the technical field of self-supervised deep graph learning.
Background
As one of the most popular scalable blockchains in the world today, Ethereum has fully explored the potential of smart contracts and has given rise to a large number of decentralized finance (DeFi) applications based on them, attracting wide attention and funding. FIG. 1 illustrates the process of creating a smart contract, which automatically manages and approves the transaction process without any centralized entity, guaranteeing trust and transparency while eliminating the delays and costs of the original transaction process. It is estimated that Ethereum has currently reached a total market value of two billion dollars, but its explosive growth also exposes millions of users to malicious attacks such as phishing fraud and extortion of user funds. These attacks are difficult to defend against with conventional methods such as software security or smart contract analysis, so intelligent analysis at the level of Ethereum transactions is of great importance.
Existing phishing fraud detection methods based on intelligent analysis fall mainly into two types. The first type uses shallow learning models, such as traditional machine learning methods that rely on feature engineering, or random-walk-based network embedding methods such as DeepWalk and node2vec. The second type consists of graph-based deep learning methods, such as graph convolutional neural networks. Deep learning has achieved great success in computer vision, speech recognition, natural language processing and other fields thanks to its powerful representation learning capability. In recent years, applying deep learning to non-Euclidean data such as graphs has received increasing attention, for example in social networks, protein interface prediction and knowledge graph embedding; this work has also contributed to many tasks in computer vision and natural language processing, such as object detection, action recognition, machine translation and semantic parsing. Drawing on experience from these fields and on the characteristics of Ethereum's own transaction activity, all transaction activity on Ethereum can be regarded as one large-scale transaction graph; if the advantages of deep graph learning can be fully exploited on Ethereum, the effectiveness of security analysis can be greatly improved.
The basic workflow of a deep graph learning method is shown in FIG. 2 and comprises the following steps: 1) data collection: acquiring a data set for deep graph learning; 2) data mapping: constructing a graph for the deep learning method from the data set; 3) model preparation: designing a training model and creating a training method; 4) model training: feeding the training subgraphs into the training model until the training result converges; 5) model application: after training, deploying the trained model and feeding it the evaluation subgraphs to obtain the application result.
At present, applying deep graph learning to Ethereum phishing fraud detection faces two main problems. On the one hand, the constructed Ethereum transaction data is large in scale: to date there are more than 1.8 billion transactions on the chain, and thousands of new transactions per second are continuously added, so the constructed Ethereum transaction graph is an evolving graph that keeps changing dynamically. On the other hand, the constructed Ethereum transaction graph has few node labels, and the phishing node labels needed to train a phishing detector are especially scarce; in addition, transaction graphs generated from new transaction data carry even fewer labels, which further aggravates label imbalance. Existing deep graph learning methods do not solve these problems well. Most are transductive training methods that can only detect within a single fixed graph and must be retrained whenever new nodes or subgraphs appear, so they are unsuited to Ethereum, where new subgraphs appear continuously. Meanwhile, to cope with the shortage of labels, they use biased sampling when generating the transaction graph: labeled nodes are selected first, sampling is expanded outward from them with random walks or similar algorithms, and finally the labeled nodes and their neighbors are kept. This changes the node distribution of the original Ethereum graph to some extent and therefore does not match real scenarios.
In summary, existing deep graph learning only limits the scale of the Ethereum transaction graph under study by means such as transductive training and biased sampling, and it alters the original structure of the graph. It cannot fully solve the continuous change of Ethereum's data scale, the continuous evolution of the transaction graph, or the shortage of node labels.
Disclosure of Invention
Purpose of the invention: in view of the above problems and shortcomings, the invention provides an Ethereum phishing fraud detection method based on self-supervised deep graph learning. It proposes an effective self-supervised pretext task, performs self-supervised model training on existing Ethereum transaction data, applies the converged model to new subgraphs that need to be checked, and finds the nodes involved in phishing fraud in those subgraphs.
In terms of workflow, a user of the method only needs to train a converged phishing fraud detection model on existing Ethereum transaction data and then construct the latest transaction data to be checked into a transaction graph as input. The model returns the corresponding normal nodes and the nodes that may be involved in phishing fraud, thereby detecting Ethereum phishing fraud.
In terms of characteristics, the method trains a model on existing Ethereum data and can run detection on new data subgraphs, which addresses the continuous change of Ethereum's scale, the continuous evolution of the transaction graph, and the shortage of node labels.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme:
The Ethereum phishing fraud detection method based on self-supervised deep graph learning comprises the following steps:
step 1: data mapping: based on the acquired Ethereum data, information is extracted automatically and merged into nodes that originally have no usable attributes, yielding a transaction graph with node features;
step 2: model preparation: a spatial self-supervised pretext task is set, a model and a training task are constructed, and the node attribute information and topological structure information in the graph are mined and represented;
step 3: model training: the training scale and convergence condition are set to obtain an optimized trained model for detection.
Further, the specific steps of step 1 are as follows:
step 1.1: collect the Ethereum transaction data for training, and filter out failed transactions and transactions whose value is 0;
step 1.2: divide the transaction data obtained in step 1.1 into S parts by block number or timestamp, and further divide each part into S slices containing similar numbers of transactions;
step 1.3: compute the node feature vectors in each slice of transaction data according to the node attributes of the transaction graph, and construct the corresponding Ethereum transaction graph for each slice.
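Steps 1.1 and 1.2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the record fields `block`, `value` and `failed` are hypothetical names, and cutting the S parts by equal transaction count (rather than by fixed block ranges) is an assumption made to keep the example short.

```python
from typing import Dict, List


def filter_transactions(txs: List[Dict]) -> List[Dict]:
    """Step 1.1: drop failed transactions and transactions whose value is 0."""
    return [t for t in txs if not t["failed"] and t["value"] > 0]


def split_evenly(items: List[Dict], s: int) -> List[List[Dict]]:
    """Split an ordered list into s contiguous chunks of near-equal size."""
    n = len(items)
    bounds = [(i * n) // s for i in range(s + 1)]
    return [items[bounds[i]:bounds[i + 1]] for i in range(s)]


def partition(txs: List[Dict], s: int = 5) -> List[List[List[Dict]]]:
    """Step 1.2: order by block number, cut into s parts, then each part into s slices."""
    ordered = sorted(txs, key=lambda t: t["block"])
    return [split_evenly(part, s) for part in split_evenly(ordered, s)]
```

With S = 5 this yields 25 slices; each slice would then be turned into one transaction graph in step 1.3.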
Further, the specific steps of step 2 are as follows:
step 2.1: set the model parameters, and select transductive or inductive learning;
step 2.2: if transductive learning is selected, the transaction graphs built from the first S-2 slices of each part of the transaction data are input in sequence for model training, and the training result is evaluated on the (S-1)-th transaction graph; if inductive learning is selected, the first S-1 transaction graphs are input in sequence for model training, and the training result is evaluated on the S-th transaction graph.
Further, the specific steps of step 3 are as follows: step 2.2 is repeated on each of the S parts of the transaction data to obtain the final converged model.
Further, S is 5.
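The slice selection in step 2.2 can be illustrated with a short sketch. The function name `select_slices` and the zero-based indexing are assumptions made for this example only.

```python
from typing import List, Tuple


def select_slices(num_slices: int, mode: str) -> Tuple[List[int], int]:
    """Return (training slice indices, evaluation slice index), zero-based."""
    if mode == "transductive":
        # Train on the first S-2 slices, evaluate on slice S-1.
        return list(range(num_slices - 2)), num_slices - 2
    if mode == "inductive":
        # Train on the first S-1 slices, evaluate on slice S.
        return list(range(num_slices - 1)), num_slices - 1
    raise ValueError(f"unknown mode: {mode}")
```

For S = 5, the transductive setting trains on slices 0 to 2 and evaluates on slice 3, while the inductive setting trains on slices 0 to 3 and evaluates on slice 4.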
Further, the specific steps of step 2 are as follows: let the transaction graph obtained in step 1 be $G$, divided in order of block number or timestamp into $N$ subgraphs $\{G_1, G_2, \dots, G_N\}$, where each subgraph $G_i = (V_i, E_i, X_i)$: $V_i$ is the set of nodes of the subgraph, $E_i$ is the set of edges in the subgraph, and $X_i$ is the feature matrix constructed from the 17 features of the subgraph's nodes. Let $A_i$ denote the adjacency matrix of the subgraph: if there is an edge between two nodes $m$, $n$ in the subgraph, $A_i[m, n] = 1$; otherwise $A_i[m, n] = 0$. The self-supervised training method is then applied to the unlabeled transaction subgraphs with node features, and the loss value is calculated by the following formula:

$$\mathcal{L}_{ssl}(g, G_i) = \sum_{v \in D_i} \ell\big(g(A_i, X_i)_v,\; y_v\big)$$

where $g$ is the encoder of the graph neural network, i.e. the feature extractor; $D_i$ is the set of transaction graph nodes used in the pretext task; $y_v$ is the pretext-task training label of transaction graph node $v$; and $\ell$ measures the error between the feature embedding vector of node $v$ and its true value $y_v$. After each round of pretext-task training, the encoder $g$ is carried over to the next round, and the node representation $Z_i = g(A_i, X_i)$ of the subgraph is generated; this node representation is fed into the final classifier for the downstream phishing node detection task.

First, the feature extractor of the graph neural network computes the feature embedding vectors $Z_i$ of all nodes in the given subgraph $G_i$. Second, for a node $v$ in subgraph $G_i$, if a node $u$ lies in the set of all nodes reachable from $v$ within $k$ hops (that is, along a path of $k$ hops or fewer, ignoring edge direction), i.e. $u$ is a $k$-hop neighbor of $v$, then the feature embedding vectors of the two nodes after model training should be as similar as possible, and vice versa. The general expression of this task is written in terms of the following quantities: $P_i^{+}$ and $P_i^{-}$ denote, over all nodes of subgraph $G_i$, the set of node pairs that are $k$-hop neighbors and the set of node pairs that are not; $s(\cdot, \cdot)$ is the similarity function, which this method defines using a linear transformation layer $W$; and $z_v$ is the feature embedding vector of node $v$ in $Z_i$.
Beneficial effects: compared with the prior art, the invention has the following advantages. The main problems it solves come from three aspects:
the raw Ethereum transaction data provides no usable node attributes, and multiple transactions may exist between two nodes, producing a multi-edge transaction graph, so the structure of the Ethereum transaction graph must be designed carefully and representative node and edge attributes must be chosen;
the Ethereum transaction graph has few labels, and transaction graphs generated from new transactions have even fewer, which further aggravates label imbalance;
the Ethereum data is large in scale, and training on the entire generated transaction graph would cost a great deal of time and resources.
To address these problems, this patent provides an Ethereum phishing fraud detection method based on self-supervised deep graph learning, which maps the three problems onto three modules of the method: Ethereum data mapping, preparation of the phishing fraud detection model, and training of the phishing fraud detection model.
Ethereum data mapping. Unlike Bitcoin transactions, which may have multiple inputs and outputs, Ethereum transactions are one-to-one; however, the nodes of the original transaction graph have no usable attributes, and multiple edges may occur between two nodes. The method does not simply merge the multiple edges into one edge. Instead, it extracts relatively useful information from the raw transaction data and manually merges it into the node attributes to obtain a single-edge directed graph with node attributes, and then selects the largest weakly connected component (WCC) of that graph as the final input graph for model training.
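The data-mapping idea above (folding parallel edges into node attributes, then keeping the largest weakly connected component) can be sketched in plain Python. The four per-node features used here are hypothetical placeholders; the patent's 17 features are listed in its Table 1 and are not reproduced by this example.

```python
from collections import defaultdict


def build_graph(txs):
    """txs: iterable of (sender, receiver, value) tuples.

    Returns a set of single directed edges plus per-node features that
    absorb the information carried by the parallel edges.
    """
    edges = set()
    feats = defaultdict(lambda: {"out_count": 0, "in_count": 0,
                                 "out_value": 0.0, "in_value": 0.0})
    for src, dst, value in txs:
        edges.add((src, dst))  # parallel edges collapse to one edge
        feats[src]["out_count"] += 1
        feats[src]["out_value"] += value
        feats[dst]["in_count"] += 1
        feats[dst]["in_value"] += value
    return edges, dict(feats)


def largest_wcc(edges):
    """Largest weakly connected component, via union-find ignoring direction."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return max(groups.values(), key=len) if groups else set()
```

A graph library could replace the union-find helper; it is written out here only so the sketch has no dependencies.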
Preparation of the phishing fraud detection model. To make full use of large-scale Ethereum transaction data when node labels are scarce, this patent adopts self-supervised learning, which mines more information directly from the data through self-generated supervision signals and does not depend on labels or annotations. The effectiveness of current self-supervised learning methods has been demonstrated by many successful cases in fields such as natural language processing and computer vision. However, the topology of the Ethereum transaction graph means that the nodes of the graph are not independent of one another, which makes it almost impossible to apply those existing frameworks directly to the graph. For this reason, the technique described in this patent concentrates on the design of the pretext task and proposes an effective spatial self-supervised pretext task.
Training of the phishing fraud detection model. The huge volume of Ethereum data puts heavy resource pressure on model training. The transaction data is therefore divided into five parts by Ethereum block number (or timestamp), so that repeating the experiment on each part reduces variability and makes the results more stable. Each part is further divided into five slices, and under limited resources the detection model is trained by passing through these slices in sequence, which addresses the rapid growth and large scale of Ethereum data.
Based on the above analysis, this patent designs an Ethereum phishing fraud detection method based on self-supervised deep graph learning in three modules. The data mapping module automatically extracts relatively useful information from the raw Ethereum transaction data and merges it into the node attributes to obtain a single-edge directed graph with node attributes. The model preparation module designs an effective spatial pretext task based on self-supervised learning, used to mine and represent the rich node attribute information and topological structure information in the graph. The model training module divides the transaction data and transaction graphs by Ethereum block number (or timestamp), sets convergence conditions to keep optimizing the feature extractor, and ensures stable results through repeated experiments.
Drawings
FIG. 1 is a flow chart of smart contract creation according to the invention;
FIG. 2 is the basic workflow diagram of deep graph learning according to the invention;
FIG. 3 shows the spatial pretext task of the invention;
FIG. 4 is the visualization result for the original features, from an embodiment comparing Ethereum phishing fraud detection by the method of this patent with other methods;
FIG. 5 is the visualization result for an untrained graph convolutional neural network, from an embodiment comparing Ethereum phishing fraud detection by the method of this patent with other methods;
FIG. 6 is the visualization result for the method of this patent, from an embodiment comparing Ethereum phishing fraud detection by the method of this patent with other methods;
FIG. 7 is the visualization result of an embodiment detecting Ethereum phishing fraud with the method of this patent after 1 training iteration;
FIG. 8 is the visualization result of an embodiment detecting Ethereum phishing fraud with the method of this patent after 10 training iterations;
FIG. 9 is the visualization result of an embodiment detecting Ethereum phishing fraud with the method of this patent after 50 training iterations.
Detailed Description
The invention is further illustrated below with reference to the accompanying drawings and the detailed description. These are to be understood as merely illustrating the invention and not limiting its scope; after reading the invention, modifications of equivalent form made by those skilled in the art fall within the scope defined by the appended claims.
The visualization results of the Ethereum phishing fraud detection method based on self-supervised deep graph learning, compared with other methods, are shown in FIGS. 4 to 9.
In the data mapping part, the method automatically extracts information from the raw Ethereum transaction data and merges it into nodes that originally have no usable attributes, yielding a transaction graph with node features.
In the model preparation part, the method provides an effective spatial self-supervised pretext task, which addresses label scarcity, and obtains a converged model for detection by minimizing the accumulated error.
In the model training part, the method trains on the transaction graphs slice by slice according to block number (or timestamp), keeps optimizing the model on each transaction graph, and controls resource use while adapting to the latest changes in the transaction graph.
Examples
The Ethereum genesis block dates from July 2015. Without loss of generality, the method gathers all transaction data from January 2018 to May 2020 (including external transaction data originating from user addresses and internal transaction data originating from smart contract addresses) for model training. The raw data has a complex structure and contains a great deal of noise that is meaningless for phishing fraud detection, such as failed transactions and transactions whose value is 0. After filtering by the method, 75,382,756 transaction records remain.
In the data mapping part, 17 general features are automatically extracted from the raw Ethereum transaction data and used as the node attributes of the transaction graph; these features currently give a good extraction effect. The 17 basic features are shown in Table 1 below.
TABLE 1
In the model preparation part, before self-supervised training, it is assumed that the original Ethereum transaction graph $G$ has been divided, in order of block number (or timestamp), into $N$ subgraphs $\{G_1, G_2, \dots, G_N\}$, where each subgraph $G_i = (V_i, E_i, X_i)$: $V_i$ is the set of nodes of the subgraph, $E_i$ is the set of edges in the subgraph, and $X_i$ is the feature matrix constructed from the 17 features of the subgraph's nodes. Let $A_i$ denote the adjacency matrix of the subgraph: if there is an edge between two nodes $m$, $n$ in the subgraph, $A_i[m, n] = 1$; otherwise $A_i[m, n] = 0$. The self-supervised training method is applied to the unlabeled transaction subgraphs with node features, and the training objective can be summarized as minimizing the pretext-task loss $\mathcal{L}_{ssl}$, as shown in equation one:

$$\mathcal{L}_{ssl}(g, G_i) = \sum_{v \in D_i} \ell\big(g(A_i, X_i)_v,\; y_v\big)$$

where $g$ is the encoder of the graph neural network, i.e. the feature extractor; $y_v$ is the pretext-task training label of transaction graph node $v$; $G_i$ is one of the subgraphs into which the original transaction graph $G$ is divided by block number (or timestamp); $D_i$ is the set of transaction graph nodes used in the pretext task; and $\ell$ measures the error between the feature embedding vector of node $v$ and its true value $y_v$. After each round of pretext-task training, the encoder $g$ is carried over to the next round, and the node representation $Z_i = g(A_i, X_i)$ of the subgraph is generated; this node representation is fed into the final classifier for the downstream phishing node detection task.
Intuitively, for phishing fraud detection, a qualified pretext-task design should satisfy the following conditions: 1) the extracted data labels reflect the characteristics of the data; 2) because Ethereum transaction data is so large, the extracted data labels must be obtainable with low time complexity, otherwise the time cost of training is unacceptable; 3) the pretext task may incorporate some domain knowledge, but not in too much detail, otherwise its applicability is limited and it becomes vulnerable to adversarial attacks. For nodes on the Ethereum transaction graph, the method assumes that neighboring nodes that transact frequently should have similar node representations. To capture the spatial relationship between neighboring nodes in the Ethereum transaction graph effectively, the method described in this patent designs the spatial pretext task shown in FIG. 3.
First, the feature extractor of the graph neural network computes the feature embedding vectors $Z_i$ of all nodes in the given subgraph $G_i$. Second, for a node $v$ in subgraph $G_i$, if a node $u$ lies in the set of all nodes reachable from $v$ within $k$ hops (that is, along a path of $k$ hops or fewer, ignoring edge direction), i.e. $u$ is a $k$-hop neighbor of $v$, then the feature embedding vectors of the two nodes after model training should be as similar as possible, and vice versa. The general expression of this task, equation two, is written in terms of the following quantities: $P_i^{+}$ and $P_i^{-}$ denote, over all nodes of subgraph $G_i$, the set of node pairs that are $k$-hop neighbors and the set of node pairs that are not; $s(\cdot, \cdot)$ is the similarity function, which this method defines using a linear transformation layer $W$; and $z_v$ is the feature embedding vector of node $v$ in $Z_i$.
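The two node-pair sets used by the spatial pretext task can be illustrated with a breadth-first search that ignores edge direction. This is a minimal sketch under stated assumptions: the function names are hypothetical, and enumerating every node pair is quadratic, so a real pipeline on Ethereum-scale graphs would presumably sample pairs instead.

```python
from collections import deque


def undirected_adjacency(edges):
    """Build neighbor sets from directed edges, ignoring direction."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_hop_neighbors(adj, start, k):
    """All nodes reachable from start along a path of k hops or fewer."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(start)
    return seen


def pair_sets(edges, k):
    """Return (k-hop neighbor pairs, non-k-hop pairs) as sets of ordered tuples."""
    adj = undirected_adjacency(edges)
    nodes = sorted(adj)
    positive, negative = set(), set()
    for u in nodes:
        near = k_hop_neighbors(adj, u, k)
        for v in nodes:
            if u < v:
                (positive if v in near else negative).add((u, v))
    return positive, negative
```

On the path a → b → c → d with k = 2, every pair except (a, d) is a positive pair, since a and d are three hops apart.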
The Ethereum market is far more volatile than traditional trading platforms, its transaction data is huge, and the transaction graph keeps evolving as it expands, so it is impractical to put all acquired data into training at once. As described above, the method therefore divides the transaction data by block number (or timestamp), inputs only part of the transaction subgraphs in each training pass, and reuses the trained model when training on a new subgraph. This eases the problem of continuously expanding Ethereum data and helps the model adapt to the distribution of new Ethereum transaction data.
The final algorithm of the model is as follows:
(1) The original Ethereum transaction graph $G$ has been divided, in order of block number (or timestamp), into $N$ subgraphs $\{G_1, G_2, \dots, G_N\}$, where $G_i = (A_i, X_i)$.
(2) The current graph neural network encoder $g_i$ produces the node feature embedding vectors of subgraph $G_i$, denoted $Z_i = g_i(A_i, X_i)$.
(3) The current graph neural network encoder $g_i$ and the node feature embedding vectors $Z_i$ of subgraph $G_i$ are fed into the spatial pretext task; the task loss $\mathcal{L}_{ssl}$ is minimized according to equation one until model training converges, giving the updated graph neural network encoder $g_i'$.
(4) The updated graph neural network encoder $g_i'$ is used as the encoder $g_{i+1}$ for the next subgraph $G_{i+1}$, and steps (2) to (4) are repeated until all training-set subgraphs have been trained.
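The control flow of steps (1) to (4) can be sketched as a loop that carries the encoder from one subgraph to the next. Everything below is illustrative: `train_pretext` stands in for the patent's pretext-task optimization, and the toy encoder only counts training rounds so that the hand-over between subgraphs is visible.

```python
def train_sequentially(subgraphs, encoder, train_pretext):
    """subgraphs: ordered list of (adjacency, features) pairs, one per G_i."""
    for adjacency, features in subgraphs:
        embeddings = encoder(adjacency, features)            # step (2): Z_i
        encoder = train_pretext(encoder, adjacency,
                                features, embeddings)        # step (3): g_i -> g_i'
        # step (4): the updated encoder is reused for the next subgraph
    return encoder


def make_encoder(rounds):
    """Toy stand-in for a graph neural network encoder (hypothetical)."""
    def encoder(adjacency, features):
        return [[x * (rounds + 1) for x in row] for row in features]
    encoder.rounds = rounds
    return encoder


def toy_train_pretext(encoder, adjacency, features, embeddings):
    """Toy stand-in for minimizing the pretext loss until convergence."""
    return make_encoder(encoder.rounds + 1)


final_encoder = train_sequentially(
    [(None, [[1.0]]), (None, [[2.0]]), (None, [[3.0]])],
    make_encoder(0),
    toy_train_pretext,
)
```

Because only one subgraph is held by the loop at a time, memory use is bounded by the largest slice rather than by the full transaction graph.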
In the model training part, most existing deep graph learning methods are transductive, whereas the method described in this patent supports two training modes: transductive and inductive. Transductive learning uses both the training and the test data when training a model, whereas inductive learning uses only the training data and not the test data. For transductive training, the method takes each of the five parts of transaction data divided by Ethereum block number (or timestamp), uses the first three of its five slices for model training, and evaluates the model on the fourth slice. For inductive training, the method uses the first four slices for model training and evaluates the model on the fifth slice. Taking the first three slices of transaction data as an example, the statistics of the slices used for training are shown in Table 2 below:
TABLE 2
The method uses a two-layer GraphSAGE with a mean-pooling aggregator as the body of the node classification model, and a logistic regression model as the binary classifier. The Ethereum phishing node labels used for node classification come mainly from Etherscan.io and from blacklists published by several companies, 6588 nodes in total. The normal Ethereum node labels are obtained by randomly drawing non-phishing nodes from each transaction graph, about 3 times as many as the phishing nodes in that graph. These labeled nodes are assigned to the training set, the evaluation set and the test set in the ratio …:2:3. The method uses Adam as the optimizer, searches the learning rate over {0.01, 0.001, 0.0001}, sets dropout to 0.5, searches the number of hidden units over {32, 64, 128, 256}, and sets the batch size to 512. In the spatial pretext task of the model training part, k is set to 2.
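The search space described above is small enough to enumerate. The following sketch simply lists the configurations; the dictionary keys are hypothetical names chosen for this example, and how the patent's authors actually ran the search is not stated.

```python
from itertools import product

LEARNING_RATES = [0.01, 0.001, 0.0001]
HIDDEN_UNITS = [32, 64, 128, 256]


def hyperparameter_grid():
    """Enumerate every (learning rate, hidden size) pair with the fixed settings."""
    return [
        {
            "optimizer": "Adam",
            "layers": 2,          # two-layer GraphSAGE
            "lr": lr,
            "hidden": hidden,
            "dropout": 0.5,
            "batch_size": 512,
            "k": 2,               # hop limit in the spatial pretext task
        }
        for lr, hidden in product(LEARNING_RATES, HIDDEN_UNITS)
    ]
```

Three learning rates times four hidden sizes gives 12 configurations per training mode.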
The main advantage of the method is that it preserves the original structure of the Ethereum transaction graph while using self-supervised deep graph learning to address the continuous change of Ethereum's data scale and the scarcity of data labels. The beneficial results achieved by the method are described next.
Briefly, the method comprises:
the method is realized and comprehensively evaluated, and a large number of realizations on a large-scale Ethernet transaction diagram show that the method is better than a baseline model, and the F-1 score performance of the method is better than that of the baseline by about 4% -16%.
Specifically:
comparing the direct push learning of the method described in this patent with six baseline methods, comprising: 1) Original features; 2) The deep walk algorithm; 3) Combining method of deep node feature embedding and original feature; 4) The GraphSage algorithm is a flexible inductive graph neural network algorithm; 5) DGI, an unsupervised inductive graph neural network algorithm for representing interaction information by maximizing local batch representation and high-dimensional graph representation; 6) An untrained graph convolution neural network algorithm. For inductive learning, the node feature embedding vector generated by the deep walk algorithm will rotate relative to the original embedding space and therefore not compare to the baseline. Based on five transaction data divided according to the block number (or time stamp) of the ethernet, the prediction results of the direct push and the inductive are shown in tables 3 and 4, respectively, and include four indexes of accuracy, prediction rate, regression rate and F1-score.
Table 3: Prediction results of transductive learning compared with the six baseline methods
Table 4: Prediction results of inductive learning compared with the six baseline methods
The results in Tables 3 and 4 show that the method described in this patent substantially outperforms the baseline models.
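For reference, the four metrics reported in Tables 3 and 4 can be computed from binary predictions as in the minimal sketch below. It assumes that label 1 marks a phishing node and label 0 a normal node; the function name `binary_metrics` and the toy labels are hypothetical and are not results from the patent.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing, missed

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: 3 phishing nodes and 5 normal nodes.
example = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                         [1, 1, 0, 1, 0, 0, 0, 0])
```

Because normal nodes outnumber phishing nodes in the labeled data, F1 is more informative than accuracy alone, which is presumably why the summary figure quoted above is an F1 improvement.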
A user applies the method of this patent as follows:
a. collect the Ethereum transaction data for training, and filter out failed transactions and transactions with a value of 0;
b. divide the collected transaction data into five parts by block number (or timestamp), and further divide each part into five slices with similar numbers of transactions;
c. according to the 17 transaction graph node attributes set out in Table 1, manually calculate the node feature vectors in each slice of transaction data, and construct one Ethereum transaction graph per slice;
d. set the model parameters of the method, and select transductive learning or inductive learning;
e. taking the first part of the transaction data as an example: if transductive learning is selected, input the first three transaction graphs in sequence for model training and evaluate the training result on the fourth transaction graph; if inductive learning is selected, input the first four transaction graphs in sequence for model training and evaluate the training result on the fifth transaction graph;
f. repeat step e on each of the five parts of transaction data to obtain the final converged model.
For detection, construct the corresponding Ethereum transaction graph as in step c and feed it to the deployed model to generate the detection result.
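Steps a, b and e above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the transaction fields `block`, `value` and `success` are hypothetical names, and both levels of the split are assumed to aim for equal transaction counts (the text only says this explicitly for the slices).

```python
def filter_transactions(txs):
    """Step a: drop failed transactions and transactions with value 0."""
    return [tx for tx in txs if tx["success"] and tx["value"] > 0]


def split_evenly(items, n):
    """Split a list into n contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def partition(txs, s=5):
    """Step b: order by block number, cut into s parts, each into s slices."""
    ordered = sorted(txs, key=lambda tx: tx["block"])
    return [split_evenly(part, s) for part in split_evenly(ordered, s)]


def train_eval_slices(slices, mode):
    """Step e: pick training slices and the evaluation slice for one part."""
    if mode == "transductive":
        n_train = len(slices) - 2   # first three of five slices
    elif mode == "inductive":
        n_train = len(slices) - 1   # first four of five slices
    else:
        raise ValueError("mode must be 'transductive' or 'inductive'")
    return slices[:n_train], slices[n_train]


# Synthetic data: 250 valid transactions plus one zero-value and one failed.
raw = [{"block": b, "value": 1.0, "success": True} for b in range(250)]
raw.append({"block": 250, "value": 0.0, "success": True})
raw.append({"block": 251, "value": 2.0, "success": False})

parts = partition(filter_transactions(raw))
train_t, eval_t = train_eval_slices(parts[0], "transductive")
train_i, eval_i = train_eval_slices(parts[0], "inductive")
```

In a full pipeline each slice would then be turned into a transaction graph with the 17 node features (step c) before being passed to the model; that part depends on Table 1 and is not shown here.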
The Ethereum phishing fraud detection results are visualized in Figs. 4-9. To keep the visualizations legible, the number of nodes shown is limited to 500. Normal nodes are sampled randomly so that the ratio of normal nodes to phishing nodes is about 4:1. In Figs. 4-9, round nodes represent normal nodes and five-pointed-star nodes represent phishing nodes.
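One way to select nodes for such a plot is sketched below. It assumes that the 500-node cap and the roughly 4:1 ratio are enforced together, which gives at most 100 phishing and 400 normal nodes; the function name and the fixed random seed are illustrative choices, not details from the patent.

```python
import random


def sample_for_plot(phishing, normal, max_nodes=500,
                    normal_per_phishing=4, seed=0):
    """Pick phishing and normal nodes for a plot of at most max_nodes nodes."""
    rng = random.Random(seed)
    # Largest phishing count p such that p + normal_per_phishing * p <= max_nodes.
    n_phishing = min(len(phishing), max_nodes // (1 + normal_per_phishing))
    n_normal = min(len(normal), normal_per_phishing * n_phishing)
    return rng.sample(phishing, n_phishing), rng.sample(normal, n_normal)


# Synthetic node ids: 300 phishing nodes and 5000 normal nodes.
plot_phishing, plot_normal = sample_for_plot(list(range(300)),
                                             list(range(1000, 6000)))
```

The sampled node embeddings would then be projected to two dimensions with a technique of the user's choice before plotting.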
Figs. 4-6 show the node feature visualizations of three methods. Compared with the raw features and the untrained graph convolutional neural network, the method described in this patent separates the two kinds of points better and clusters each kind more tightly, which further demonstrates its effectiveness. The method therefore achieves effects that the prior art cannot.
Figs. 7-9 show visualizations of the method after different numbers of training iterations. As the number of training iterations increases, the features of nodes of the same type become more concentrated and, within a certain range, normal nodes become easier to separate from phishing nodes.
Claims (6)
1. An Ethereum phishing fraud detection method based on self-supervised deep graph learning, characterized by comprising the following steps:
step 1: data mapping: automatically extract information from the collected Ethereum data and attach it to nodes that originally have no usable attributes, obtaining a transaction graph with node features;
step 2: model preparation: set a spatial self-supervised pretext task, construct the model and the training task, and mine and represent the node attribute information and topological structure information in the graph;
step 3: model training: set the training scale and convergence condition to obtain an optimized trained model for detection.
2. The Ethereum phishing fraud detection method based on self-supervised deep graph learning as claimed in claim 1, wherein the specific steps of step 1 are as follows:
step 1.1: collect the Ethereum transaction data for training, and filter out failed transactions and transactions with a value of 0;
step 1.2: divide the transaction data obtained in step 1.1 into S parts by block number or timestamp, and further divide each part into S slices with similar numbers of transactions;
step 1.3: calculate the node feature vectors in each slice of transaction data according to the node attributes of the transaction graph, and construct one Ethereum transaction graph per slice.
3. The Ethereum phishing fraud detection method based on self-supervised deep graph learning as claimed in claim 2, wherein the specific steps of step 2 are as follows:
step 2.1: set the model parameters, and select transductive or inductive learning;
step 2.2: if transductive learning is selected, input in sequence the transaction graphs formed from the first S-2 slices of each part of the transaction data for model training, and evaluate the training result on the (S-1)-th transaction graph; if inductive learning is selected, input the first S-1 transaction graphs in sequence for model training, and evaluate the training result on the S-th transaction graph.
4. The Ethereum phishing fraud detection method based on self-supervised deep graph learning as claimed in claim 3, wherein the specific step of step 3 is: repeat step 2.2 on each of the S parts of transaction data to obtain the final converged model.
5. The Ethereum phishing fraud detection method based on self-supervised deep graph learning as claimed in claim 2, wherein S is 5.
6. The Ethereum phishing fraud detection method based on self-supervised deep graph learning as claimed in claim 5, wherein the specific steps of step 2 are as follows: the transaction graph obtained in step 1 is denoted $G$ and is divided, in order of block number or timestamp, into $T$ sub-graphs in total, $\{G_1, \dots, G_T\}$, where each sub-graph $G_t = (V_t, E_t, X_t)$; that is, $V_t$ is the set of nodes of the sub-graph, $E_t$ is the set of edges in the sub-graph, and $X_t$ is the feature matrix constructed from the 17 features of the sub-graph nodes; $A_t$ denotes the adjacency matrix of the sub-graph, i.e. if there is an edge between any two nodes $m$, $n$ of the sub-graph, then $A_t[m, n] = 1$, otherwise $A_t[m, n] = 0$; a self-supervised training method is then applied to the transaction sub-graph, with the following loss value:
(loss formula not reproduced in this text)
where $g$ is the encoder of the neural network, i.e. the feature extractor; $V_p$ is the set of transaction graph nodes in the pretext task; $y_v$ is the true feature value of transaction graph node $v$ used for pretext-task training; and the loss measures the error between the feature embedding vector of node $v$ and its true feature value $y_v$. After each round of pretext-task training, the encoder $g$ of the graph neural network is retained for the next round of training, and the representations $Z_t$ of the sub-graph nodes are generated at the same time; these node representations are fed into the final classifier for the downstream phishing node detection task. First, the feature extractor of the graph neural network computes the feature embedding vector representations $Z_t$ of all nodes in the given sub-graph $G_t$. Second, for a node $v_i$ of sub-graph $G_t$, if $v_j$ lies in the set of all nodes reachable from $v_i$ in no more than $k$ hops (i.e. via a path of $k$ hops or fewer), irrespective of edge direction, i.e. $v_j$ is a $k$-hop neighbor of $v_i$, then the feature embedding vectors of the two nodes after model training should exhibit high similarity; the general expression of the task is as follows:
(task formula not reproduced in this text)
where $P_t^{+}$ and $P_t^{-}$ denote, over all nodes of sub-graph $G_t$, the set of node pairs that are $k$-hop neighbors and the set of node pairs that are not $k$-hop neighbors, respectively; $\mathrm{sim}(\cdot, \cdot)$ is a similarity function, defined in this method by a formula that is not reproduced in this text, in which $W$ denotes a linear transformation layer; and $z_i$ is the feature embedding vector representation of node $v_i$ in $Z_t$.
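The two node-pair sets used by the spatial pretext task can be illustrated with a breadth-first search that ignores edge direction. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, pairs are treated as unordered, and enumerating all pairs is only practical for small sub-graphs (a real implementation would presumably sample pairs instead).

```python
from collections import deque


def undirected_adjacency(edges):
    """Build an adjacency map that ignores the direction of each edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_hop_neighbors(adj, start, k):
    """Return the nodes reachable from start in at most k hops (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(start)
    return seen


def pair_sets(edges, k):
    """Split all unordered node pairs into k-hop neighbors and non-neighbors."""
    adj = undirected_adjacency(edges)
    nodes = sorted(adj)
    positive, negative = set(), set()
    for i, u in enumerate(nodes):
        near = k_hop_neighbors(adj, u, k)
        for v in nodes[i + 1:]:
            (positive if v in near else negative).add((u, v))
    return positive, negative


# Path a-b-c-d-e with mixed edge directions, and k = 2 as in the description.
toy_edges = [("a", "b"), ("c", "b"), ("c", "d"), ("d", "e")]
pos_pairs, neg_pairs = pair_sets(toy_edges, k=2)
```

On this toy path, nodes `a` and `c` form a positive pair (two hops apart once direction is ignored), while `a` and `e` form a negative pair; the pretext task would push the similarity of embeddings up for the former and down for the latter.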
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310328325.1A CN116032670B (en) | 2023-03-30 | 2023-03-30 | Ethernet phishing fraud detection method based on self-supervision depth map learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116032670A true CN116032670A (en) | 2023-04-28 |
CN116032670B CN116032670B (en) | 2023-07-18 |
Family
ID=86089796
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310328325.1A Active CN116032670B (en) | 2023-03-30 | 2023-03-30 | Ethernet phishing fraud detection method based on self-supervision depth map learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116032670B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||