WO2022121145A1 - Method and device for detecting Ethereum phishing fraud based on graph classification - Google Patents
- Publication number
- WO2022121145A1 (international application PCT/CN2021/081726)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- node
- transaction
- network
- order
- graph
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1483—Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
Definitions
- The present invention relates to the field of Ethereum network security, and more particularly to a method and device for detecting Ethereum phishing fraud based on graph classification.
- Blockchain is a distributed ledger technology that guarantees trusted transactions between nodes in an untrusted environment without relying on a trusted intermediary.
- Blockchain can also be described as a trusted distributed database maintained by a peer-to-peer network through a consensus mechanism.
- Blockchain technology has outstanding advantages such as decentralization, unforgeability, anonymity and openness. Because it is considered a next-generation disruptive core technology, it is widely used in many fields, of which the best-known application is digital cryptocurrency.
- Blockchain platforms such as Bitcoin and Ethereum have developed rapidly worldwide as emerging cryptocurrency trading platforms.
- Ethereum is today the world's second-largest cryptocurrency platform and the largest blockchain platform that supports smart contracts. Ethereum lets users write programs in a Turing-complete language in the form of smart contracts, which greatly enriches the levels and scenarios of cryptocurrency trading and has given rise to many applications of blockchain technology in economics and finance. At the same time, because of the security-supervision problems that accompany blockchains, Ethereum has gradually become a main target of cybercriminals, with frequent frauds such as phishing scams and Ponzi schemes seriously damaging the blockchain financial ecosystem of the Ethereum community.
- The present application provides a method and device for detecting phishing fraud in Ethereum based on graph classification, to solve the technical problems of the prior art, whose hand-crafted features have obvious limitations and whose processing is complex, resulting in high time and computing costs.
- The first aspect of the present application provides a method for detecting Ethereum phishing scams based on graph classification, including:
- The target nodes include labeled phishing nodes and non-phishing nodes;
- The preset-order neighbor nodes include first-order neighbor nodes and second-order neighbor nodes;
- Extracting the target nodes and the preset-order neighbor nodes from the Ethereum network includes:
- obtaining historical transaction records in the Ethereum network, where the historical transaction records include node account addresses, historical transaction amounts, historical transaction timestamps, historical transaction flow directions and historical transaction counts;
- extracting the target nodes and the preset-order neighbor nodes from the historical transaction records.
- Constructing a second-order transaction subgraph network with the target node as the central node according to the first-order and second-order neighbor nodes includes:
- taking the target node as the central node and constructing first-order network connection edges between the central node and the first-order neighbor nodes;
- constructing second-order network connection edges between the first-order neighbor nodes and the second-order neighbor nodes;
- storing the historical transaction records in each node to obtain the second-order transaction subgraph network.
- Refining the second-order transaction subgraph network according to the relevant transaction information data of each node in it to obtain the target transaction subgraph network includes:
- The relevant transaction information data includes a preset transaction amount, a preset transaction timestamp and a preset transaction flow direction;
- The second-order transaction subgraph network is refined according to the association strength value to obtain the target transaction subgraph network; the refining process includes resampling and relabeling.
- After the relevant transaction information data is analyzed and processed to obtain the association strength value between each node and the central node, the method further includes:
- normalizing the association strength value with a preset activation function.
- The resampling process is:
- deleting redundant nodes in the second-order transaction subgraph network according to the association strength value.
- The relabeling process is:
- analyzing the behavior differences between nodes according to the historical labels and the association strength values, and adding refined node labels according to the analysis result.
- A second aspect of the present application provides a device for detecting phishing fraud in Ethereum based on graph classification, including:
- an extraction module, configured to extract target nodes and preset-order neighbor nodes from the Ethereum network, where the target nodes include labeled phishing nodes and non-phishing nodes, and the preset-order neighbor nodes include first-order neighbor nodes and second-order neighbor nodes;
- a building module, configured to construct a second-order transaction subgraph network with the target node as the central node according to the first-order and second-order neighbor nodes;
- a refining module, configured to refine the second-order transaction subgraph network according to the relevant transaction information data of each node in it to obtain the target transaction subgraph network;
- a learning module, configured to extract features of the target transaction subgraph network with a preset graph embedding algorithm to obtain a network representation vector;
- a classification module, configured to input the network representation vector into the preset classifier for binary classification to obtain the target phishing nodes.
- The extraction module is specifically configured to:
- obtain historical transaction records in the Ethereum network, where the historical transaction records include node account addresses, historical transaction amounts, historical transaction timestamps, historical transaction flow directions and historical transaction counts;
- extract the target nodes and the preset-order neighbor nodes from the historical transaction records.
- The building module is specifically configured to:
- take the target node as the central node and construct first-order network connection edges between the central node and the first-order neighbor nodes;
- construct second-order network connection edges between the first-order neighbor nodes and the second-order neighbor nodes;
- store the historical transaction records in each node to obtain the second-order transaction subgraph network.
- The embodiments of the present application have the following advantages:
- A method for detecting phishing fraud in Ethereum based on graph classification is provided, including: extracting target nodes and preset-order neighbor nodes from the Ethereum network, where the target nodes include labeled phishing nodes and non-phishing nodes and the preset-order neighbor nodes include first-order and second-order neighbor nodes; constructing a second-order transaction subgraph network with the target node as the central node according to the first-order and second-order neighbor nodes; refining the second-order transaction subgraph network according to the relevant transaction information data of each node in it to obtain the target transaction subgraph network; extracting features of the target transaction subgraph network with the preset graph embedding algorithm to obtain a network representation vector; and inputting the network representation vector into the preset classifier for binary classification to obtain the target phishing nodes.
- The method starts from the premise that different types of nodes tend to exhibit different behavior patterns, which are reflected in their associated transaction subgraph networks. Each node is therefore described by its transaction subgraph network, and a feature representation is then learned for that subgraph network, which keeps the feature description accurate and applies well to different situations. Refining the lighter second-order transaction subgraph network not only improves the expressiveness of the network but also reduces data processing time and saves computing cost. The present application can therefore solve the technical problems of the prior art, whose hand-crafted features have obvious limitations and whose processing is complex, resulting in high time and computational cost.
- FIG. 1 is a schematic flowchart of a method for detecting phishing fraud in Ethereum based on graph classification provided by an embodiment of the present application
- FIG. 2 is another schematic flowchart of a method for detecting phishing fraud in Ethereum based on graph classification provided by an embodiment of the present application
- FIG. 3 is a schematic structural diagram of a device for detecting phishing fraud in Ethereum based on graph classification provided by an embodiment of the present application;
- FIG. 4 is a schematic diagram of an overall flow of phishing fraud detection in Ethereum based on graph classification according to an embodiment of the present application.
- Blockchain: a linked-list data structure in which consecutive blocks are connected through the hash value of the preceding block header. Each block consists of the transactions generated within a period of time, is packaged by the node that has obtained the accounting right, and is independently verified by every node.
- Transaction: the smallest unit of state transition on the blockchain. It is initiated with the sender's signature and either transfers specific digital assets or performs operations that change the blockchain state, such as smart contract calls.
- Phishing fraud: fraudulent behavior in which attackers illegally forge official-channel information to gain users' trust, induce them to trade, and profit from it.
- Graph classification method: a method that takes graphs as the objects of analysis and processing, trains a model on a known set of labeled graphs, and predicts the labels of unclassified graphs.
- Graph embedding learning: embedding nodes, edges or the whole graph into a low-dimensional vector space while preserving the network topology, node attributes and other information.
- The transaction data of the Ethereum network are all stored on the public blockchain and are publicly accessible; the embodiments of the present application can therefore perform phishing fraud detection by analyzing complete, observable data.
- Phishing scams on the Ethereum network differ from traditional fraud through fake websites and emails: they exploit the convenience of cryptocurrency trading platforms and take more varied forms. This shifts the focus of phishing fraud detection from identifying and reporting fraudulent or forged information to analyzing users' transaction behavior data.
- The transaction history between Ethereum accounts is modeled as a connected directed network in which each node represents an Ethereum account address and each directed edge between two nodes represents a transaction between the two accounts.
- Different network characteristics of the target nodes under study can then be extracted from the Ethereum network, and a machine-learning classification algorithm can be used to determine each node's category and identify phishing fraud nodes, i.e., fraudulent accounts.
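The directed transaction network described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each record carries a sender, a receiver, an amount and a timestamp; the `Tx` record type and the `build_transaction_network` helper are hypothetical names, not part of the original method. Repeated transfers between the same pair of accounts are kept on one directed edge as a list, so no transaction history is lost.

```python
from collections import defaultdict
from typing import NamedTuple


class Tx(NamedTuple):
    """One transaction record (hypothetical field layout)."""
    sender: str     # account address that initiated the transfer
    receiver: str   # account address that received it
    amount: float   # transferred value, e.g. in ETH
    timestamp: int  # Unix timestamp of the transaction


def build_transaction_network(txs):
    """Model transactions as a directed network.

    Nodes are account addresses. A directed edge (u, v) exists if u sent at
    least one transaction to v; the edge stores all such transactions.
    """
    nodes = set()
    edges = defaultdict(list)
    for tx in txs:
        nodes.update((tx.sender, tx.receiver))
        edges[(tx.sender, tx.receiver)].append(tx)
    return nodes, dict(edges)


if __name__ == "__main__":
    txs = [
        Tx("0xA", "0xB", 1.5, 100),
        Tx("0xA", "0xB", 0.5, 200),
        Tx("0xC", "0xA", 2.0, 150),
    ]
    nodes, edges = build_transaction_network(txs)
    print(sorted(nodes))               # ['0xA', '0xB', '0xC']
    print(len(edges[("0xA", "0xB")]))  # 2
```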
- Embodiment 1 of a method for detecting phishing fraud in Ethereum based on graph classification includes:
- Step 101: Extract target nodes and preset-order neighbor nodes from the Ethereum network; the target nodes include labeled phishing nodes and non-phishing nodes, and the preset-order neighbor nodes include first-order neighbor nodes and second-order neighbor nodes.
- A node in the Ethereum network is an Ethereum account with a corresponding account address;
- A first-order neighbor node is a node that has a direct transaction record with the target node, and a second-order neighbor node is a node that has a direct transaction record with a first-order neighbor node.
- First-order neighbor nodes in turn have associated nodes with which they have direct transaction records, and historical transaction records exist between each pair of associated nodes. From this historical transaction information, the behavior differences between nodes can be analyzed to reveal their different behavior characteristics.
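A minimal sketch of the neighbor extraction in Step 101, assuming transactions are given as (sender, receiver) address pairs and that any direct transaction, in either direction, makes two accounts neighbors. The helper name `neighbor_sets` is hypothetical. As a further assumption, a node reachable both directly and through a first-order neighbor is kept only as first-order, and the target itself is excluded from the second-order set.

```python
from collections import defaultdict


def neighbor_sets(transactions, target):
    """Return (first_order, second_order) neighbor sets of `target`.

    transactions: iterable of (sender, receiver) address pairs.
    Direction is ignored: any direct transaction counts as adjacency.
    """
    adj = defaultdict(set)
    for sender, receiver in transactions:
        if sender != receiver:  # ignore self-transfers
            adj[sender].add(receiver)
            adj[receiver].add(sender)
    first = set(adj[target])
    second = set()
    for n in first:
        second |= adj[n]
    # A node already counted as first-order (or the target) is not second-order.
    second -= first | {target}
    return first, second


if __name__ == "__main__":
    txs = [("T", "A"), ("B", "T"), ("A", "C"), ("C", "D"), ("B", "A")]
    print(neighbor_sets(txs, "T"))  # first = {A, B}, second = {C}; D is third-order
```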
- Step 102: Construct a second-order transaction subgraph network with the target node as the central node according to the first-order and second-order neighbor nodes.
- Each target node yields its own second-order transaction subgraph network;
- the central node of that second-order transaction subgraph network is the target node.
- Constructing a subgraph network means connecting associated nodes with edges to obtain the topology of the network.
- Representing the target node by its subgraph network, which then serves as the object of subsequent behavior-pattern analysis, reflects the behavior characteristics of the target node more accurately.
- Step 103: Refine the second-order transaction subgraph network according to the relevant transaction information data of each node in it to obtain the target transaction subgraph network.
- The relevant transaction information data mainly includes the transaction amount, the transaction timestamp and the transaction flow direction.
- Transactions between nodes in the same second-order transaction subgraph network describe the relationships between those nodes more realistically. The second-order transaction subgraph network can therefore be refined on this basis: the resulting target transaction subgraph network is more streamlined and targeted, retaining strongly associated nodes and removing weakly associated nodes and their connecting edges, which greatly reduces computation time.
- Step 104: Use a preset graph embedding algorithm to extract features from the target transaction subgraph network to obtain a network representation vector.
- A graph embedding algorithm aims to learn low-dimensional latent representations of the elements of a network, and the learned representations can serve as features for various graph-based tasks. It embeds nodes, edges or the entire graph into a low-dimensional vector space while preserving the network topology, node attributes and other information; in this sense, embedding is also a form of compression. Graph embedding is the process of converting an attributed graph into a vector or a set of vectors, and the embedding should capture the topology of the graph, vertex-to-vertex relationships and other relevant information about the graph, its subgraphs and its vertices.
- Step 105: Input the network representation vector into the preset classifier for binary classification to obtain the target phishing nodes.
- The preset classifier may be an SVM model; the binary classification separates target phishing nodes from non-phishing nodes.
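The classification step can be illustrated with a dependency-free linear SVM. This is a sketch, not the patent's classifier: it swaps in the Pegasos algorithm (stochastic sub-gradient descent on the regularized hinge loss) so that it runs without any library, and the helper names `train_linear_svm` and `predict`, the label convention (+1 phishing, -1 non-phishing) and the hyperparameters `lam` and `epochs` are assumptions. In practice an off-the-shelf SVM implementation, possibly with a non-linear kernel, would more likely be used on the network representation vectors.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with Pegasos (stochastic sub-gradient descent).

    X: list of feature vectors (e.g. network representation vectors).
    y: labels in {+1, -1}; here +1 = phishing, -1 = non-phishing (assumed).
    A constant 1.0 feature is appended so the last weight acts as a bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization step: shrink the weights toward zero.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss sub-gradient step, only for margin violations.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (phishing) or -1 (non-phishing) for one vector."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

For example, `w = train_linear_svm(vectors, labels)` followed by `predict(w, v)` labels a new embedding `v`; the fixed `seed` makes training reproducible.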
- In traditional transaction network node analysis, a node must be analyzed together with the connected network in which it is located, and sufficient network features for the node must be mined through network walks and feature engineering.
- A complete and usable connected network, however, presupposes large-scale data acquisition and cleaning.
- Here, a lighter second-order transaction subgraph network is used to express the transaction behavior pattern of the target node, and discriminative features are obtained through subsequent targeted data analysis and processing on that subgraph network. While the amount of data to be acquired stays controllable (limited to the second-order neighborhood), the graph embedding method produces a network-structure vector representation of the transaction behavior pattern, which greatly simplifies data processing.
- The method for detecting phishing fraud in Ethereum based on graph classification starts from the premise that different types of nodes tend to exhibit different behavior patterns, which are reflected in their associated transaction subgraph networks.
- Nodes are therefore described by their transaction subgraph networks, and feature representations are then learned for those subgraph networks, which keeps the feature description accurate and applies well to different situations. Refining the lighter second-order transaction subgraph network not only improves the expressiveness of the network but also reduces data processing time and saves computing cost. The embodiments of the present application can therefore solve the technical problems of the prior art, whose hand-crafted features have obvious limitations and whose processing is complex, resulting in high time and computational cost.
- The present application provides a second embodiment of a method for detecting phishing fraud in Ethereum based on graph classification, including:
- Step 201: Obtain historical transaction records in the Ethereum network; the records include node account addresses, historical transaction amounts, historical transaction timestamps, historical transaction flow directions and historical transaction counts.
- Step 202: Extract the target nodes and the preset-order neighbor nodes from the historical transaction records.
- The target nodes include labeled phishing nodes and non-phishing nodes, and the preset-order neighbor nodes include first-order neighbor nodes and second-order neighbor nodes.
- First-order neighbor nodes are associated nodes that have direct transaction records with the target node; second-order neighbor nodes are associated nodes that have direct transaction records with first-order neighbor nodes.
- Step 203: Take the target node as the central node and construct first-order network connection edges between the central node and the first-order neighbor nodes.
- Step 204: Construct second-order network connection edges between the first-order neighbor nodes and the second-order neighbor nodes.
- Step 205: Store the historical transaction records in each node to obtain the second-order transaction subgraph network.
- The target node serves as the central node of the second-order transaction subgraph network. Each account that has a direct transaction with the target account is taken as a first-order neighbor node, and the central node is connected to each first-order neighbor node.
- The first-order neighbor nodes likewise have accounts with which they transact directly; these accounts are taken as second-order neighbor nodes, and finally the first-order neighbor nodes are connected to the second-order neighbor nodes.
- After the second-order network connections are built, the historical transaction information between first-order and second-order node accounts in the historical transaction records is stored in the nodes as node attributes; that is, the historical transaction records are stored in the second-order transaction subgraph network.
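Steps 203 to 205 can be sketched as follows, assuming each historical record is a dict with keys `from`, `to`, `amount` and `timestamp`; the function name `build_second_order_subgraph` and the record layout are hypothetical. Following the text, only center-to-first-order and first-order-to-second-order links become edges (links among first-order neighbors, or from second-order to third-order nodes, are dropped), and each kept record is attached to both of its endpoints as a node attribute.

```python
from collections import defaultdict


def build_second_order_subgraph(records, center):
    """Sketch of Steps 203-205 under an assumed record layout.

    records: list of dicts with keys "from", "to", "amount", "timestamp".
    Returns (edges, node_attrs):
      edges: set of directed (sender, receiver) pairs that are either
             center/first-order or first-order/second-order links;
      node_attrs: node -> list of the kept records it takes part in.
    """
    adj = defaultdict(set)
    for r in records:
        if r["from"] != r["to"]:
            adj[r["from"]].add(r["to"])
            adj[r["to"]].add(r["from"])
    first = set(adj[center])
    second = set().union(*(adj[n] for n in first)) - first - {center}
    nodes = {center} | first | second

    edges = set()
    node_attrs = {n: [] for n in nodes}
    for r in records:
        u, v = r["from"], r["to"]
        if u == v:
            continue
        pair = {u, v}
        first_order_edge = center in pair and bool(pair & first)          # Step 203
        second_order_edge = bool(pair & first) and bool(pair & second)    # Step 204
        if first_order_edge or second_order_edge:
            edges.add((u, v))
            node_attrs[u].append(r)  # Step 205: store the record in the node
            node_attrs[v].append(r)
    return edges, node_attrs
```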
- Step 206: Extract the relevant transaction information data of all nodes in the second-order transaction subgraph network corresponding to the central node; the relevant transaction information data includes a preset transaction amount, a preset transaction timestamp and a preset transaction flow direction.
- Step 207: Analyze and process the relevant transaction information data to obtain the association strength value between each node and the central node.
- Step 208: Normalize the association strength value with a preset activation function.
- Step 209: Refine the second-order transaction subgraph network according to the association strength value to obtain the target transaction subgraph network; the refining process includes resampling and relabeling.
- Association strength describes how closely nodes are related.
- When the relevant transaction information data is the preset transaction amount:
- The association strength value between each node and the central node can be calculated from the transaction amount. A larger value indicates a closer relationship in the transaction network, so the larger the total transaction amount related to a node, the larger its amount-based association strength value in the transaction subgraph network.
- The total transaction amount of each node is obtained from the historical transaction records, and the corresponding amount-based association strength value is:
- $x$ is a transaction neighbor node in the second-order transaction subgraph network;
- $a_x$ is the transaction amount related to node $x$;
- $i$ ranges over the set of transaction neighbor nodes of the second-order transaction subgraph network;
- $a_i$ is the transaction amount related to node $i$ in that set.
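The amount-based formula itself did not survive in this copy of the text, so the following is only a sketch under a stated assumption: it takes $a_x$ as the total amount of all records involving node $x$ and computes the strength as $a_x / \sum_i a_i$, i.e. each neighbor's share of the total amount over the neighbor set. The function name `amount_strength` and the (sender, receiver, amount) tuple layout are hypothetical.

```python
from collections import defaultdict


def amount_strength(records, neighbors):
    """Hypothetical amount-based association strength.

    ASSUMPTION: a_x = total amount of all records involving node x, and the
    strength of x is a_x / sum(a_i for i in neighbors).
    records: iterable of (sender, receiver, amount) tuples.
    neighbors: set of transaction neighbor nodes (the center excluded).
    """
    totals = defaultdict(float)
    for sender, receiver, amount in records:
        for node in (sender, receiver):
            if node in neighbors:
                totals[node] += amount
    denom = sum(totals[n] for n in neighbors)
    return {n: (totals[n] / denom if denom else 0.0) for n in neighbors}


if __name__ == "__main__":
    recs = [("T", "A", 3.0), ("B", "T", 1.0)]
    print(amount_strength(recs, {"A", "B"}))  # A -> 0.75, B -> 0.25
```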
- The association strength value between each node and the central node can also be calculated from the transaction timestamps. In a transaction network, the more active a node's transaction behavior, the greater its influence on the network: the more transactions a node performs per unit of transaction time, the higher its time-based association strength value in the transaction subgraph network. For each node, its earliest and latest transaction timestamps are obtained from the historical transaction records, their difference gives the length of the transaction time interval, and the node's number of transactions is counted; the corresponding timestamp-based association strength value is:
- $x$ is a transaction neighbor node in the second-order transaction subgraph network;
- $T_x$ is the ratio of the number of transactions related to node $x$ to the length of its transaction time interval;
- $i$ ranges over the set of transaction neighbor nodes of the second-order transaction subgraph network;
- $T_i$ is the ratio of the number of transactions related to node $i$ in that set to the length of its transaction time interval.
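As with the amount-based value, the time-based formula is missing here, so this sketch assumes the strength is $T_x / \sum_i T_i$, with $T_x$ computed as described (transaction count divided by the span between the earliest and latest timestamps). Clamping very short intervals, such as a node with a single transaction, to `min_interval` is an additional assumption to avoid division by zero; `frequency_strength` is a hypothetical name.

```python
from collections import defaultdict


def frequency_strength(records, neighbors, min_interval=1.0):
    """Hypothetical time-based association strength.

    T_x = (number of transactions involving x) / (latest - earliest timestamp).
    ASSUMPTION: the strength of x is T_x / sum(T_i); an interval shorter than
    `min_interval` (e.g. a single transaction) is clamped to `min_interval`.
    records: iterable of (sender, receiver, timestamp) tuples.
    neighbors: set of transaction neighbor nodes (the center excluded).
    """
    stamps = defaultdict(list)
    for sender, receiver, ts in records:
        for node in {sender, receiver} & neighbors:
            stamps[node].append(ts)
    rate = {}
    for n in neighbors:
        ts = stamps.get(n, [])
        if not ts:
            rate[n] = 0.0
            continue
        interval = max(max(ts) - min(ts), min_interval)
        rate[n] = len(ts) / interval  # transactions per unit of time
    total = sum(rate.values())
    return {n: (r / total if total else 0.0) for n, r in rate.items()}


if __name__ == "__main__":
    recs = [("T", "A", 0), ("A", "T", 4), ("B", "T", 0), ("T", "B", 1), ("B", "X", 2)]
    print(frequency_strength(recs, {"A", "B"}))  # A: 2/4 = 0.5, B: 3/2 = 1.5 -> 0.25, 0.75
```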
- A transaction flow direction label can be obtained for each node from the direction of its transactions; in a transaction network, transaction direction is important information that reflects the overall movement of currency through the network.
- Relative to the central node, nodes connected by edges whose transaction direction points toward the central node are called aggregate transaction nodes, and nodes connected by edges that point from the central node toward them are called decentralized transaction nodes.
- Different types of accounts show different transaction flow characteristics, which mainly depend on their transaction behavior patterns. A wallet account tends to have a fairly even mix of aggregate and decentralized transaction nodes, whereas a phishing fraud account has a large number of aggregate transaction nodes and very few decentralized transaction nodes.
- For each node in the transaction subgraph, a preliminary aggregate or decentralized label is assigned according to the transaction direction information in the historical transaction records.
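A minimal sketch of the preliminary flow-direction labeling for the direct counterparties of the central node, assuming transactions are (sender, receiver) pairs. Because a counterparty may transact in both directions, the majority rule (more transfers into the center means "aggregate", ties included) is an assumption, as is the helper name `flow_labels`. Under this rule, the phishing pattern described above shows up as many "aggregate" counterparties and few "decentralized" ones.

```python
from collections import Counter


def flow_labels(transactions, center):
    """Label each direct counterparty of `center` by transaction direction.

    ASSUMPTION: a node whose transfers mostly go INTO the center is an
    "aggregate" node; one that mostly RECEIVES from the center is a
    "decentralized" node. Ties are labeled "aggregate".
    transactions: iterable of (sender, receiver) pairs.
    """
    to_center, from_center = Counter(), Counter()
    for sender, receiver in transactions:
        if receiver == center and sender != center:
            to_center[sender] += 1
        elif sender == center and receiver != center:
            from_center[receiver] += 1
    counterparties = set(to_center) | set(from_center)
    return {
        n: "aggregate" if to_center[n] >= from_center[n] else "decentralized"
        for n in counterparties
    }


if __name__ == "__main__":
    txs = [("V1", "P"), ("V2", "P"), ("V3", "P"), ("P", "X"), ("V1", "P")]
    labels = flow_labels(txs, "P")
    print(Counter(labels.values()))  # three aggregate, one decentralized
```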
- The normalization process is: for each node-related association strength value, the maximum value of that strength type in the network is set to a standard score of 1, and the remaining values are normalized with the preset Sigmoid activation function: $\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$
- where $x$ denotes the respective association strength values.
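The normalization rule can be written directly as code. This is a literal sketch of the description: the maximum value of one strength type receives the standard score 1, and every other value is passed through the sigmoid. Treating ties for the maximum as all receiving 1 is an assumption, and `normalize_strengths` is a hypothetical name.

```python
import math


def normalize_strengths(values):
    """Normalize one type of association strength over a subgraph.

    values: dict node -> raw association strength.
    The maximum value gets the standard score 1.0; every other value x is
    mapped to Sigmoid(x) = 1 / (1 + e^-x).
    ASSUMPTION: if several nodes share the maximum, all of them get 1.0.
    """
    if not values:
        return {}
    top = max(values.values())
    return {
        node: 1.0 if v == top else 1.0 / (1.0 + math.exp(-v))
        for node, v in values.items()
    }


if __name__ == "__main__":
    print(normalize_strengths({"a": 0.0, "b": 2.0}))  # {'a': 0.5, 'b': 1.0}
```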
- The resampling process deletes redundant nodes in the second-order transaction subgraph network according to the association strength value. In essence, it deletes nodes with weak transaction relationships to the target node, together with their edges, which removes redundant data from the transaction subgraph network, lowers subsequent computation cost and retains the effective topology information.
- The amount-based association value and the time-based association value are combined by a calculation formula into an integrated association value:
- A node that meets the deletion criterion is added to the deletion node set of the corresponding transaction subgraph network. After all nodes in the transaction subgraph network have been traversed, the nodes in the deletion set and their associated edges are deleted from the transaction subgraph, completing the resampling process.
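Neither the combination formula nor the deletion criterion survived in this copy, so the sketch below rests on two labeled assumptions: the integrated association value is the mean of the normalized amount-based and time-based values, and a node is deleted when that value falls below a hypothetical `threshold` parameter. The central node is never deleted. `resample` is a hypothetical helper name.

```python
def resample(edges, amount_s, time_s, center, threshold=0.1):
    """Hypothetical resampling step.

    edges: set of (u, v) pairs of the second-order transaction subgraph.
    amount_s, time_s: node -> normalized strength in [0, 1].
    ASSUMPTION: integrated value = (amount + time) / 2, and nodes whose
    integrated value is below `threshold` form the deletion set.
    Returns (kept_edges, deletion_set); the center is always kept.
    """
    # Traverse all nodes and collect the deletion set.
    deletion = {
        n
        for n in set(amount_s) | set(time_s)
        if n != center
        and (amount_s.get(n, 0.0) + time_s.get(n, 0.0)) / 2 < threshold
    }
    # Delete every edge that touches a node in the deletion set.
    kept_edges = {(u, v) for u, v in edges if u not in deletion and v not in deletion}
    return kept_edges, deletion
```

Note that with this rule a node can survive while losing all of its edges (for example, a second-order node whose only first-order link was deleted); whether such isolated nodes should also be dropped is left open.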
- The relabeling process is as follows: the behavior differences between nodes are analyzed according to the historical labels and association strength values, and refined node labels are added according to the analysis result.
- In essence, this process uses labels to express the behavioral differences between related transaction nodes, further describing the profile of the surrounding transaction network centered on the target node and strengthening the network's characteristic information.
- A threshold can be set to divide nodes more finely, and combining the resulting categories yields several types of labels. For example, with 0.5 as the cut-off value, nodes are classified by the amount-based association value as large-amount transaction nodes (greater than 0.5) or small-amount transaction nodes (less than 0.5), and by the time-based association value as high-frequency transaction nodes (greater than 0.5) or low-frequency transaction nodes (less than 0.5).
- Combining the two yields four node labels: 1, large-amount high-frequency transaction node; 2, large-amount low-frequency transaction node; 3, small-amount high-frequency transaction node; and 4, small-amount low-frequency transaction node.
- Based on transaction flow direction information, that is, the direction of a node's transactions relative to the central node, nodes are further classified as aggregate transaction nodes or decentralized transaction nodes.
- Combining all of these finally yields eight types of node labels. Each node in the resampled transaction subgraph is classified according to these rules based on its transaction association values and transaction flow direction, and the corresponding node label is added to it to complete the relabeling process.
- Step 210: use a preset graph embedding algorithm to extract features from the target transaction subgraph network and obtain a network representation vector.
- The Graph2Vec model is used to obtain the network representation vector of each target transaction subgraph network.
- Based on the Weisfeiler-Lehman relabeling strategy, the Graph2Vec model decomposes a single target transaction subgraph network into a series of smaller subgraphs, represented as a sequence.
- In the Weisfeiler-Lehman relabeling strategy, for each node of the target subgraph, a composite label set ordered by a fixed rule is generated from the labels of all its neighbor nodes; over multiple iterations these sequences are updated while information from the whole subgraph is integrated, so that finally every node in the subgraph obtains its corresponding label sequence.
- The subgraph sequences are then processed, specifically by making transaction subgraphs with similar representation-subgraph sequences lie close together in the vector space, and an objective function is established with the goal of mapping the graph structure into a low-dimensional vector space:
- V is the set of target nodes input to the model;
- |V| is the number of target nodes;
- d is the dimension of the network representation vector;
- $\mathbb{R}^{|V| \times d}$ is the vector space formed by the network representation vectors of the target nodes.
- $V_{OC}$ is the set of all representation subgraphs generated from all target transaction subgraphs, and sg is one representation subgraph in that set. The likelihood function is then maximized:
- Step 211: input the network representation vector into the preset classifier for binary classification to obtain the target phishing nodes.
- FIG. 4 is a schematic diagram of the overall flow of graph-classification-based Ethereum phishing fraud detection in this embodiment. The output of the SVM classification model determines whether the corresponding target node is a phishing fraud node.
- The present application also provides an embodiment of a graph-classification-based device for detecting Ethereum phishing fraud, including:
- the extraction module 301, configured to extract target nodes and preset-order neighbor nodes from the Ethereum network, where the target nodes include labeled phishing nodes and non-phishing nodes, and the preset-order neighbor nodes include first-order and second-order neighbor nodes;
- the construction module 302, configured to construct, according to the first-order and second-order neighbor nodes, the second-order transaction subgraph network with the target node as the central node;
- the refining module 303, configured to refine the second-order transaction subgraph network according to the relevant transaction information data of each node in it, to obtain the target transaction subgraph network;
- the learning module 304, configured to extract features from the target transaction subgraph network using a preset graph embedding algorithm, to obtain a network representation vector;
- the classification module 305, configured to input the network representation vector into the preset classifier for binary classification, to obtain the target phishing nodes.
- the extraction module 301 is specifically configured to:
- the construction module 302 is specifically configured to:
- the disclosed apparatus and method may be implemented in other manners.
- the apparatus embodiments described above are only illustrative.
- the division of the units is only a logical function division. In actual implementation, there may be other division methods.
- multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
- the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
- the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
- the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
- the technical solutions of the present application in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied as a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
- the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Finance (AREA)
- Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
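The present application discloses a graph-classification-based method and device for detecting phishing fraud on the Ethereum network. The method includes: extracting target nodes and preset-order neighbor nodes from the Ethereum network, the preset-order neighbor nodes including first-order neighbor nodes and second-order neighbor nodes; constructing, according to the first-order and second-order neighbor nodes, a second-order transaction subgraph network with the target node as the central node; refining the second-order transaction subgraph network according to the relevant transaction information data of each node in it to obtain a target transaction subgraph network; extracting features from the target transaction subgraph network using a preset graph embedding algorithm to obtain a network representation vector; and inputting the network representation vector into a preset classifier for binary classification to obtain target phishing nodes. The present application addresses the technical problem that hand-crafted features in the prior art have obvious limitations and require complex processing, resulting in high time and computation costs.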
Description
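This application claims priority to Chinese patent application No. 202011417306.9, filed with the China Patent Office on December 7, 2020 and entitled "Graph-Classification-Based Ethereum Network Phishing Fraud Detection Method and Device", the entire contents of which are incorporated herein by reference.
The present invention relates to the field of Ethereum network security, and more specifically to a graph-classification-based method and device for detecting phishing fraud on the Ethereum network.
A blockchain is a distributed ledger technology that can guarantee trustworthy transactions between participating nodes in an environment where the parties do not trust each other. A blockchain can also be described as a trusted distributed database maintained by a consensus mechanism in a peer-to-peer network. Blockchain technology has outstanding advantages in decentralization, unforgeability, anonymity, and openness; it is regarded as a next-generation disruptive core technology and is widely applied in many fields, the most important and best known of which is digital cryptocurrency. Supported by blockchain technology, blockchain platforms such as Bitcoin and Ethereum have developed enormously worldwide as emerging cryptocurrency trading platforms.
Ethereum is currently the world's second-largest cryptocurrency trading platform and the largest blockchain platform that supports smart contracts. Ethereum lets users program in a Turing-complete language in the form of smart contracts, which greatly enriches the levels and scenarios of cryptocurrency trading and has given rise to many applications of blockchain technology in economics and finance. At the same time, because of the security and regulatory problems that accompany blockchains, Ethereum has gradually become a major target of cybercriminals: fraud such as phishing scams and Ponzi schemes occurs frequently and seriously harms the blockchain financial ecosystem on Ethereum.
With the boom in virtual trade, people increasingly prefer to trade money, goods, and services online, which gives phishing fraud many opportunities. Phishing criminals often obtain users' private information, such as passwords, addresses, and credit card details, through illegally forged official websites and e-mails. For a blockchain ecosystem whose risk supervision is still incomplete, phishing fraud undoubtedly poses a great threat to its security. Phishing has gradually become one of the most widely used fraud methods on Ethereum, which requires more attention to the problem and effective methods for preventing and detecting phishing behavior.
Most existing phishing fraud detection techniques for the Ethereum network design features manually by combining domain knowledge with statistical analysis. However, these hand-crafted features lack an adaptive learning and training process, so their applicability in different scenarios is hard to guarantee. In addition, the Ethereum transaction network dataset is very large; obtaining transaction-related data from it requires building and processing a large-scale transaction network, which is a major challenge and incurs great time and computation costs.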
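Summary of the Invention
The present application provides a graph-classification-based method and device for detecting phishing fraud on the Ethereum network, to solve the technical problem that hand-crafted features in the prior art have obvious limitations and require complex processing, resulting in high time and computation costs.
In view of this, a first aspect of the present application provides a graph-classification-based method for detecting phishing fraud on the Ethereum network, including:
extracting target nodes and preset-order neighbor nodes from the Ethereum network, where the target nodes include labeled phishing nodes and non-phishing nodes, and the preset-order neighbor nodes include first-order neighbor nodes and second-order neighbor nodes;
constructing, according to the first-order neighbor nodes and the second-order neighbor nodes, a second-order transaction subgraph network with the target node as the central node;
refining the second-order transaction subgraph network according to the relevant transaction information data of each node in the second-order transaction subgraph network to obtain a target transaction subgraph network;
extracting features from the target transaction subgraph network using a preset graph embedding algorithm to obtain a network representation vector; and
inputting the network representation vector into a preset classifier for binary classification to obtain target phishing nodes.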
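Preferably, extracting the target nodes and the preset-order neighbor nodes from the Ethereum network includes:
obtaining historical transaction records from the Ethereum network, the historical transaction records including node account addresses, historical transaction amounts, historical transaction timestamps, historical transaction flow directions, and historical transaction counts; and
extracting the target nodes and the preset-order neighbor nodes from the historical transaction records.
Preferably, constructing the second-order transaction subgraph network with the target node as the central node according to the first-order and second-order neighbor nodes includes:
taking the target node as the central node, and constructing first-order network edges according to the central node and the first-order neighbor nodes;
constructing second-order network edges according to the first-order neighbor nodes and the second-order neighbor nodes; and
storing the historical transaction records in the respective nodes to obtain the second-order transaction subgraph network.
Preferably, refining the second-order transaction subgraph network according to the relevant transaction information data of each node in it to obtain the target transaction subgraph network includes:
extracting the relevant transaction information data of all nodes in the second-order transaction subgraph network corresponding to the central node, the relevant transaction information data including preset transaction amounts, preset transaction timestamps, and preset transaction flow directions;
analyzing the relevant transaction information data to obtain an association strength value between each node and the central node; and
refining the second-order transaction subgraph network according to the association strength values to obtain the target transaction subgraph network, the refining including resampling and relabeling.
Preferably, after analyzing the relevant transaction information data to obtain the association strength value between each node and the central node, the method further includes:
normalizing the association strength values using a preset activation function.
Preferably, the resampling process is:
deleting redundant nodes from the second-order transaction subgraph network according to the association strength values.
Preferably, the relabeling process is:
analyzing the behavioral differences between nodes according to historical labels and the association strength values, and adding refined node labels according to the analysis results.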
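A second aspect of the present application provides a graph-classification-based device for detecting phishing fraud on the Ethereum network, including:
an extraction module, configured to extract target nodes and preset-order neighbor nodes from the Ethereum network, where the target nodes include labeled phishing nodes and non-phishing nodes, and the preset-order neighbor nodes include first-order neighbor nodes and second-order neighbor nodes;
a construction module, configured to construct, according to the first-order and second-order neighbor nodes, a second-order transaction subgraph network with the target node as the central node;
a refining module, configured to refine the second-order transaction subgraph network according to the relevant transaction information data of each node in it to obtain a target transaction subgraph network;
a learning module, configured to extract features from the target transaction subgraph network using a preset graph embedding algorithm to obtain a network representation vector; and
a classification module, configured to input the network representation vector into a preset classifier for binary classification to obtain target phishing nodes.
Preferably, the extraction module is specifically configured to:
obtain historical transaction records from the Ethereum network, the historical transaction records including node account addresses, historical transaction amounts, historical transaction timestamps, historical transaction flow directions, and historical transaction counts; and
extract the target nodes and the preset-order neighbor nodes from the historical transaction records.
Preferably, the construction module is specifically configured to:
take the target node as the central node, and construct first-order network edges according to the central node and the first-order neighbor nodes;
construct second-order network edges according to the first-order and second-order neighbor nodes; and
store the historical transaction records in the respective nodes to obtain the second-order transaction subgraph network.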
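As can be seen from the above technical solutions, the embodiments of the present application have the following advantages:
The present application provides a graph-classification-based method for detecting phishing fraud on the Ethereum network, including: extracting target nodes and preset-order neighbor nodes from the Ethereum network, where the target nodes include labeled phishing nodes and non-phishing nodes, and the preset-order neighbor nodes include first-order and second-order neighbor nodes; constructing, according to the first-order and second-order neighbor nodes, a second-order transaction subgraph network with the target node as the central node; refining the second-order transaction subgraph network according to the relevant transaction information data of each node in it to obtain a target transaction subgraph network; extracting features from the target transaction subgraph network using a preset graph embedding algorithm to obtain a network representation vector; and inputting the network representation vector into a preset classifier for binary classification to obtain target phishing nodes.
The method starts from the premise that nodes of different categories tend to exhibit different behavior patterns, which are in turn reflected in their associated transaction subgraph networks. It therefore describes each node at the network level and then derives a feature representation of the subgraph network, which keeps the feature description accurate and applicable across different situations. Refining the relatively lightweight second-order transaction subgraph network not only improves the expressive power of the network but also reduces data processing time and saves computation cost. The present application therefore solves the technical problem that hand-crafted features in the prior art have obvious limitations and require complex processing, resulting in high time and computation costs.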
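FIG. 1 is a schematic flowchart of a graph-classification-based Ethereum network phishing fraud detection method according to an embodiment of the present application;
FIG. 2 is another schematic flowchart of a graph-classification-based Ethereum network phishing fraud detection method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a graph-classification-based Ethereum network phishing fraud detection device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the overall flow of graph-classification-based Ethereum network phishing fraud detection according to an embodiment of the present application.
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Terminology:
1. Blockchain: a data structure in the form of a linked list, formed by chaining blocks through their block-header hash values. Each block consists of the transactions generated within a period of time, is packaged by the computing node that obtains the right to record, and is independently verified by every computing node.
2. Transaction: the smallest unit of state transition on a blockchain, initiated and signed by the sender, which transfers specific digital assets or performs other operations that affect the blockchain state, such as calling a smart contract.
3. Phishing fraud: fraud in which the perpetrator illegally forges official-channel information to win users' trust in a transaction and profits from it.
4. Graph classification method: a method that takes graphs as the objects of analysis, trains a model on a known set of labeled graphs, and predicts labels for unclassified graphs.
5. Graph embedding learning: embedding nodes, edges, or whole graphs into a low-dimensional vector space while preserving the network's topology, node attributes, and other information.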
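When detecting phishing fraud in the Ethereum network scenario, the properties of the blockchain give rise to two notable characteristics. First, all Ethereum transaction data are stored on the public blockchain and are publicly accessible, so the embodiments of the present application can perform phishing detection based on analysis of complete, observable data. Second, unlike traditional phishing, which defrauds victims through forged websites and e-mails, phishing on Ethereum exploits the convenience of cryptocurrency trading platforms and takes more diverse forms. This shifts the focus of phishing detection from identifying and reporting forged information to analyzing users' transaction behavior data. The transaction history between Ethereum accounts is generally modeled as a connected directed network in which a node represents an Ethereum account address and an edge between two nodes represents a transaction between those two accounts. Different network features of the target nodes can then be extracted from this network, and a machine-learning classification algorithm determines each node's category to identify phishing nodes, that is, fraudulent accounts.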
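For ease of understanding, referring to FIG. 1, Embodiment 1 of the graph-classification-based Ethereum network phishing fraud detection method provided by the present application includes:
Step 101: extract target nodes and preset-order neighbor nodes from the Ethereum network, where the target nodes include labeled phishing nodes and non-phishing nodes, and the preset-order neighbor nodes include first-order neighbor nodes and second-order neighbor nodes.
It should be noted that a node in the Ethereum network is an Ethereum account with a corresponding account address. First-order neighbor nodes are nodes that have direct transaction records with the target node, and second-order neighbor nodes are nodes that have direct transaction records with the first-order neighbor nodes. Associated nodes share historical transaction records, from which the behavioral differences between nodes, and hence their different behavioral characteristics, can be analyzed.
Step 102: construct, according to the first-order and second-order neighbor nodes, a second-order transaction subgraph network with the target node as the central node.
It should be noted that a corresponding second-order transaction subgraph network can be obtained for every target node, with that target node as its central node. Constructing the subgraph network means connecting associated nodes with edges to obtain a topological network structure. Using the subgraph network to represent the target node as the object of subsequent behavior-pattern analysis reflects the target node's behavioral characteristics more accurately.
Step 103: refine the second-order transaction subgraph network according to the relevant transaction information data of each node in it to obtain a target transaction subgraph network.
It should be noted that the relevant transaction information data mainly include transaction amounts, transaction timestamps, and transaction flow directions. The transactions between nodes in the same second-order transaction subgraph network describe fairly faithfully how strongly the nodes are associated, so the second-order transaction subgraph network can be refined on this basis. The resulting target transaction subgraph network is more compact and focused: it keeps strongly associated nodes and removes weakly associated nodes and their edges, which greatly reduces computation time and cost.
Step 104: extract features from the target transaction subgraph network using a preset graph embedding algorithm to obtain a network representation vector.
It should be noted that a graph embedding algorithm aims to learn low-dimensional latent representations of the nodes in a network, and the learned representations can serve as features for various graph-based tasks. In this embodiment, nodes, edges, or entire graphs are embedded into a low-dimensional vector space while the network topology, node attributes, and other information are preserved; "embedding" here essentially means compression. Graph embedding is the process of converting an attributed graph into a vector or a set of vectors, and the embedding should capture the graph's topology, vertex-to-vertex relationships, and other relevant information about the graph, its subgraphs, and its vertices.
Step 105: input the network representation vector into a preset classifier for binary classification to obtain target phishing nodes.
It should be noted that the preset classifier may be an SVM model, and the binary classification result is either target phishing node or non-phishing node. Traditional node analysis in transaction networks relies on the connected network containing the node and mines sufficient network features for it through random walks and feature engineering; because Ethereum's historical transaction records are complex and highly redundant, building a complete, usable connected network first requires a large amount of data acquisition and cleaning. In this embodiment, for each target node to be detected, only its surrounding transaction records in the Ethereum network are crawled, a lightweight second-order transaction subgraph network represents the target node's transaction behavior pattern, and discriminative features are obtained through subsequent targeted data analysis on the subgraph network. This keeps the amount of data to be acquired controllable (bounded at two hops), and combining it with a graph embedding method to produce a network-structure vector representing the transaction behavior pattern greatly simplifies data processing.
The method provided by this embodiment starts from the premise that nodes of different categories tend to exhibit different behavior patterns, which are in turn reflected in their associated transaction subgraph networks. It therefore describes each node at the network level and then derives a feature representation of the subgraph network, which keeps the feature description accurate and applicable across different situations. Refining the relatively lightweight second-order transaction subgraph network not only improves the expressive power of the network but also reduces data processing time and saves computation cost. The embodiments of the present application therefore solve the technical problem that hand-crafted features in the prior art have obvious limitations and require complex processing, resulting in high time and computation costs.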
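For ease of understanding, referring to FIG. 2, Embodiment 2 of the graph-classification-based Ethereum network phishing fraud detection method provided by the present application includes:
Step 201: obtain historical transaction records from the Ethereum network, the historical transaction records including node account addresses, historical transaction amounts, historical transaction timestamps, historical transaction flow directions, and historical transaction counts.
Step 202: extract the target nodes and the preset-order neighbor nodes from the historical transaction records.
It should be noted that the historical transaction records contain a great deal of information and mainly describe the dynamics of the transaction process. The target nodes include labeled phishing nodes and non-phishing nodes, and the preset-order neighbor nodes include first-order and second-order neighbor nodes. First-order neighbor nodes are nodes that have direct transaction records with the target node; second-order neighbor nodes are nodes that have direct transaction records with the first-order neighbor nodes.
Step 203: take the target node as the central node, and construct first-order network edges according to the central node and the first-order neighbor nodes.
Step 204: construct second-order network edges according to the first-order neighbor nodes and the second-order neighbor nodes.
Step 205: store the historical transaction records in the respective nodes to obtain the second-order transaction subgraph network.
It should be noted that the target node serves as the central node of the second-order transaction subgraph network. Accounts that transact directly with the target account serve as first-order neighbor nodes, and connecting the central node to them yields the first-order network edges. Likewise, each first-order neighbor node has accounts that transact directly with it; these serve as second-order neighbor nodes, and connecting the first-order neighbor nodes to them yields the second-order network edges. The historical transaction information of the first-order and second-order accounts in the historical transaction records must be stored in the nodes as node attributes; that is, the historical transaction records are stored in the second-order transaction subgraph network.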
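Steps 203 to 205 can be sketched in Python as follows. This is a minimal illustration under assumptions, not the patented implementation: each historical transaction record is reduced to a hypothetical `Tx(sender, receiver, amount, timestamp)` record, and only centre-to-first-order and first-to-second-order edges are kept, as the text describes; edges between two first-order (or two second-order) neighbours are left out, which is one reading of the text rather than a confirmed detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    # Hypothetical record layout for one Ethereum transfer.
    sender: str
    receiver: str
    amount: float   # e.g. in ETH
    timestamp: int  # Unix seconds


def build_second_order_subgraph(txs, center):
    """Return (nodes, edges, records) of the 2-hop subgraph around `center`.

    edges:   set of directed (sender, receiver) pairs
    records: node -> list of Tx it took part in (kept as node attributes)
    """
    # Which accounts did each account transact with directly (either direction)?
    neighbours = defaultdict(set)
    for tx in txs:
        neighbours[tx.sender].add(tx.receiver)
        neighbours[tx.receiver].add(tx.sender)

    first = neighbours[center] - {center}
    second = set()
    for n in first:
        second |= neighbours[n]
    second -= first | {center}
    nodes = {center} | first | second

    edges = set()
    records = defaultdict(list)
    for tx in txs:
        s, r = tx.sender, tx.receiver
        # First-order edge: centre <-> first-order neighbour.
        first_order_edge = center in (s, r) and (s in first or r in first)
        # Second-order edge: first-order <-> second-order neighbour.
        second_order_edge = (s in first and r in second) or (s in second and r in first)
        if first_order_edge or second_order_edge:
            edges.add((s, r))
            records[s].append(tx)
            records[r].append(tx)
    return nodes, edges, dict(records)
```

For example, with transfers C→A, B→C, A→D and an unrelated E→F, the subgraph around C contains A and B as first-order neighbours, D as a second-order neighbour, and no trace of E or F.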
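Step 206: extract the relevant transaction information data of all nodes in the second-order transaction subgraph network corresponding to the central node, the relevant transaction information data including preset transaction amounts, preset transaction timestamps, and preset transaction flow directions.
Step 207: analyze the relevant transaction information data to obtain the association strength value between each node and the central node.
Step 208: normalize the association strength values using a preset activation function.
Step 209: refine the second-order transaction subgraph network according to the association strength values to obtain the target transaction subgraph network, the refining including resampling and relabeling.
It should be noted that different kinds of relevant transaction data are analyzed for association strength in different ways. Association strength describes the correlation between nodes; for a second-order transaction subgraph network it mainly means the correlation between each surrounding node and the central node. When the relevant transaction information data are the preset transaction amounts, the association strength between each node and the central node can be computed from the transaction amounts. The underlying principle is that the larger the amounts in the transactions between nodes, the closer their relationship in the transaction network; therefore, the larger a node's total related transaction amount, the larger its amount-based association strength value in the transaction subgraph network. For each node, the total transaction amount is obtained from the historical transaction records, and the corresponding transaction-amount association strength value is: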
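where x is a transaction neighbor node in the second-order transaction subgraph network, $A_x$ is the transaction amount related to node x, i ranges over the set of transaction neighbor nodes of the second-order transaction subgraph network, and $A_i$ is the transaction amount related to any node in that set.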
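The amount-based strength can be sketched as below. The exact formula is not reproduced in this text, so the sketch assumes one plausible reading of the definitions of $A_x$ and $A_i$: each neighbour's total amount divided by the sum over all neighbours. The function name `amount_association` is hypothetical.

```python
def amount_association(amounts):
    """Amount-based association strength for each neighbour node.

    amounts: node -> total transaction amount A_x from its history.
    Assumes S(x) = A_x / sum_i A_i over the neighbour set (an assumption).
    """
    total = sum(amounts.values())
    if total == 0:
        # No value moved at all: every node is equally (un)associated.
        return {node: 0.0 for node in amounts}
    return {node: a / total for node, a in amounts.items()}
```

Under this reading the values already lie in [0, 1] and sum to 1, so a node that moved three times as much value as another gets three times its strength.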
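When the relevant transaction information data are the preset transaction timestamps, the association strength between each node and the central node can be computed from the timestamps. In a transaction network, the more active a node's transaction behavior, the greater its influence in the network; therefore, the more transactions a node makes per unit of time, the higher its time-based association strength value in the transaction subgraph network. For each node, the earliest and latest transaction timestamps are obtained from its historical transaction records and subtracted to give the length of the transaction time interval, and the number of transactions within that interval is counted; the corresponding transaction-timestamp association strength value is then: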
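where x is a transaction neighbor node in the second-order transaction subgraph network, $T_x$ is the ratio of the number of transactions related to node x to the length of its transaction time interval, i ranges over the set of transaction neighbor nodes of the second-order transaction subgraph network, and $T_i$ is the same ratio for any node in that set.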
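A matching sketch for the time-based strength, again assuming (because the formula itself is not shown) that each rate $T_x$ is divided by the sum of the rates over all neighbours. The one-second floor for single-transaction nodes, whose time interval is zero, is a hypothetical guard that the text does not specify.

```python
def time_association(timestamps):
    """Time-based association strength for each neighbour node.

    timestamps: node -> non-empty list of Unix timestamps of its transactions.
    T_x = (number of transactions) / (latest - earliest timestamp).
    Assumes S(x) = T_x / sum_i T_i over the neighbour set (an assumption).
    """
    rates = {}
    for node, ts in timestamps.items():
        span = max(ts) - min(ts)
        # Hypothetical guard: a zero-length interval counts as 1 second.
        rates[node] = len(ts) / max(span, 1)
    total = sum(rates.values())
    if total == 0:
        return {node: 0.0 for node in rates}
    return {node: r / total for node, r in rates.items()}
```

A node with 2 transactions in 10 seconds and a node with 4 transactions in 20 seconds have the same rate, so each receives half of the total strength.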
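When the relevant transaction information data are the transaction flow directions, a transaction-flow label can be obtained for each node from the flow direction. Transaction direction is important information in a transaction network because it shows how currency flows through the network. For a node in the transaction network, the nodes connected to it by edges pointing toward it are called its aggregating transaction nodes, and the nodes connected to it by edges pointing away from it are called its dispersing transaction nodes. Different categories of nodes have different transaction flow characteristics: a wallet account has a fairly even distribution of aggregating and dispersing transaction nodes, whereas a phishing account has a large number of aggregating transaction nodes and very few dispersing transaction nodes, which is mainly related to its transaction behavior pattern. For each node in the transaction subgraph, a preliminary label as an aggregating or dispersing transaction node is assigned according to the transaction direction information in the historical transaction records.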
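The aggregating/dispersing split can be sketched as follows for the direct neighbours of the central node. The majority-by-count rule, the tie-break toward "aggregating", and the helper name `flow_labels` are assumptions; how second-order nodes should be labelled relative to the centre is not specified in the text, so they are not handled here.

```python
from collections import Counter


def flow_labels(edges, center):
    """Label each direct neighbour of `center` as 'aggregating' or 'dispersing'.

    edges: iterable of directed (sender, receiver) pairs, one per transaction.
    A neighbour whose transactions mostly point *into* the centre is one of
    its aggregating nodes; mostly *out of* the centre, a dispersing node.
    """
    into, out = Counter(), Counter()
    for s, r in edges:
        if r == center and s != center:
            into[s] += 1
        elif s == center and r != center:
            out[r] += 1
    labels = {}
    for node in into.keys() | out.keys():
        # Ties go to 'aggregating' (an arbitrary choice for this sketch).
        labels[node] = "aggregating" if into[node] >= out[node] else "dispersing"
    return labels
```

For a phishing-like centre P that receives from A, B, and C and pays out only to D, three of the four neighbours come out as aggregating, matching the pattern described above.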
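It should be noted that the normalization process is as follows: for the node-related association strength values, the maximum strength value of the corresponding information in the network is set as the standard score 1, and the remaining values are normalized using the preset activation function Sigmoid:
where x is each association strength value.
It should be noted that refining the second-order transaction subgraph network according to the association strength values is a subgraph network optimization process, and the association strength values act differently in each part of the refining. Specifically, the resampling process is: deleting redundant nodes from the second-order transaction subgraph network according to the association strength values. In essence, this removes the nodes that are only weakly associated with the target node through transactions, together with their edges, which trims redundant data from the transaction subgraph network, reduces subsequent computation overhead, and retains the effective topological information. For a node in the second-order transaction subgraph network, its transaction-amount association value and transaction-time association value are combined by the following formula into an integrated association value: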
$$S_{\mathrm{Final}} = S_{\mathrm{Amount}}^{0.5} \cdot S_{\mathrm{Time}}^{0.5}$$
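If a node's integrated association value is below 0.1, the node is added to the deletion node set of the corresponding transaction subgraph network. After all nodes in the transaction subgraph network have been traversed, the nodes in the deletion set and their related edges are removed from the transaction subgraph, completing the resampling process.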
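The resampling step can be sketched as below, assuming the integrated value is the geometric mean $S_{\mathrm{Amount}}^{0.5} \cdot S_{\mathrm{Time}}^{0.5}$ of the two normalized strengths and using the 0.1 threshold from the text. Exempting the central (target) node from deletion, and treating a missing strength as 0, are assumptions of this sketch; `resample` is a hypothetical name.

```python
import math


def resample(nodes, edges, s_amount, s_time, center, threshold=0.1):
    """Drop weakly associated nodes and every edge that touches them.

    s_amount, s_time: node -> normalised association values in [0, 1].
    Integrated value assumed to be S_final = S_amount**0.5 * S_time**0.5.
    """
    to_delete = set()
    for node in nodes:
        if node == center:
            continue  # assumption: the target node itself is always kept
        s_final = math.sqrt(s_amount.get(node, 0.0) * s_time.get(node, 0.0))
        if s_final < threshold:
            to_delete.add(node)
    kept_nodes = set(nodes) - to_delete
    kept_edges = {(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes}
    return kept_nodes, kept_edges
```

Because the geometric mean is small whenever either factor is small, a neighbour that moves large amounts only very rarely (or trades often but with negligible amounts) can still be pruned.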
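The relabeling process is: analyzing the behavioral differences between nodes according to historical labels and the association strength values, and adding refined node labels according to the analysis results. In essence, this process uses labels to reflect the behavioral differences between related transaction nodes, further profiling the transaction network surrounding the target node and strengthening the network feature information. For each association strength value, a threshold can be set to divide the nodes more finely, and combining the divisions yields several different label types. For example, with 0.5 as the cut-off, nodes are classified by transaction-amount association value as large-amount transaction nodes (greater than 0.5) or small-amount transaction nodes (less than 0.5), and by transaction-time association value as high-frequency transaction nodes (greater than 0.5) or low-frequency transaction nodes (less than 0.5). Combining these gives four node labels: 1, large-amount high-frequency transaction node; 2, large-amount low-frequency transaction node; 3, small-amount high-frequency transaction node; and 4, small-amount low-frequency transaction node. Using the transaction flow direction information, nodes can further be classified as aggregating or dispersing transaction nodes according to the direction of their transactions relative to the central node. Combining all of these in the same way yields eight node labels. For each node in the resampled transaction subgraph, a category is assigned by the above rules based on its transaction association values and transaction flow direction, and the corresponding node label is added, completing the relabeling process.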
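The eight-way labelling can be sketched as a small lookup. Labels 1 to 4 follow the amount/frequency order given in the text; numbering the aggregating variants 1 to 4 and the dispersing variants 5 to 8 is a hypothetical choice, as is treating a value exactly equal to the 0.5 cut-off as "small"/"low" (the text leaves that case open).

```python
def refined_label(s_amount, s_time, flow, cut=0.5):
    """Map a node's normalised strengths and flow label to an integer in 1..8.

    s_amount, s_time: normalised association values of the node.
    flow: 'aggregating' or 'dispersing' relative to the central node.
    """
    large = s_amount > cut  # large-amount vs small-amount
    high = s_time > cut     # high-frequency vs low-frequency
    base = {
        (True, True): 1,    # large-amount high-frequency
        (True, False): 2,   # large-amount low-frequency
        (False, True): 3,   # small-amount high-frequency
        (False, False): 4,  # small-amount low-frequency
    }[(large, high)]
    # Hypothetical numbering: aggregating -> 1..4, dispersing -> 5..8.
    return base if flow == "aggregating" else base + 4
```

These integer labels can then serve as the initial node labels that the graph embedding step starts from.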
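Step 210: extract features from the target transaction subgraph network using a preset graph embedding algorithm to obtain a network representation vector.
It should be noted that in this embodiment the Graph2Vec model is used to obtain the network representation vector of each target transaction subgraph network. Based on the Weisfeiler-Lehman relabeling strategy, Graph2Vec decomposes each target transaction subgraph network into a series of smaller subgraphs for representation. In the Weisfeiler-Lehman relabeling strategy, for each node of the target subgraph, a composite label sequence ordered by a fixed rule is generated from the labels of all its neighbor nodes; over several iterations these sequences are updated while information from the whole subgraph is integrated, and finally every node in the subgraph obtains its corresponding representative label sequence. That is, for a target transaction subgraph network G, a set of representation subgraphs $c(G) = \{sg_1, sg_2, \ldots, sg_n\}$ can be generated. A skip-gram model then processes the subgraph sequence of each transaction subgraph, specifically by making transaction subgraphs with similar representation-subgraph sequences lie close to each other in the vector space, and an objective function is established with the goal of mapping the graph structure into a low-dimensional vector space: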
$$f: G \rightarrow \mathbb{R}^{|V| \times d}$$
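where V is the set of target nodes input to the model, |V| is the number of target nodes, d is the dimension of the network representation vector, and $\mathbb{R}^{|V| \times d}$ is the vector space formed by the network representation vectors of the target nodes. For the target transaction subgraph $G_i$ of the i-th target node, the resulting subgraph sequence has length l, and the probability of $sg_j$, the j-th representation subgraph in the sequence, is: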
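where $V_{OC}$ is the set of all representation subgraphs produced by all target transaction subgraphs, and sg is one representation subgraph in that set. The likelihood function is then maximized: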
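The probability of $sg_j$ and the likelihood being maximized can be written as follows, as a sketch that assumes the standard Graph2Vec (doc2vec-style skip-gram) formulation, which the definitions of $G_i$, $sg_j$, l, and $V_{OC}$ appear to follow. $\Phi(\cdot)$ is a hypothetical symbol for the learned embedding of a graph or of a representation subgraph; in the Graph2Vec paper the softmax denominator is approximated with negative sampling.

$$\Pr(sg_j \mid G_i) = \frac{\exp\big(\Phi(G_i) \cdot \Phi(sg_j)\big)}{\sum_{sg \in V_{OC}} \exp\big(\Phi(G_i) \cdot \Phi(sg)\big)}$$

$$\max_{\Phi} \; \sum_{i=1}^{|V|} \sum_{j=1}^{l} \log \Pr(sg_j \mid G_i)$$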
Finally, stochastic gradient descent is used to optimize the above expression, yielding the network representation vectors.
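The Weisfeiler-Lehman decomposition that feeds this objective can be sketched in Python as follows. It is an illustration under assumptions, not the patent's implementation: `wl_subgraph_tokens` is a hypothetical helper, the initial labels are assumed to be the refined node labels from relabeling, the subgraph is treated as undirected, and neighbour labels are concatenated as strings instead of being compressed (hashed) into short new labels as a production WL implementation would do.

```python
from collections import Counter


def wl_subgraph_tokens(adj, labels, iterations=2):
    """Collect Weisfeiler-Lehman 'representation subgraph' tokens for one graph.

    adj:    node -> iterable of neighbour nodes (undirected adjacency)
    labels: node -> initial label, e.g. the refined label from relabeling
    Each token stands for a rooted subgraph of depth 0..iterations.
    """
    current = {v: str(labels[v]) for v in adj}
    tokens = Counter(current.values())  # depth-0 subgraphs: the labels themselves
    for _ in range(iterations):
        new = {}
        for v in adj:
            # New label = own label + sorted multiset of neighbour labels.
            neigh = sorted(current[u] for u in adj[v])
            new[v] = current[v] + "(" + ",".join(neigh) + ")"
        current = new
        tokens.update(current.values())
    return tokens
```

Each graph then behaves like a "document" whose "words" are these tokens; a doc2vec-style skip-gram model trained on the documents of all target subgraphs would produce one vector per subgraph. That training step is not shown here.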
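Step 211: input the network representation vector into the preset classifier for binary classification to obtain the target phishing nodes.
It should be noted that FIG. 4 is a schematic diagram of the overall flow of graph-classification-based Ethereum network phishing fraud detection in this embodiment. The output of the SVM classification model determines whether the corresponding target node is a phishing fraud node.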
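A minimal sketch of the classification step, assuming a linear SVM; the text only says an SVM model may be used and specifies neither kernel nor training method. To stay dependency-free, the sketch trains with the Pegasos stochastic sub-gradient method on the L2-regularized hinge loss; in practice a library SVM (for example scikit-learn's `SVC`) would more likely be used. The helper names, the {+1, -1} label convention (+1 for phishing), and the hyperparameters are assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style linear SVM (hinge loss + L2 penalty) trained by SGD.

    X: list of network representation vectors (lists of floats).
    y: labels in {+1, -1}; here +1 = phishing, -1 = non-phishing.
    A bias term is handled by appending a constant 1.0 feature.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink for the L2 term, then step along the hinge sub-gradient.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (phishing) or -1 (non-phishing) for one embedding vector."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

Usage: train on the Graph2Vec vectors of the labeled target nodes, then call `predict` on the vector of a new target node. A kernel SVM could capture non-linear boundaries between the embedding clusters, at the cost of a library dependency.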
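For ease of understanding, referring to FIG. 3, the present application also provides an embodiment of a graph-classification-based device for detecting Ethereum network phishing fraud, including:
an extraction module 301, configured to extract target nodes and preset-order neighbor nodes from the Ethereum network, where the target nodes include labeled phishing nodes and non-phishing nodes, and the preset-order neighbor nodes include first-order neighbor nodes and second-order neighbor nodes;
a construction module 302, configured to construct, according to the first-order and second-order neighbor nodes, a second-order transaction subgraph network with the target node as the central node;
a refining module 303, configured to refine the second-order transaction subgraph network according to the relevant transaction information data of each node in it to obtain a target transaction subgraph network;
a learning module 304, configured to extract features from the target transaction subgraph network using a preset graph embedding algorithm to obtain a network representation vector; and
a classification module 305, configured to input the network representation vector into a preset classifier for binary classification to obtain target phishing nodes.
Further, the extraction module 301 is specifically configured to:
obtain historical transaction records from the Ethereum network, the historical transaction records including node account addresses, historical transaction amounts, historical transaction timestamps, historical transaction flow directions, and historical transaction counts; and
extract the target nodes and the preset-order neighbor nodes from the historical transaction records.
Further, the construction module 302 is specifically configured to:
take the target node as the central node, and construct first-order network edges according to the central node and the first-order neighbor nodes;
construct second-order network edges according to the first-order and second-order neighbor nodes; and
store the historical transaction records in the respective nodes to obtain the second-order transaction subgraph network.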
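In the several embodiments provided in the present application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.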
Claims (10)
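- A graph-classification-based method for detecting phishing fraud on the Ethereum network, characterized by comprising: extracting target nodes and preset-order neighbor nodes from the Ethereum network, wherein the target nodes comprise labeled phishing nodes and non-phishing nodes, and the preset-order neighbor nodes comprise first-order neighbor nodes and second-order neighbor nodes; constructing, according to the first-order neighbor nodes and the second-order neighbor nodes, a second-order transaction subgraph network with the target node as the central node; refining the second-order transaction subgraph network according to relevant transaction information data of each node in the second-order transaction subgraph network to obtain a target transaction subgraph network; extracting features from the target transaction subgraph network using a preset graph embedding algorithm to obtain a network representation vector; and inputting the network representation vector into a preset classifier for binary classification to obtain target phishing nodes.
- The graph-classification-based method for detecting phishing fraud on the Ethereum network according to claim 1, characterized in that extracting the target nodes and the preset-order neighbor nodes from the Ethereum network comprises: obtaining historical transaction records from the Ethereum network, the historical transaction records comprising node account addresses, historical transaction amounts, historical transaction timestamps, historical transaction flow directions, and historical transaction counts; and extracting the target nodes and the preset-order neighbor nodes from the historical transaction records.
- The graph-classification-based method for detecting phishing fraud on the Ethereum network according to claim 2, characterized in that constructing, according to the first-order neighbor nodes and the second-order neighbor nodes, the second-order transaction subgraph network with the target node as the central node comprises: taking the target node as the central node, and constructing first-order network edges according to the central node and the first-order neighbor nodes; constructing second-order network edges according to the first-order neighbor nodes and the second-order neighbor nodes; and storing the historical transaction records in the respective nodes to obtain the second-order transaction subgraph network.
- The graph-classification-based method for detecting phishing fraud on the Ethereum network according to claim 1, characterized in that refining the second-order transaction subgraph network according to the relevant transaction information data of each node in the second-order transaction subgraph network to obtain the target transaction subgraph network comprises: extracting the relevant transaction information data of all nodes in the second-order transaction subgraph network corresponding to the central node, the relevant transaction information data comprising preset transaction amounts, preset transaction timestamps, and preset transaction flow directions; analyzing the relevant transaction information data to obtain an association strength value between each node and the central node; and refining the second-order transaction subgraph network according to the association strength values to obtain the target transaction subgraph network, the refining comprising resampling and relabeling.
- The graph-classification-based method for detecting phishing fraud on the Ethereum network according to claim 4, characterized in that, after analyzing the relevant transaction information data to obtain the association strength value between each node and the central node, the method further comprises: normalizing the association strength values using a preset activation function.
- The graph-classification-based method for detecting phishing fraud on the Ethereum network according to claim 4, characterized in that the resampling comprises: deleting redundant nodes from the second-order transaction subgraph network according to the association strength values.
- The graph-classification-based method for detecting phishing fraud on the Ethereum network according to claim 4, characterized in that the relabeling comprises: analyzing behavioral differences between the nodes according to historical labels and the association strength values, and adding refined node labels according to the analysis results.
- A graph-classification-based device for detecting phishing fraud on the Ethereum network, characterized by comprising: an extraction module, configured to extract target nodes and preset-order neighbor nodes from the Ethereum network, wherein the target nodes comprise labeled phishing nodes and non-phishing nodes, and the preset-order neighbor nodes comprise first-order neighbor nodes and second-order neighbor nodes; a construction module, configured to construct, according to the first-order neighbor nodes and the second-order neighbor nodes, a second-order transaction subgraph network with the target node as the central node; a refining module, configured to refine the second-order transaction subgraph network according to relevant transaction information data of each node in the second-order transaction subgraph network to obtain a target transaction subgraph network; a learning module, configured to extract features from the target transaction subgraph network using a preset graph embedding algorithm to obtain a network representation vector; and a classification module, configured to input the network representation vector into a preset classifier for binary classification to obtain target phishing nodes.
- The graph-classification-based device for detecting phishing fraud on the Ethereum network according to claim 8, characterized in that the extraction module is specifically configured to: obtain historical transaction records from the Ethereum network, the historical transaction records comprising node account addresses, historical transaction amounts, historical transaction timestamps, historical transaction flow directions, and historical transaction counts; and extract the target nodes and the preset-order neighbor nodes from the historical transaction records.
- The graph-classification-based device for detecting phishing fraud on the Ethereum network according to claim 9, characterized in that the construction module is specifically configured to: take the target node as the central node, and construct first-order network edges according to the central node and the first-order neighbor nodes; construct second-order network edges according to the first-order neighbor nodes and the second-order neighbor nodes; and store the historical transaction records in the respective nodes to obtain the second-order transaction subgraph network.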
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011417306.9 | 2020-12-07 | ||
CN202011417306.9A CN112600810B (zh) | 2020-12-07 | 2020-12-07 | Graph-classification-based Ethereum network phishing fraud detection method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022121145A1 (zh) | 2022-06-16 |
Family ID: 75188630
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/081726 WO2022121145A1 (zh) | 2020-12-07 | 2021-03-19 | Graph-classification-based Ethereum network phishing fraud detection method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112600810B (zh) |
WO (1) | WO2022121145A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116192650A (zh) * | 2023-02-21 | 2023-05-30 | Hunan University | Link prediction method based on subgraph features |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
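CN113064953B (zh) * | 2021-04-21 | 2023-08-22 | Hunan Tianhe Guoyun Technology Co., Ltd. | Blockchain address clustering method and device based on neighbor information aggregation |
CN113364748B (zh) * | 2021-05-25 | 2022-04-19 | Zhejiang University of Technology | Ethereum phishing node detection method and system based on transaction subgraph network |
CN113283902B (zh) * | 2021-06-11 | 2023-05-09 | Zhejiang University of Technology | Multi-channel blockchain phishing node detection method based on graph neural network |
CN113362071A (zh) * | 2021-06-21 | 2021-09-07 | Zhejiang University of Technology | Ponzi scheme identification method and system for the Ethereum platform |
CN113627947A (zh) * | 2021-08-10 | 2021-11-09 | Tongdun Technology Co., Ltd. | Transaction behavior detection method and apparatus, electronic device, and storage medium |
CN114520739A (zh) * | 2022-02-14 | 2022-05-20 | Southeast University | Phishing address identification method based on cryptocurrency transaction network node classification |
CN114677217B (zh) * | 2022-03-14 | 2023-02-07 | Beijing Jiaotong University | Ethereum-oriented abnormal transaction behavior detection method based on subgraph matching |
CN115907770B (zh) * | 2022-11-18 | 2023-09-29 | Beijing Institute of Technology | Ethereum phishing fraud identification and early-warning method based on temporal feature fusion |
CN116524723B (zh) * | 2023-06-27 | 2023-09-12 | Tianjin University | Truck trajectory anomaly identification method and system |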
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
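CN109447109A (zh) * | 2018-09-17 | 2019-03-08 | Zhejiang University of Technology | Graph classification method based on subgraph network |
CN111447179A (zh) * | 2020-03-03 | 2020-07-24 | Sun Yat-sen University | Network representation learning method for Ethernet phishing fraud |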
US20200311735A1 (en) * | 2019-03-26 | 2020-10-01 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method, apparatus and storage medium for processing ethereum-based falsified transaction |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
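CN108537342A (zh) * | 2018-03-05 | 2018-09-14 | Zhejiang University | Network representation learning method and system based on neighbor information |
CN110177179B (zh) * | 2019-05-16 | 2020-12-29 | National Computer Network and Information Security Management Center | Fraud phone number identification method based on graph embedding |
CN110555455A (zh) * | 2019-06-18 | 2019-12-10 | Donghua University | Online transaction fraud detection method based on entity relationships |
CN111008872B (zh) * | 2019-12-16 | 2022-06-14 | Huazhong University of Science and Technology | User profile construction method and system suitable for Ethereum |
CN111260462B (zh) * | 2020-01-16 | 2022-05-27 | Donghua University | Transaction fraud detection method based on heterogeneous relational network attention mechanism |
CN111368147B (zh) * | 2020-02-25 | 2021-07-06 | Alipay (Hangzhou) Information Technology Co., Ltd. | Graph feature processing method and device |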
- 2020
  - 2020-12-07: CN CN202011417306.9A, patent CN112600810B (zh), Active
- 2021
  - 2021-03-19: WO PCT/CN2021/081726, patent WO2022121145A1 (zh), Application Filing
Non-Patent Citations (1)
Title |
---|
YUAN ZIHAO, YUAN QI, WU JIAJING: "Phishing Detection on Ethereum via Learning Representation of Transaction Subgraphs", BLOCKCHAIN AND TRUSTWORTHY SYSTEMS : SECOND INTERNATIONAL CONFERENCE, BLOCKSYS 2020, DALI, CHINA, AUGUST 6–7, 2020, vol. 1267, 1 January 2020 (2020-01-01) - 7 August 2020 (2020-08-07), Singapore, pages 178 - 191, XP009538168, ISBN: 978-981-15-9213-3, DOI: 10.1007/978-981-15-9213-3_14 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116192650A (zh) * | 2023-02-21 | 2023-05-30 | Hunan University | Link prediction method based on subgraph features |
CN116192650B (zh) * | 2023-02-21 | 2024-04-30 | Hunan University | Link prediction method based on subgraph features |
Also Published As
Publication number | Publication date |
---|---|
CN112600810A (zh) | 2021-04-02 |
CN112600810B (zh) | 2021-10-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 21901862; Country of ref document: EP; Kind code of ref document: A1 |
 | 32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.02.2024) |