CN114722920A - Deep graph convolution model phishing account identification method based on graph classification - Google Patents

Deep graph convolution model phishing account identification method based on graph classification

Info

Publication number
CN114722920A
Authority
CN
China
Prior art keywords
account
transaction
network
phishing
order
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210276108.8A
Other languages
Chinese (zh)
Inventor
宣琦
徐欣瑶
李盼盼
王金焕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202210276108.8A priority Critical patent/CN114722920A/en
Publication of CN114722920A publication Critical patent/CN114722920A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Hardware Design (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Virology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a deep graph convolution model phishing account identification method based on graph classification, comprising the following steps. S1: construct a lightweight data set from the public Ethereum transaction records. S2: taking the network topology into account, sample the transaction subgraphs to obtain small-scale subgraphs. S3: learn the latent transaction behavior patterns of accounts through a Chebyshev graph convolutional deep neural network, thereby classifying Ethereum accounts and detecting phishing accounts. The invention reasonably reduces the scale of the data to be computed, improves computational efficiency, accurately distinguishes phishing accounts from non-phishing accounts, and helps digital currency platforms and users avoid fraud risks.

Description

Deep graph convolution model phishing account identification method based on graph classification
Technical Field
The invention relates to the technical field of blockchains, and in particular to a method for detecting Ethereum phishing accounts.
Background
With the development of computer technology and the spread of internet applications, electronic money has risen to become a major component of electronic finance. Blockchain-based electronic money began with a decentralized, encrypted electronic currency system built on a P2P network, proposed by its inventor, which also marked the formal launch of the Bitcoin system. A blockchain is a distributed ledger technology that guarantees trustworthy, disintermediated transactions between nodes in real time in an environment without mutual trust. Blockchain technology is applied in many fields, and cryptocurrency is one of its most widespread applications. Blockchain technology has outstanding advantages in decentralization and openness. With cryptocurrency, accounts can freely exchange currency and information without relying on a traditional third party; each transaction between two addresses is permanently recorded in a public block and broadcast to the whole network, which guarantees openness, transparency, and security. In recent years, however, the anonymity of the blockchain and the absence of a supervising organization have inevitably led to a proliferation of cybercrime in the cryptocurrency market.
Ethereum is the second-largest cryptocurrency platform after Bitcoin and the largest blockchain-based platform supporting smart contracts. A smart contract is a piece of code that is tamper-proof, transparent in its process, and uninterruptible in execution. Ethereum lets users program in a Turing-complete language in the form of smart contracts, which greatly enriches the levels and scenarios of cryptocurrency trading and gives rise to many applications of blockchain technology in economics and finance. Although the hashing mechanism of the blockchain prevents transactions from being tampered with, no built-in tool has so far been available to detect illegal accounts and suspicious transactions on the network. Phishing fraud has therefore become a key issue for Ethereum, one that deserves long-term attention and research and calls for effective countermeasures.
Because phishing fraud on Ethereum differs from traditional phishing, the common approaches based on email detection and website detection are not suitable in this setting. Algorithms from the field of network data mining are therefore considered: they extract and learn useful information from the topology of the transaction network, distinguish the transaction behavior of phishing accounts from that of normal accounts, and thereby detect phishing.
Some methods already identify phishing accounts using network data mining techniques. Chinese patent application publication No. CN 112600810 A provides a graph-classification-based method for detecting phishing fraud on Ethereum, which extracts a target node and its preset first-order and second-order transaction neighbor nodes from the Ethereum network, learns a graph representation vector with a graph embedding algorithm, and then learns and classifies with a classifier. Chinese patent application publication No. CN 112734425 A proposes extracting transaction features from the transaction topology network and smart contracts and then feeding these features to a classifier for identification. In both methods, a classifier must be trained separately after the features are extracted in order to detect phishing accounts, so fast end-to-end detection cannot be achieved. The method of Chinese patent application publication No. CN 113111930 A works from the perspective of transaction subgraphs: it selects the 20 neighbors with the largest transaction amounts for a target node, constructs a second-order transaction network, trains a graph neural network, and predicts whether the target node is a phishing account.
Disclosure of Invention
To overcome the shortcomings of the above techniques, the invention provides a deep graph convolution model phishing account identification method based on graph classification. It uses graph convolutional neural network technology to mine the latent information of the transaction network and identify phishing accounts, improves the computational efficiency of network analysis, and ensures fast end-to-end detection.
The invention provides a deep model phishing account identification method based on graph classification, which comprises the following steps:
S1: Construct a lightweight data set. Sample from the public Ethereum transaction records, reduce the large-scale data to a lightweight form, construct second-order transaction subgraph networks, and extract the features of the accounts in the network. The target accounts comprise labeled phishing nodes and non-phishing nodes; the transaction counterparties comprise the first-order and second-order neighbor nodes of the target node; the features comprise specified features of the phishing and non-phishing accounts in the lightweight data set;
S2: Sample the transaction subgraphs. Taking the network topology into account, construct a formula for the number of neighbors of the target node from the network average value, the network density, the number of nodes, and the number of edges, and obtain a uniform, reasonably sized subgraph scale k. When the number of neighbors is less than k, all neighbor nodes are retained; when the number of neighbors is greater than k, the neighbor nodes of the target node are sorted by transaction amount and number of transactions, and k neighbors are retained to obtain the sampled small-scale subgraph;
S3: Learn the latent transaction behavior patterns of accounts through a Chebyshev graph convolutional deep neural network, achieving end-to-end identification of phishing accounts.
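The second-order transaction subgraph used in step S1 can be illustrated with a breadth-first search limited to two hops. The following is a minimal sketch, not the patent's implementation: the function name `second_order_subgraph`, the edge-list input format, and the choice to follow transactions in both directions when collecting neighbors are all assumptions made for illustration.

```python
from collections import deque


def second_order_subgraph(edges, target, max_hops=2):
    """Return hop distances and the edges lying within max_hops of target.

    Assumption: a neighbor is any account that sent to or received from a
    node, so direction is ignored when exploring but kept in the output.
    """
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    hops = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the second-order neighbors
        for nxt in neighbors.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)

    # Keep only the directed edges whose endpoints were both reached.
    kept = [(s, d) for s, d in edges if s in hops and d in hops]
    return hops, kept


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "a")]
hops, kept = second_order_subgraph(edges, "a")
```

In this toy network, account `d` is three hops from `a` and is therefore left out of the subgraph.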
Further, step S1 specifically includes:
S1.1: Taking the target account address as the starting point, extract small-scale transaction data with a second-order breadth-first search (BFS) algorithm;
S1.2: On the basis of the lightweight data from step S1.1, reduce the data set again using a random walk sampling algorithm. The walk algorithm first randomly selects an account as the starting node and samples forward from it to obtain a walk sequence of fixed length. If, during sampling, the sequence has not yet reached the preset length and the current account has no transaction counterparty, an account already visited in the sequence is selected at random and the walk restarts from it;
S1.3: Extract the features of the accounts in the phishing and non-phishing second-order transaction networks separately.
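The restart rule of the random walk in step S1.2 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the patent's code: the name `random_walk_sample`, the adjacency-dictionary input, the optional `start` argument, and the `max_steps` safety cap (which prevents an endless loop when every visited account is a dead end) are hypothetical.

```python
import random


def random_walk_sample(adjacency, walk_length, rng, start=None):
    """Sample a walk of up to walk_length accounts with restarts at dead ends.

    adjacency maps an account to the list of accounts it transacts with.
    """
    if start is None:
        start = rng.choice(sorted(adjacency))  # random starting account
    walk = [start]
    current = start
    max_steps = 100 * walk_length  # assumed safety cap, not in the source
    steps = 0
    while len(walk) < walk_length and steps < max_steps:
        steps += 1
        successors = adjacency.get(current, [])
        if not successors:
            # Dead end: restart from a randomly chosen visited account.
            current = rng.choice(walk)
            continue
        current = rng.choice(successors)
        walk.append(current)
    return walk


adjacency = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
walk = random_walk_sample(adjacency, 6, random.Random(0), start="a")
```

Here the walk reaches the dead end `c` after three accounts and continues from a previously visited account until the preset length is reached.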
Further, step S2 specifically includes:
S2.1: When a transaction network is constructed, an excessively large transaction sample increases time complexity and reduces computational efficiency, so the number of neighbors and the neighborhood order must be constrained. This patent provides a formula for the number of neighbors: the h-order neighbors are sorted and k neighbor nodes are retained, where the number of neighbor nodes k is calculated as:
[Equation image BDA0003555801860000031]
where [symbol image BDA0003555801860000032] represents the network average value, Density represents the network density, $\lceil \cdot \rceil$ (symbol image BDA0003555801860000033) denotes rounding up, and $|V|$ and $|E|$ denote the numbers of nodes and edges of the network, respectively.
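Once k is known, the retention rule of step S2 is a simple ranking. The sketch below takes k as an input because the formula image for k is not reproduced in this text. The function name `sample_neighbors`, the per-neighbor `(total_amount, transaction_count)` tuples, and the choice to rank by amount first and by transaction count as a tie-breaker are assumptions; the source states only that both attributes are used for sorting.

```python
def sample_neighbors(neighbor_stats, k):
    """Keep all neighbors if there are at most k, otherwise the top k.

    neighbor_stats maps a neighbor account to (total_amount, tx_count).
    """
    if len(neighbor_stats) <= k:
        return list(neighbor_stats)
    # Rank by amount, then by number of transactions, both descending.
    ranked = sorted(neighbor_stats, key=lambda n: neighbor_stats[n], reverse=True)
    return ranked[:k]


stats = {"n1": (5.0, 2), "n2": (9.5, 1), "n3": (5.0, 7), "n4": (0.1, 30)}
top_two = sample_neighbors(stats, 2)
everyone = sample_neighbors(stats, 10)
```

A weighted combination of the two attributes would be an equally plausible reading of the text; the tuple ordering is used here only because it is the simplest deterministic choice.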
Further, step S3 specifically includes:
S3.1: Represent the second-order transaction network of each account as a tuple. The second-order transaction network of each target account is denoted $G = (V, E, A, X, y)$, where $V$ is the set of all nodes in the transaction network and $E$ is the set of directed edges in the transaction network, defined as
[Equation image BDA0003555801860000034]
$A$ is the adjacency matrix of the transaction network, $A \in \mathbb{R}^{n \times n}$. $X$ is the node feature matrix, $X \in \mathbb{R}^{n \times d}$, where $d$ is the feature dimension and $n$ is the total number of nodes. $y$ indicates whether the target node is a phishing account: $y = 1$ means the target node is a phishing account, and $y = 0$ means it is not.
S3.2: Node neighborhood information is aggregated automatically by the graph convolutional layers of the Chebyshev GCN, whose convolutional layer is defined as:
[Equation image BDA0003555801860000035]
where $\beta_k$ are the coefficients of the Chebyshev polynomials, which are updated iteratively during training, and $X$ is the node feature matrix of the second-order transaction network. [Symbol image BDA0003555801860000036] is the Chebyshev polynomial of order $k$. Because $T_k(x) = \cos(k \cdot \arccos(x))$ is defined only for $x \in [-1, 1]$, the diagonal matrix of eigenvalues [symbol image BDA0003555801860000037] must be confined to $[-1, 1]$, expressed as:
[Equation image BDA0003555801860000041]
where $\lambda_{max}$ is obtained by the power iteration method and $L$ is the Laplacian matrix [symbol image BDA0003555801860000042]. The advantage of this transformation is that the computation no longer requires eigenvector decomposition. Since the extracted second-order transaction subgraph is a directed network, the Laplacian matrix is transformed to:
[Equation image BDA0003555801860000043]
where $A$ is the adjacency matrix of the transaction subgraph, [symbol image BDA0003555801860000044] is the sum of the adjacency matrix and its transpose, and [symbol image BDA0003555801860000045] is the degree matrix of the transformed adjacency matrix [symbol image BDA0003555801860000046], which is a diagonal matrix. $\sigma(\cdot)$ is the activation function; $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$ is chosen here.
In practice, the properties of the Chebyshev polynomials yield the recursion:
[Equation image BDA0003555801860000047]
[Equation image BDA0003555801860000048]
The scheme uses two Chebyshev GCN layers to aggregate the neighborhood information of the target node, and the transaction subgraph feature extracted around the target account $u$ is denoted $o_u = g_s$.
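Because the formula images of step S3.2 are not reproduced in this text, the following sketch falls back on the standard Chebyshev graph convolution and should be read as an assumption about the intended form, not as the patent's exact equations. It assumes the symmetric normalized Laplacian $I - D^{-1/2}(A + A^T)D^{-1/2}$, the rescaling $2L/\lambda_{max} - I$, scalar coefficients $\beta_k$, and ReLU activation. All function names are hypothetical, and plain Python lists are used so the example has no dependencies.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_laplacian(adj):
    """Assumed form: I - D^{-1/2} (A + A^T) D^{-1/2} for a directed graph."""
    n = len(adj)
    sym = [[adj[i][j] + adj[j][i] for j in range(n)] for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in sym]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * sym[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]


def largest_eigenvalue(mat, iterations=200):
    """Estimate the dominant eigenvalue by power iteration."""
    vec = [[1.0 / (i + 1)] for i in range(len(mat))]
    for _ in range(iterations):
        nxt = matmul(mat, vec)
        norm = math.sqrt(sum(row[0] ** 2 for row in nxt))
        if norm == 0.0:
            return 0.0
        vec = [[row[0] / norm] for row in nxt]
    product = matmul(mat, vec)
    return sum(vec[i][0] * product[i][0] for i in range(len(mat)))


def chebyshev_terms(adj, features, order):
    """Return [T_0 X, ..., T_order X] of the rescaled Laplacian by recursion."""
    lap = normalized_laplacian(adj)
    lam_max = largest_eigenvalue(lap)
    n = len(lap)
    scaled = [[2.0 * lap[i][j] / lam_max - (1.0 if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    terms = [features]
    if order >= 1:
        terms.append(matmul(scaled, features))
    for _ in range(2, order + 1):
        prod = matmul(scaled, terms[-1])
        terms.append([[2.0 * prod[i][j] - terms[-2][i][j]
                       for j in range(len(features[0]))] for i in range(n)])
    return terms


def chebyshev_layer(adj, features, betas):
    """ReLU of the beta-weighted sum of Chebyshev terms (scalar betas assumed)."""
    terms = chebyshev_terms(adj, features, len(betas) - 1)
    return [[max(0.0, sum(b * t[i][j] for b, t in zip(betas, terms)))
             for j in range(len(features[0]))] for i in range(len(features))]


adj = [[0, 1], [0, 0]]          # one directed transaction: node 0 -> node 1
features = [[1.0], [2.0]]
terms = chebyshev_terms(adj, features, 2)
output = chebyshev_layer(adj, features, [1.0, 0.5, 0.0])
```

The recursion needs only matrix products with the rescaled Laplacian, which is why no eigenvector decomposition is required; only $\lambda_{max}$ is estimated, here by power iteration as the text describes.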
S3.3: Use a pooling function to extract the feature information produced by the two Chebyshev GCN convolution layers of step S3.2. The pooling function here is average pooling: the node features are pooled into graph features by an average pooling layer, defined as:
$y_{pooling} = \mathrm{AvgPooling}(o_u)$ (6)
S3.4: A fully connected layer is further trained to distinguish phishing accounts from non-phishing accounts using these features:
[Equation image BDA0003555801860000049]
where $W$ and $b$ are the trainable weight matrix and bias matrix, respectively, and [symbol image BDA00035558018600000410] is the probability matrix of the final prediction result.
All of the above trainable parameters are optimized by minimizing the following cross-entropy loss function with gradient descent:
[Equation image BDA00035558018600000411]
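The readout of steps S3.3 and S3.4 can be sketched in a few lines. The formula images for the fully connected layer and the loss are not reproduced in this text, so the softmax output over two classes and the single-sample cross-entropy below are assumptions, and the function names are hypothetical.

```python
import math


def average_pool(node_features):
    """Average node feature rows into a single graph-level feature vector."""
    n = len(node_features)
    return [sum(row[j] for row in node_features) / n
            for j in range(len(node_features[0]))]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(graph_feature, weights, bias):
    """Fully connected layer followed by softmax (assumed output form)."""
    logits = [sum(w * x for w, x in zip(row, graph_feature)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def cross_entropy(probabilities, label):
    """Cross-entropy loss for one graph with an integer class label."""
    return -math.log(probabilities[label])


pooled = average_pool([[1.0, 3.0], [3.0, 5.0]])
probs = classify(pooled, weights=[[0.0, 0.0], [0.0, 0.0]], bias=[0.0, 0.0])
loss = cross_entropy(probs, label=1)
```

With zero weights the classifier is maximally uncertain, so the loss equals $\ln 2$, the starting point that gradient descent would then reduce.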
The invention has the following advantages:
1. Extracting the second-order transaction network of each account and dynamically selecting the number of neighbors avoids the large storage and computation overhead of using the complete network data;
2. The deep graph neural network reduces the need for expert domain knowledge;
3. Phishing accounts are distinguished by the graph neural network, so phishing behavior in the virtual currency field is predicted effectively;
4. The precision of the proposed phishing account detection method is superior to that of existing detection methods such as random-walk and graph embedding methods.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments will be briefly described below.
FIG. 1 is a flow chart of the present invention.
Fig. 2 illustrates the sampling method of the present invention.
Fig. 3 is a schematic diagram of the second order transaction sub-graph sampling process of the present invention.
Detailed Description
So that those skilled in the art may better understand the disclosure, embodiments of the invention are described in detail below. The described embodiments are only some, not all, of the embodiments of the invention. This description is not to be taken in a limiting sense; it is intended as a more detailed description of certain aspects and embodiments of the invention. All other embodiments that a person skilled in the art can obtain without inventive effort on the basis of these embodiments fall within the scope of protection of the invention.
Example 1
This technical scheme provides a method, based on a deep network model, for identifying phishing accounts in the transaction network from Ethereum transaction information. It specifically comprises the following steps:
S1: Construct a lightweight data set. Sample from the public Ethereum transaction records, reduce the large-scale data to a lightweight form, construct second-order transaction subgraph networks, and extract the features of the accounts in the network.
S1.1: 1165 phishing accounts are collected from the label cloud of the Ethereum blockchain explorer Etherscan. The lightweight data set has 1686003 accounts and 4380616 transaction records. The extracted transaction data set has 167 weakly connected components; only the largest weakly connected component is used, which contains 1684164 accounts and 4378716 transaction records in total;
S1.2: On the basis of the lightweight data, the data set is reduced again using a random walk sampling algorithm. Starting from one node, random walks are performed to obtain five networks of different sizes, as shown in Table 1:
Data set    Number of nodes    Number of edges    Number of phishing nodes
G1          20000              131189             242
G2          30000              172011             363
G3          40000              202595             462
G4          50000              227854             556
G5          60000              250402             604
Table 1: Information on the five data sets
S1.3: Extract the features of the accounts in the phishing and non-phishing transaction networks separately.
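Step S1.1 keeps only the largest weakly connected component of the extracted data. One way to find it is a union-find pass over the edge list that ignores edge direction, sketched below. The function name `largest_weak_component` and the edge-list input are assumptions for illustration, and the sketch expects at least one edge.

```python
def largest_weak_component(edges):
    """Return the node set of the largest weakly connected component."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Direction is ignored: an edge merges the components of both endpoints.
    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return max(groups.values(), key=len)


component = largest_weak_component([("a", "b"), ("c", "b"), ("x", "y")])
```

In the toy input, `a`, `b`, and `c` are weakly connected even though no directed path links `a` to `c`.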
S2: Sample the transaction subgraphs. Taking the network topology into account, construct a formula for the number of neighbors of the target node from the network average value, the network density, the number of nodes, and the number of edges, and obtain a uniform, reasonably sized subgraph scale k. When the number of neighbors is less than k, all neighbor nodes are retained; when the number of neighbors is greater than k, the neighbor nodes of the target node are sorted by transaction amount and number of transactions, and k neighbors are retained to obtain the sampled small-scale subgraph;
S2.1: When a transaction network is constructed, an excessively large transaction sample increases time complexity and reduces computational efficiency, so the number of neighbors and the neighborhood order must be constrained. The h-order neighbors are sorted and k neighbor nodes are retained, where the number of neighbor nodes k is calculated as:
[Equation image BDA0003555801860000061]
where [symbol image BDA0003555801860000062] represents the network average value, Density represents the network density, $\lceil \cdot \rceil$ (symbol image BDA0003555801860000063) denotes rounding up, and $|V|$ and $|E|$ denote the numbers of nodes and edges of the network, respectively.
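The inputs to the neighbor-count formula can be computed directly from the sizes in Table 1. The sketch below is only an illustration of those inputs and does not compute k, whose formula image is not reproduced here. It assumes the usual directed-graph density $|E| / (|V|(|V| - 1))$ and assumes that the "network average value" refers to the average degree $2|E| / |V|$; both readings, and the function name, are assumptions.

```python
def network_statistics(num_nodes, num_edges):
    """Return (density, average_degree) for a directed network.

    Assumptions: density = |E| / (|V| * (|V| - 1)) and
    average degree = 2 * |E| / |V| (in-degree plus out-degree).
    """
    density = num_edges / (num_nodes * (num_nodes - 1))
    average_degree = 2 * num_edges / num_nodes
    return density, average_degree


# Data set G1 from Table 1: 20000 nodes and 131189 edges.
density_g1, average_degree_g1 = network_statistics(20000, 131189)
```

For G1 the density is on the order of 1e-4, which shows how sparse the sampled transaction networks are.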
S3: Learn the latent transaction behavior patterns of accounts through a Chebyshev graph convolutional deep neural network, achieving end-to-end identification of phishing accounts.
S3.1: Represent the second-order transaction network of each account as a tuple. The second-order transaction network of each target account is denoted $G = (V, E, A, X, y)$, where $V$ is the set of all nodes in the transaction network and $E$ is the set of directed edges in the transaction network, defined as
[Equation image BDA0003555801860000064]
$A$ is the adjacency matrix of the transaction network, $A \in \mathbb{R}^{n \times n}$. $X$ is the node feature matrix, $X \in \mathbb{R}^{n \times d}$, where $d$ is the feature dimension and $n$ is the total number of nodes. $y$ indicates whether the target node is a phishing account: $y = 1$ means the target node is a phishing account, and $y = 0$ means it is not.
S3.2: Node neighborhood information is aggregated automatically by the graph convolutional layers of the Chebyshev GCN, whose convolutional layer is defined as:
[Equation image BDA0003555801860000071]
where $\beta_k$ are the coefficients of the Chebyshev polynomials, which are updated iteratively during training, and $X$ is the node feature matrix of the second-order transaction network. [Symbol image BDA0003555801860000072] is the Chebyshev polynomial of order $k$. Because $T_k(x) = \cos(k \cdot \arccos(x))$ is defined only for $x \in [-1, 1]$, the diagonal matrix of eigenvalues [symbol image BDA0003555801860000073] must be confined to $[-1, 1]$, expressed as:
[Equation image BDA0003555801860000074]
where $\lambda_{max}$ is obtained by the power iteration method and $L$ is the Laplacian matrix [symbol image BDA0003555801860000075]. The advantage of this transformation is that the computation no longer requires eigenvector decomposition. Since the extracted second-order transaction subgraph is a directed network, the Laplacian matrix is transformed to:
[Equation image BDA0003555801860000076]
where $A$ is the adjacency matrix of the transaction subgraph, [symbol image BDA0003555801860000077] is the sum of the adjacency matrix and its transpose, and [symbol image BDA0003555801860000078] is the degree matrix of the transformed adjacency matrix [symbol image BDA0003555801860000079], which is a diagonal matrix. $\sigma(\cdot)$ is the activation function; $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$ is chosen here.
In practice, the properties of the Chebyshev polynomials yield the recursion:
[Equation image BDA00035558018600000710]
[Equation image BDA00035558018600000711]
The scheme uses two Chebyshev GCN layers to aggregate the neighborhood information of the target node, and the transaction subgraph feature extracted around the target account $u$ is denoted $o_u = g_s$.
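The recursion images are not reproduced in this text, but the standard three-term recursion follows directly from the definition $T_k(x) = \cos(k \cdot \arccos(x))$ given above. The short derivation below is offered as a worked step; whether it matches the patent's recursion formula symbol for symbol is an assumption.

```latex
\begin{aligned}
\theta &= \arccos(x), \qquad T_k(x) = \cos(k\theta), \\
\cos(k\theta) + \cos\bigl((k-2)\theta\bigr) &= 2\cos(\theta)\cos\bigl((k-1)\theta\bigr), \\
T_k(x) &= 2x\,T_{k-1}(x) - T_{k-2}(x), \qquad T_0(x) = 1, \quad T_1(x) = x.
\end{aligned}
```

The middle line is the sum-to-product identity $\cos(a+b) + \cos(a-b) = 2\cos(a)\cos(b)$ with $a = (k-1)\theta$ and $b = \theta$. Applied to the rescaled Laplacian, the recursion lets each higher-order term be computed from the two previous ones with a single matrix product.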
S3.3: Use a pooling function to extract the feature information produced by the two Chebyshev GCN convolution layers of step S3.2. The pooling function here is average pooling: the node features are pooled into graph features by an average pooling layer, defined as:
$y_{pooling} = \mathrm{AvgPooling}(o_u)$ (6)
S3.4: A fully connected layer is further trained to distinguish phishing accounts from non-phishing accounts using these features:
[Equation image BDA0003555801860000081]
where $W$ and $b$ are the trainable weight matrix and bias matrix, respectively, and [symbol image BDA0003555801860000082] is the probability matrix of the final prediction result.
All of the above trainable parameters are optimized by minimizing the following cross-entropy loss function with gradient descent:
[Equation image BDA0003555801860000083]
The proposed model PDGNN (Phishing Scams Detection in Ethereum using Graph Neural Network) is compared experimentally with seven baseline algorithms: Features (FE), LINE, DeepWalk (DW), Node2Vec (N2V), T-EDGE, Graph2Vec (G2V), and I2BGNN. The training and test sets are split 8:2, the phishing account detection experiment is repeated five times for each algorithm and the results are averaged, and F1-score is used as the evaluation metric. The experimental results are shown in Table 2.
[Table image BDA0003555801860000084]
Table 2: Results of the phishing account detection comparison experiment
The experimental results show that the simple feature extraction method FE performs worst. Among the random-walk algorithms, N2V outperforms DW because it incorporates network structure information, while LINE performs relatively poorly because it can aggregate only second-order neighbor information. G2V, I2BGNN, and PDGNN are all graph classification algorithms and perform better than the node classification algorithms. Our method PDGNN performs best among the graph classification algorithms.
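The evaluation protocol described above (F1-score averaged over five repetitions) can be sketched as follows. The labels and predictions below are made-up toy values used only to show the computation; they are not results from the patent, and the function name `f1_score` is a local helper rather than a library call.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1-score of the positive (phishing) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 2 true positives, 1 false positive, 1 false negative.
score = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])

# Averaging over repeated runs, as in the five-repetition protocol.
run_scores = [score] * 5
mean_score = sum(run_scores) / len(run_scores)
```

F1 is a natural choice here because phishing nodes are a small minority of each data set in Table 1, so plain accuracy would be dominated by the non-phishing class.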

Claims (4)

1. A deep model phishing account identification method based on graph classification, characterized by comprising the following steps:
S1: constructing a lightweight data set; sampling from the public Ethereum transaction records, reducing the large-scale data to a lightweight form, constructing second-order transaction subgraph networks, and extracting the features of the accounts in the network; wherein the target accounts comprise labeled phishing nodes and non-phishing nodes; the transaction counterparties comprise the first-order and second-order neighbor nodes of the target node; and the features comprise specified features of the phishing and non-phishing accounts in the lightweight data set;
S2: sampling the transaction subgraphs; taking the network topology into account, constructing a formula for the number of neighbors of the target node from the network average value, the network density, the number of nodes, and the number of edges, and obtaining a uniform, reasonably sized subgraph scale k; when the number of neighbors is less than k, retaining all neighbor nodes; when the number of neighbors is greater than k, sorting the neighbor nodes of the target node by transaction amount and number of transactions and retaining k neighbors to obtain the sampled small-scale subgraph;
S3: learning the latent transaction behavior patterns of accounts through a Chebyshev graph convolutional deep neural network, achieving end-to-end identification of phishing accounts.
2. The deep model phishing account identification method based on graph classification as claimed in claim 1, wherein step S1 specifically comprises:
S1.1: taking the target account address as the starting point, extracting small-scale transaction data with a second-order breadth-first search (BFS) algorithm;
S1.2: on the basis of the lightweight data from step S1.1, reducing the data set again using a random walk sampling algorithm; the walk algorithm first randomly selects an account as the starting node and samples forward from it to obtain a walk sequence of fixed length; if, during sampling, the sequence has not yet reached the preset length and the current account has no transaction counterparty, an account already visited in the sequence is selected at random and the walk restarts from it;
S1.3: extracting the features of the accounts in the phishing and non-phishing second-order transaction networks separately.
3. The deep model phishing account identification method based on graph classification as claimed in claim 2, wherein step S2 specifically comprises:
S2.1: in order to constrain the number of neighbors and the neighborhood order, providing a formula for the number of neighbors, sorting the h-order neighbors, and retaining k neighbor nodes, where the number of neighbor nodes k is calculated as:
[Equation image FDA0003555801850000021]
where [symbol image FDA0003555801850000022] represents the network average value, Density represents the network density, $\lceil \cdot \rceil$ (symbol image FDA0003555801850000023) denotes rounding up, and $|V|$ and $|E|$ denote the numbers of nodes and edges of the network, respectively.
4. The deep model phishing account identification method based on graph classification as claimed in claim 3, wherein step S3 specifically comprises:
S3.1: representing the second-order transaction network of each account as a tuple; the second-order transaction network of each target account is denoted $G = (V, E, A, X, y)$; wherein $V$ is the set of all nodes in the transaction network; $E$ is the set of directed edges in the transaction network, defined as
[Equation image FDA0003555801850000024]
$A$ is the adjacency matrix of the transaction network, $A \in \mathbb{R}^{n \times n}$; $X$ is the node feature matrix, $X \in \mathbb{R}^{n \times d}$, where $d$ is the feature dimension and $n$ is the total number of nodes; $y$ indicates whether the target node is a phishing account, where $y = 1$ means the target node is a phishing account and $y = 0$ means it is not;
S3.2: the graph convolutional layer of the Chebyshev GCN is used to automatically aggregate node neighborhood information; the convolutional layer of the Chebyshev GCN is defined as:
Figure FDA0003555801850000025
wherein β_k are the coefficients of the Chebyshev polynomials, which are iteratively updated during training, and X is the node feature matrix of the second-order transaction network;
Figure FDA0003555801850000026
is the Chebyshev polynomial of order k; since T_k(x) = cos(k·arccos(x)), the diagonal matrix of eigenvalues
Figure FDA0003555801850000027
must be confined to [-1, 1], expressed as:
Figure FDA0003555801850000028
where λ_max is obtained by the power iteration method, and L is the Laplacian matrix
Figure FDA0003555801850000029
The advantage of this transformation is that the computation does not require eigendecomposition; since the extracted second-order transaction subgraph is a directed network, the Laplacian matrix is transformed to:
Figure FDA00035558018500000210
Figure FDA00035558018500000211
where A is the adjacency matrix of the transaction subgraph,
Figure FDA00035558018500000212
is the sum of the adjacency matrix and its transpose,
Figure FDA00035558018500000213
is the degree matrix of the transformed adjacency matrix
Figure FDA00035558018500000214
and is a diagonal matrix; σ(·) is the activation function, for which ReLU(·) = max(0, ·) is chosen;
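The construction described above can be sketched as follows, under stated assumptions. The formula images are not reproduced in this text, so the symmetric normalized form L = I - D^{-1/2}(A + A^T)D^{-1/2} and the rescaling 2L/λ_max - I are taken from the standard Chebyshev GCN formulation rather than from the claim itself. All function names are hypothetical.

```python
import numpy as np


def symmetrized_laplacian(A):
    """Normalized Laplacian of the symmetrized graph A + A^T (assumed form)."""
    A_sym = A + A.T                      # sum of the adjacency matrix and its transpose
    deg = A_sym.sum(axis=1)              # diagonal of the degree matrix
    inv_sqrt = np.zeros_like(deg)
    mask = deg > 0                       # leave isolated nodes at zero
    inv_sqrt[mask] = deg[mask] ** -0.5
    n = A.shape[0]
    return np.eye(n) - (inv_sqrt[:, None] * A_sym) * inv_sqrt[None, :]


def largest_eigenvalue(L, iters=200, seed=0):
    """Estimate lambda_max by power iteration, without eigendecomposition."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(L.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = L @ v
        v = w / np.linalg.norm(w)
    return float(v @ L @ v)              # Rayleigh quotient of the iterate


def rescale(L, lam_max):
    """Map the spectrum of L from [0, lambda_max] into [-1, 1]."""
    return 2.0 * L / lam_max - np.eye(L.shape[0])


# Directed toy subgraph: u -> a and b -> u (illustrative data).
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
L = symmetrized_laplacian(A)
lam_max = largest_eigenvalue(L)
L_tilde = rescale(L, lam_max)
```

Power iteration is adequate here because the normalized Laplacian is positive semidefinite, so its dominant eigenvalue is also its largest one.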
In practice, the recurrence property of the Chebyshev polynomials can be used to obtain:
Figure FDA0003555801850000031
Figure FDA0003555801850000032
Two Chebyshev GCN layers are used to aggregate the neighborhood information of the target node, and the transaction subgraph features extracted with the target account u at the center are denoted o_u = g_s;
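A two-layer Chebyshev convolution of this kind can be sketched as below, using the standard recurrence T_0 = I, T_1 = L̃, T_k = 2 L̃ T_{k-1} - T_{k-2}. Two points are assumptions rather than the claimed formula: each scalar coefficient β_k is generalized to a weight matrix W_k so that the feature dimension can change between layers, and the polynomial order (three terms) and layer widths are arbitrary. `cheb_conv` is a hypothetical name.

```python
import numpy as np


def cheb_conv(L_tilde, X, weights):
    """One layer: ReLU(sum_k T_k(L_tilde) @ X @ W_k), with T_k built recursively.

    weights is a list of K >= 1 arrays, each of shape (d_in, d_out).
    """
    T_prev = X                              # T_0(L~) X = X
    out = T_prev @ weights[0]
    if len(weights) > 1:
        T_curr = L_tilde @ X                # T_1(L~) X = L~ X
        out = out + T_curr @ weights[1]
        for W_k in weights[2:]:
            # T_k X = 2 L~ (T_{k-1} X) - T_{k-2} X
            T_next = 2.0 * (L_tilde @ T_curr) - T_prev
            out = out + T_next @ W_k
            T_prev, T_curr = T_curr, T_next
    return np.maximum(out, 0.0)             # ReLU activation


# Rescaled Laplacian of a 3-node star graph (illustrative, spectrum in [-1, 1]).
s = 1.0 / np.sqrt(2.0)
L_tilde = np.array([[0.0, -s, -s],
                    [-s, 0.0, 0.0],
                    [-s, 0.0, 0.0]])

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))             # 3 nodes, 4 input features
layer1 = [rng.standard_normal((4, 8)) * 0.1 for _ in range(3)]
layer2 = [rng.standard_normal((8, 2)) * 0.1 for _ in range(3)]

hidden = cheb_conv(L_tilde, X, layer1)      # first Chebyshev GCN layer
o_u = cheb_conv(L_tilde, hidden, layer2)    # second layer: node-level features
```

The recurrence only ever multiplies L̃ by a feature matrix, so no power of L̃ is formed explicitly.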
S3.3: a pooling function is used to extract the feature information produced by the two Chebyshev GCN convolution layers of step S3.2; the pooling function here is average pooling, and the node features are pooled into graph features by an average pooling layer, defined as:
y_pooling = AvgPooling(o_u)    (6)
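Equation (6) amounts to averaging the node feature rows into one graph-level vector. A minimal sketch, assuming o_u is stored as an (n, d) array with one row per node:

```python
import numpy as np


def avg_pooling(o_u):
    """Average the n node feature rows into one graph-level feature vector."""
    return o_u.mean(axis=0)


# Three nodes with two features each (illustrative values).
o_u = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]])
y_pooling = avg_pooling(o_u)  # shape (2,)
```

Because the mean runs over the node axis, subgraphs with different numbers of nodes all map to vectors of the same length d.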
S3.4: a fully connected layer is further trained to distinguish phishing accounts from non-phishing accounts using these features:
Figure FDA0003555801850000033
where W and b are the trainable weight matrix and bias, respectively,
Figure FDA0003555801850000034
is the probability matrix of the final prediction result;
All of the above trainable parameters are optimized by minimizing the following cross-entropy loss function using gradient descent:
Figure FDA0003555801850000035
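The classification and training step can be illustrated with the following sketch. The formula images are not reproduced in this text, so a two-class softmax output and the standard cross-entropy loss are assumed. For brevity only the fully connected layer is updated, whereas the claim updates all trainable parameters. Function names and the learning rate are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def predict(y_pooling, W, b):
    """Fully connected layer followed by softmax (assumed output form)."""
    return softmax(W @ y_pooling + b)


def cross_entropy(p, label):
    """Cross-entropy loss for a single graph whose true class is `label`."""
    return float(-np.log(p[label]))


def gradient_step(y_pooling, label, W, b, lr=0.1):
    """One gradient-descent update of W and b for softmax + cross-entropy."""
    p = predict(y_pooling, W, b)
    grad_logits = p.copy()
    grad_logits[label] -= 1.0            # d(loss)/d(logits) = p - one_hot(label)
    W_new = W - lr * np.outer(grad_logits, y_pooling)
    b_new = b - lr * grad_logits
    return W_new, b_new


y_pooling = np.array([3.0, 4.0])         # graph-level feature (illustrative)
label = 1                                # 1 = phishing account
W = np.zeros((2, 2))
b = np.zeros(2)

loss_before = cross_entropy(predict(y_pooling, W, b), label)
W, b = gradient_step(y_pooling, label, W, b)
p_after = predict(y_pooling, W, b)
loss_after = cross_entropy(p_after, label)
```

With zero-initialized weights the initial prediction is uniform, and a single step moves probability mass toward the true class.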
CN202210276108.8A 2022-03-21 2022-03-21 Deep map convolution model phishing account identification method based on map classification Withdrawn CN114722920A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210276108.8A CN114722920A (en) 2022-03-21 2022-03-21 Deep map convolution model phishing account identification method based on map classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210276108.8A CN114722920A (en) 2022-03-21 2022-03-21 Deep map convolution model phishing account identification method based on map classification

Publications (1)

Publication Number Publication Date
CN114722920A true CN114722920A (en) 2022-07-08

Family

ID=82237634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210276108.8A Withdrawn CN114722920A (en) 2022-03-21 2022-03-21 Deep map convolution model phishing account identification method based on map classification

Country Status (1)

Country Link
CN (1) CN114722920A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115907770A (en) * 2022-11-18 2023-04-04 北京理工大学 Ethereum phishing fraud identification and early warning method based on time sequence feature fusion
CN115907770B (en) * 2022-11-18 2023-09-29 北京理工大学 Ethereum phishing fraud identification and early warning method based on time sequence feature fusion

Similar Documents

Publication Publication Date Title
Park et al. Graph transplant: Node saliency-guided graph mixup with local structure preservation
CN111931505A (en) Cross-language entity alignment method based on subgraph embedding
CN111008337B (en) Deep attention rumor identification method and device based on ternary characteristics
CN113422761B (en) Malicious social user detection method based on adversarial learning
CN114172688B (en) Method for automatically extracting key nodes of network threat of encrypted traffic based on GCN-DL (generalized traffic channel-DL)
CN114239083B (en) Efficient state register identification method based on graph neural network
CN113657896A (en) Block chain transaction topological graph analysis method and device based on graph neural network
CN115659966A (en) Rumor detection method and system based on dynamic heteromorphic graph and multi-level attention
CN113283902A (en) Multi-channel blockchain phishing node detection method based on graph neural network
CN115118451B (en) Network intrusion detection method combined with graph embedded knowledge modeling
CN114897085A (en) Clustering method based on closed subgraph link prediction and computer equipment
CN115687760A (en) User learning interest label prediction method based on graph neural network
CN114722920A (en) Deep map convolution model phishing account identification method based on map classification
CN112597399B (en) Graph data processing method and device, computer equipment and storage medium
CN117114105B (en) Target object recommendation method and system based on scientific research big data information
CN112801784A (en) Bitcoin address mining method and device for digital currency exchange
Morshed et al. LeL-GNN: Learnable edge sampling and line based graph neural network for link prediction
CN115965466A (en) Sub-graph comparison-based Ethereum account identity inference method and system
CN116578904A (en) Block chain address attribute classification method and system based on integrated machine learning
CN114519605A (en) Advertisement click fraud detection method, system, server and storage medium
CN115906080A (en) Ethereum phishing detection method, system, electronic device and medium
CN114706977A (en) Rumor detection method and system based on dynamic multi-hop graph attention network
CN113657441A (en) Classification algorithm based on weighted Pearson correlation coefficient and combined with feature screening
Yuan et al. A Multi‐Granularity Backbone Network Extraction Method Based on the Topology Potential
Xue et al. Tsc-gcn: A face clustering method based on gcn

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220708
