CN114722920A - Deep graph convolution model phishing account identification method based on graph classification - Google Patents
Deep graph convolution model phishing account identification method based on graph classification
- Publication number: CN114722920A (application CN202210276108.8A)
- Authority: CN (China)
- Prior art keywords: account, transaction, network, phishing, order
- Legal status: Withdrawn (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application discloses a deep graph convolution model method for identifying phishing accounts based on graph classification, comprising the following steps. S1: construct a lightweight data set from public Ethereum transaction records. S2: sample each transaction subgraph, taking the network topology into account, to obtain a small-scale subgraph. S3: learn the latent transaction behavior patterns of accounts with a Chebyshev graph convolutional deep neural network, and thereby classify Ethereum accounts and detect phishing accounts. The invention reduces the scale of the data to be processed, improves computational efficiency, accurately distinguishes phishing accounts from non-phishing accounts, and helps digital currency platforms and users avoid fraud risk.
Description
Technical Field
The invention relates to the technical field of blockchain, and in particular to a method for detecting Ethereum phishing accounts.
Background
With the development of computer technology and the spread of internet applications, electronic money has risen to become a major component of electronic finance. Blockchain-based electronic money began with a decentralized encrypted electronic money system built on a P2P network, proposed by its inventor, which also marked the formal start of operation of the Bitcoin system. The blockchain is a distributed ledger technology that can guarantee trustworthy real-time transactions between nodes in an environment without mutual trust. Blockchain technology is applied in many fields, and cryptocurrency is one of its most widespread applications. Blockchain technology has outstanding advantages in decentralization and openness. With cryptocurrency, accounts can freely exchange currency and information without relying on a traditional third party; each transaction between two addresses is permanently recorded in a public block and broadcast to the whole network, which keeps the system open and transparent. In recent years, however, the anonymity of the blockchain and the absence of a supervising organization have led to a proliferation of cybercrime in the cryptocurrency market.
Ethereum is the second-largest cryptocurrency platform after Bitcoin and is also the largest blockchain-based smart contract platform. A smart contract is a piece of code that is tamper-proof, transparent in its process, and uninterruptible in execution. Ethereum lets users program in a Turing-complete language in the form of smart contracts, which greatly enriches the levels and scenarios of cryptocurrency trading and has given rise to many applications of blockchain technology in economics and finance. Although the hashing mechanism of the blockchain prevents transactions from being tampered with, no built-in tool has so far been available to detect illegal accounts and suspicious transactions on the network. Phishing fraud has therefore become a key issue for Ethereum that deserves long-term attention, research, and effective countermeasures.
Because phishing fraud on Ethereum differs from traditional phishing, the common approaches based on email detection and website detection are not suitable in this context. It is therefore proposed to use algorithms from the field of network data mining to extract and learn effective information from the topology of the transaction network, to distinguish the transaction behavior of phishing accounts from that of normal accounts, and so to detect phishing.
Some methods already identify phishing accounts using network data mining techniques. Chinese patent application publication No. CN 112600810 A provides a graph-classification-based method for detecting Ethereum phishing fraud: it extracts a target node and its preset first-order and second-order transaction neighbor nodes from the Ethereum network, learns a graph representation vector with a graph embedding algorithm, and then trains a classifier on these vectors. Chinese patent application publication No. CN 112734425 A proposes extracting transaction features from the transaction topology network and smart contracts and then feeding those features into a classifier for identification. In both methods a classifier must be trained separately after feature extraction, so fast end-to-end detection is not possible. The method of Chinese patent application publication No. CN 113111930 A works from the perspective of the transaction subgraph: it selects the 20 neighbors of the target node with the largest transaction amounts, constructs a second-order transaction network, trains a graph neural network, and predicts whether the target node is a phishing account.
Disclosure of Invention
To overcome the above shortcomings, the invention provides a deep graph convolution model method for identifying phishing accounts based on graph classification. It uses graph convolutional neural network technology to mine latent information in the transaction network in order to identify phishing accounts, improves the computational efficiency of network analysis, and provides fast end-to-end detection.
The deep model phishing account identification method based on graph classification provided by the invention comprises the following steps:
S1: Construct a lightweight data set. Sample from public Ethereum transaction records; after reducing the scale of the large data set, construct second-order transaction subgraph networks and extract the features of the accounts in each network. The target accounts comprise labeled phishing nodes and non-phishing nodes; the transaction counterparties comprise the first-order and second-order neighbor nodes of the target node; the features comprise specified features of the phishing and non-phishing accounts in the lightweight data set;
S2: Sample the transaction subgraph. Taking the network topology into account, construct a formula for the number of neighbors of the target node from the network average value, the network density, the number of nodes, and the number of edges, and obtain a uniform, reasonably sized subgraph scale k. When the number of neighbors is less than k, keep all neighbor nodes; when it is greater than k, sort the neighbor nodes of the target node by transaction amount and number of transactions and keep k neighbors, which yields the sampled small-scale subgraph;
S3: Learn the latent transaction behavior patterns of accounts with a Chebyshev graph convolutional deep neural network, achieving end-to-end identification of phishing accounts.
Further, the step S1 specifically includes:
S1.1: Starting from a target account address, extract small-scale transaction data with a second-order breadth-first search (BFS);
S1.2: Starting from the lightweight data of step S1.1, reduce the data set again with a random walk sampling algorithm. The walk first randomly selects an account as the starting node and samples forward from it to obtain a walk sequence of fixed length. If, during sampling, the sequence has not yet reached the preset length and the current account has no transaction counterparty, an account already visited in the sequence is selected at random and the walk restarts from it;
S1.3: Extract features separately for the accounts in the second-order transaction networks of phishing and non-phishing accounts.
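The second-order extraction of step S1.1 can be sketched as a breadth-first search that stops two hops from the target account. This is a minimal illustration, not the patented implementation: the function name `second_order_subgraph`, the edge-list input format, and the choice to follow transactions in both directions when collecting neighbors are all assumptions.

```python
from collections import deque


def second_order_subgraph(edges, target, max_hops=2):
    """Collect accounts within max_hops of target and the edges among them.

    edges: list of (sender, receiver) pairs (hypothetical input format).
    Direction is ignored when discovering neighbors (an assumption).
    """
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    hops = {target: 0}          # account -> hop distance from the target
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue            # do not expand beyond second-order neighbors
        for nxt in neighbors.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)

    # Keep only the directed edges whose endpoints both lie in the subgraph.
    sub_edges = [(s, d) for s, d in edges if s in hops and d in hops]
    return hops, sub_edges


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "a"), ("d", "f")]
hops, sub_edges = second_order_subgraph(edges, "a")
```

Here `hops` records the order (1 or 2) of each neighbor, which later steps can use to tell first-order from second-order neighbors.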
Further, the step S2 specifically includes:
S2.1: When a transaction network is constructed, an excessively large transaction sample leads to high time complexity and reduces computational efficiency, so the number of neighbors and the neighborhood order must be constrained. The patent provides a formula for the number of neighbors, which is used to rank the h-order neighbors and keep k neighbor nodes. The number k of neighbor nodes is calculated as follows:
where the leading term represents the average value of the network, Density represents the network density, $\lceil \cdot \rceil$ denotes rounding up, and $|V|$ and $|E|$ denote the number of nodes and edges of the network, respectively.
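The keep-all-or-keep-top-k rule of step S2 can be sketched as follows. The formula for k is not reproduced in this text, so k is simply passed in. The function name `sample_neighbors`, the `(total_amount, tx_count)` tuple, and the descending sort with amount as the primary key are assumptions for illustration; the source only states that neighbors are sorted by transaction amount and number of transactions.

```python
def sample_neighbors(neighbor_stats, k):
    """Return the neighbors to keep for one target node.

    neighbor_stats: dict mapping neighbor -> (total_amount, tx_count)
    (hypothetical structure). k: subgraph scale, assumed to be computed
    elsewhere from the network statistics.
    """
    if len(neighbor_stats) <= k:
        return list(neighbor_stats)      # few neighbors: keep them all
    # Rank by amount first, then by transaction count (assumed ordering).
    ranked = sorted(neighbor_stats, key=lambda n: neighbor_stats[n], reverse=True)
    return ranked[:k]


stats = {"n1": (5.0, 2), "n2": (12.5, 1), "n3": (5.0, 7), "n4": (0.3, 9)}
top_two = sample_neighbors(stats, 2)
everyone = sample_neighbors(stats, 10)
```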
Further, the step S3 specifically includes:
S3.1: The second-order transaction network of each account is represented as a set of vectors. The second-order transaction network of each target account is denoted $G = (V, E, A, X, y)$, where $V$ is the set of all nodes in the transaction network; $E \subseteq V \times V$ is the set of directed edges in the transaction network; $A$ is the adjacency matrix of the transaction network, $A \in \mathbb{R}^{n \times n}$; $X$ holds the node features, $X \in \mathbb{R}^{n \times d}$, where $d$ is the feature dimension and $n$ is the total number of nodes; and $y$ indicates whether the target node is a phishing account: $y = 1$ means it is a phishing account and $y = 0$ means it is not.
S3.2: The graph convolution layer of the Chebyshev GCN automatically aggregates node neighborhood information. The convolution layer of the Chebyshev GCN is defined as:
where $\beta_k$ are the coefficients of the Chebyshev polynomial terms, which are updated iteratively during training, and $X$ holds the node features of the second-order transaction network. $T_k$ is the Chebyshev polynomial of order $k$. Because $T_k(x) = \cos(k \cdot \arccos(x))$, the diagonal matrix of eigenvalues must be rescaled to lie within $[-1, 1]$, expressed as:
where $\lambda_{max}$ is obtained by the power iteration method and $L$ is the Laplacian matrix. The advantage of this transformation is that the computation no longer requires an eigenvector decomposition. Because the extracted second-order transaction subgraph is a directed network, the Laplacian matrix is computed from a transformed adjacency matrix, where $A$ is the adjacency matrix of the transaction subgraph, the transformed adjacency matrix is the sum of the adjacency matrix and its transpose, and the degree matrix of the transformed adjacency matrix is a diagonal matrix. $\sigma(\cdot)$ is the activation function; $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$ is chosen here.
In practice, the recurrence property of the Chebyshev polynomials can be used: $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$, with $T_0(x) = 1$ and $T_1(x) = x$.
The scheme uses two Chebyshev GCN layers to aggregate the neighborhood information of the target node; the transaction subgraph feature extracted with target account $u$ at the center is denoted $o_u = g_s$.
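A Chebyshev graph convolution along the lines of step S3.2 can be sketched with NumPy. Several choices here are assumptions, because the equations are not reproduced in this text: the Laplacian is taken in the standard normalized form $I - D^{-1/2}(A + A^{T})D^{-1/2}$, it is rescaled as $2L/\lambda_{max} - I$, each polynomial order gets a weight matrix rather than a scalar coefficient $\beta_k$, and $\lambda_{max}$ is computed with `numpy.linalg.eigvalsh` for brevity instead of the power iteration named in the text.

```python
import numpy as np


def scaled_laplacian(adj):
    """Rescaled normalized Laplacian of the symmetrized graph (assumed form)."""
    sym = adj + adj.T                          # sum of adjacency and its transpose
    deg = sym.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5   # isolated nodes keep a zero factor
    n = len(adj)
    lap = np.eye(n) - inv_sqrt[:, None] * sym * inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()    # stand-in for power iteration
    return 2.0 * lap / lam_max - np.eye(n)     # eigenvalues now lie in [-1, 1]


def cheb_conv(adj, feats, weights):
    """One Chebyshev layer: ReLU(sum_k T_k(L_scaled) @ feats @ weights[k])."""
    lap = scaled_laplacian(adj)
    t_prev, t_curr = feats, lap @ feats        # T_0 X and T_1 X
    out = t_prev @ weights[0]
    if len(weights) > 1:
        out = out + t_curr @ weights[1]
    for w in weights[2:]:
        # Recurrence T_k = 2 L T_{k-1} - T_{k-2}, applied directly to features.
        t_prev, t_curr = t_curr, 2.0 * lap @ t_curr - t_prev
        out = out + t_curr @ w
    return np.maximum(out, 0.0)                # ReLU activation


adj = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
feats = np.array([[1.0, -2.0], [0.5, 3.0], [-1.0, 0.0]])
rng = np.random.default_rng(0)
w1 = [rng.normal(size=(2, 4)) for _ in range(3)]
w2 = [rng.normal(size=(4, 4)) for _ in range(3)]
hidden = cheb_conv(adj, feats, w1)             # first layer
node_out = cheb_conv(adj, hidden, w2)          # second layer
```

Applying the recurrence to the feature matrix keeps every step a matrix product with the Laplacian, which is why no eigenvector decomposition of the graph is needed during the forward pass.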
S3.3: A pooling function extracts the feature information produced by the two Chebyshev GCN layers of step S3.2. Average pooling is used here: an average pooling layer pools the node features into a graph feature, defined as:
$y_{pooling} = \mathrm{AvgPooling}(o_u)$ (6)
S3.4: A fully connected layer is then trained to distinguish phishing accounts from non-phishing accounts using these features:
where $W$ and $b$ are the trainable weight matrix and bias matrix, respectively, and the output is the probability matrix of the final prediction.
All of the above trainable parameters are optimized by minimizing the following cross-entropy loss function with gradient descent:
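The classification head of steps S3.3 and S3.4 can be sketched in plain Python. The fully connected and loss equations are not reproduced in this text, so this sketch assumes a softmax over two classes and the usual cross-entropy $-\log p_y$ for a single graph; the function names and the row-per-class weight layout are hypothetical.

```python
import math


def average_pool(node_feats):
    """Average the node feature rows into one graph-level feature vector."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]


def dense_softmax(pooled, weight, bias):
    """Fully connected layer followed by softmax (assumed output form).

    weight: one row of coefficients per class; bias: one value per class.
    """
    logits = [sum(x * w for x, w in zip(pooled, row)) + b
              for row, b in zip(weight, bias)]
    top = max(logits)                       # subtract the max for stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy loss for one graph with integer label 0 or 1."""
    return -math.log(probs[label])


node_feats = [[1.0, 2.0], [3.0, 4.0]]
pooled = average_pool(node_feats)
uniform = dense_softmax(pooled, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
probs = dense_softmax(pooled, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
loss = cross_entropy(uniform, 1)
```

In training, the loss would be averaged over the labeled subgraphs and minimized by gradient descent over the layer weights, as the text describes.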
The invention has the following advantages:
1. Extracting the second-order transaction network of each account and dynamically selecting the number of neighbors avoids the large storage and computation costs of using the complete network data;
2. The deep graph neural network reduces the need for expert domain knowledge;
3. The graph neural network distinguishes phishing accounts and effectively predicts phishing behavior in the virtual currency field;
4. The precision of the proposed phishing account detection method is superior to that of existing detection methods such as random-walk-based and graph-embedding-based methods.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments will be briefly described below.
FIG. 1 is a flow chart of the present invention.
FIG. 2 illustrates the sampling method of the present invention.
Fig. 3 is a schematic diagram of the second order transaction sub-graph sampling process of the present invention.
Detailed Description
So that those skilled in the art can better understand the disclosure, embodiments of the present invention are described in detail below. The described embodiments are only some, not all, of the embodiments of the invention. This description is not to be taken in a limiting sense; it is intended as a more detailed account of certain aspects and embodiments of the invention. All other embodiments that a person skilled in the art can obtain from these embodiments without inventive effort fall within the scope of the present invention.
Example 1
This technical scheme provides a deep-network-based method for identifying phishing accounts in a transaction network from Ethereum transaction information, specifically comprising the following steps:
S1: Construct a lightweight data set. Sample from public Ethereum transaction records, reduce the scale of the large data set, construct second-order transaction subgraph networks, and extract the features of the accounts in each network.
S1.1: 1165 phishing accounts are collected from the label cloud of the Ethereum blockchain explorer Etherscan. The lightweight data set has 1686003 accounts and 4380616 transaction records. The extracted transaction data set has 167 weakly connected components; only the largest weakly connected component is used, which contains 1684164 accounts and 4378716 transaction records;
S1.2: Starting from this lightweight data, the data set is reduced again with a random walk sampling algorithm. Starting from one node, random walks are performed to obtain five networks of different sizes, as shown in Table 1:
Data set | Number of nodes | Number of edges | Number of phishing nodes |
G1 | 20000 | 131189 | 242 |
G2 | 30000 | 172011 | 363 |
G3 | 40000 | 202595 | 462 |
G4 | 50000 | 227854 | 556 |
G5 | 60000 | 250402 | 604 |
Table 1. Information on the five data sets
S1.3: Extract features separately for the accounts in the transaction networks of phishing and non-phishing accounts.
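The random walk sampling of step S1.2, including the restart from an already visited account when the walk reaches an account with no counterparty, can be sketched as below. The function name, the successor-list input, and the optional `start` and `seed` parameters are assumptions added for reproducibility; the source selects the starting account at random.

```python
import random


def random_walk_sample(successors, length, start=None, seed=0):
    """Walk forward along transactions until `length` accounts are recorded.

    successors: dict mapping account -> list of counterparties it sends to
    (hypothetical structure).
    """
    rng = random.Random(seed)
    current = start if start is not None else rng.choice(sorted(successors))
    walk = [current]
    while len(walk) < length:
        options = successors.get(current, [])
        if options:
            current = rng.choice(sorted(options))
            walk.append(current)
            continue
        # Dead end: restart from a visited account that can still move on.
        restartable = [node for node in walk if successors.get(node)]
        if not restartable:
            break                       # nothing left to explore
        current = rng.choice(restartable)
    return walk


successors = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
walk = random_walk_sample(successors, 6, start="a")
sampled_accounts = set(walk)
```

The set of visited accounts, together with the transactions among them, would then form the reduced network.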
S2: Sample the transaction subgraph. Taking the network topology into account, construct a formula for the number of neighbors of the target node from the network average value, the network density, the number of nodes, and the number of edges, and obtain a uniform, reasonably sized subgraph scale k. When the number of neighbors is less than k, keep all neighbor nodes; when it is greater than k, sort the neighbor nodes of the target node by transaction amount and number of transactions and keep k neighbors, which yields the sampled small-scale subgraph;
S2.1: When a transaction network is constructed, an excessively large transaction sample leads to high time complexity and reduces computational efficiency, so the number of neighbors and the neighborhood order must be constrained. The h-order neighbors are ranked and k neighbor nodes are kept. The number k of neighbor nodes is calculated as follows:
where the leading term represents the average value of the network, Density represents the network density, $\lceil \cdot \rceil$ denotes rounding up, and $|V|$ and $|E|$ denote the number of nodes and edges of the network, respectively.
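The inputs to the formula for k can be illustrated with the node and edge counts of Table 1. This sketch only computes the statistics and does not attempt the formula itself, which is not reproduced in this text. It assumes that the "network average value" is the average node degree $2|E|/|V|$ and that the density is the directed-graph density $|E| / (|V|(|V|-1))$; both readings are assumptions.

```python
def network_stats(num_nodes, num_edges):
    """Average degree and directed density (assumed definitions)."""
    avg_degree = 2 * num_edges / num_nodes
    density = num_edges / (num_nodes * (num_nodes - 1))
    return avg_degree, density


# Node and edge counts of G1 to G5 from Table 1.
table_1 = {
    "G1": (20000, 131189),
    "G2": (30000, 172011),
    "G3": (40000, 202595),
    "G4": (50000, 227854),
    "G5": (60000, 250402),
}
stats = {name: network_stats(v, e) for name, (v, e) in table_1.items()}
g1_avg_degree, g1_density = stats["G1"]
```

Under these definitions the average degree falls as the sampled network grows from G1 to G5, which suggests why a size-dependent k might be preferred over a fixed neighbor count.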
S3: Learn the latent transaction behavior patterns of accounts with a Chebyshev graph convolutional deep neural network, achieving end-to-end identification of phishing accounts.
S3.1: The second-order transaction network of each account is represented as a set of vectors. The second-order transaction network of each target account is denoted $G = (V, E, A, X, y)$, where $V$ is the set of all nodes in the transaction network; $E \subseteq V \times V$ is the set of directed edges in the transaction network; $A$ is the adjacency matrix of the transaction network, $A \in \mathbb{R}^{n \times n}$; $X$ holds the node features, $X \in \mathbb{R}^{n \times d}$, where $d$ is the feature dimension and $n$ is the total number of nodes; and $y$ indicates whether the target node is a phishing account: $y = 1$ means it is a phishing account and $y = 0$ means it is not.
S3.2: The graph convolution layer of the Chebyshev GCN automatically aggregates node neighborhood information. The convolution layer of the Chebyshev GCN is defined as:
where $\beta_k$ are the coefficients of the Chebyshev polynomial terms, which are updated iteratively during training, and $X$ holds the node features of the second-order transaction network. $T_k$ is the Chebyshev polynomial of order $k$. Because $T_k(x) = \cos(k \cdot \arccos(x))$, the diagonal matrix of eigenvalues must be rescaled to lie within $[-1, 1]$, expressed as:
where $\lambda_{max}$ is obtained by the power iteration method and $L$ is the Laplacian matrix. The benefit of this transformation is that the computation no longer requires an eigenvector decomposition. Because the extracted second-order transaction subgraph is a directed network, the Laplacian matrix is computed from a transformed adjacency matrix, where $A$ is the adjacency matrix of the transaction subgraph, the transformed adjacency matrix is the sum of the adjacency matrix and its transpose, and the degree matrix of the transformed adjacency matrix is a diagonal matrix. $\sigma(\cdot)$ is the activation function; $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$ is chosen here.
In practice, the recurrence property of the Chebyshev polynomials can be used: $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$, with $T_0(x) = 1$ and $T_1(x) = x$.
The scheme uses two Chebyshev GCN layers to aggregate the neighborhood information of the target node; the transaction subgraph feature extracted with target account $u$ at the center is denoted $o_u = g_s$.
S3.3: A pooling function extracts the feature information produced by the two Chebyshev GCN layers of step S3.2. Average pooling is used here: an average pooling layer pools the node features into a graph feature, defined as:
$y_{pooling} = \mathrm{AvgPooling}(o_u)$ (6)
S3.4: A fully connected layer is then trained to distinguish phishing accounts from non-phishing accounts using these features:
where $W$ and $b$ are the trainable weight matrix and bias matrix, respectively, and the output is the probability matrix of the final prediction.
All of the above trainable parameters are optimized by minimizing the following cross-entropy loss function with gradient descent:
The proposed model PDGNN (differentiating Scans Detection in Ethereum using Graph neural network) is compared against seven baseline algorithms: Features (FE), LINE, DeepWalk (DW), Node2Vec (N2V), T-EDGE, Graph2Vec (G2V), and I2BGNN. The training and test sets are split 8:2; the phishing account detection experiment for each algorithm is repeated five times and the results are averaged; F1-score is the evaluation metric. The experimental results are shown in Table 2.
Table 2. Results of the phishing account detection comparison experiments
According to the experimental results, the simple feature extraction method FE performs worst. Among the walk-based algorithms, N2V outperforms DW because it incorporates network structure information, while LINE performs relatively poorly because it can only aggregate second-order neighbor information. G2V, I2BGNN, and PDGNN are all graph classification algorithms and perform better than the node classification algorithms. Our method, PDGNN, performs best among the graph classification algorithms.
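The evaluation protocol described above (F1-score, averaged over five repetitions) can be sketched as follows. The labels and predictions below are made-up placeholders used only to exercise the functions; they are not results from the patent, and the helper names are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1-score for the positive (phishing) class, with labels 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0                      # no true positives: F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(runs):
    """Average F1 over repeated runs, each given as (y_true, y_pred)."""
    return sum(f1_score(t, p) for t, p in runs) / len(runs)


# Placeholder data for a single run (not from the source).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
single = f1_score(y_true, y_pred)
averaged = mean_f1([(y_true, y_pred)] * 5)
```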
Claims (4)
1. A deep model phishing account identification method based on graph classification, characterized by comprising the following steps:
S1: constructing a lightweight data set; sampling from public Ethereum transaction records, reducing the scale of the large data set, constructing second-order transaction subgraph networks, and extracting the features of the accounts in each network; wherein the target accounts comprise labeled phishing nodes and non-phishing nodes; the transaction counterparties comprise the first-order and second-order neighbor nodes of the target node; and the features comprise specified features of the phishing and non-phishing accounts in the lightweight data set;
S2: sampling the transaction subgraph; taking the network topology into account, constructing a formula for the number of neighbors of the target node from the network average value, the network density, the number of nodes, and the number of edges, and obtaining a uniform, reasonably sized subgraph scale k; when the number of neighbors is less than k, keeping all neighbor nodes; when the number of neighbors is greater than k, sorting the neighbor nodes of the target node by transaction amount and number of transactions and keeping k neighbors, to obtain the sampled small-scale subgraph;
S3: learning the latent transaction behavior patterns of accounts with a Chebyshev graph convolutional deep neural network, achieving end-to-end identification of phishing accounts.
2. The deep model phishing account identification method based on graph classification as claimed in claim 1, wherein step S1 specifically comprises:
S1.1: starting from a target account address, extracting small-scale transaction data with a second-order breadth-first search (BFS);
S1.2: starting from the lightweight data of step S1.1, reducing the data set again with a random walk sampling algorithm; the walk first randomly selects an account as the starting node and samples forward from it to obtain a walk sequence of fixed length; if, during sampling, the sequence has not yet reached the preset length and the current account has no transaction counterparty, an account already visited in the sequence is selected at random and the walk restarts from it;
S1.3: extracting features separately for the accounts in the second-order transaction networks of phishing and non-phishing accounts.
3. The deep model phishing account identification method based on graph classification as claimed in claim 2, wherein step S2 specifically comprises:
S2.1: to constrain the number of neighbors and the neighborhood order, a formula for the number of neighbors is provided; the h-order neighbors are ranked and k neighbor nodes are kept, the number k of neighbor nodes being calculated as follows:
4. The deep model fishing account recognition method based on graph classification as claimed in claim 3, wherein step S3 specifically comprises:
s3.1: representing a second-order transaction network for each account in the form of a set of vectors; the second-order transaction network for each target account may be denoted by G ═ V, E, a, X, y; wherein V is the set of all nodes contained in the trading network; e is a set of directed edges in the transaction network, defined asA is an adjacency matrix of a transaction network and is expressed as A epsilon Rn×n(ii) a X is a node characteristic, and available X belongs to Rn×dRepresenting, wherein d represents the dimension of the feature and n represents the total number of nodes; y represents whether the target node is a phishing account, y is 1 represents that the target node is a phishing account, and y is 0 represents that the target node is not a phishing account;
S3.2: automatically aggregating node neighborhood information with the graph convolution layer of the Chebyshev GCN, whose convolution layer is defined as:
where $\beta_k$ is the coefficient of the Chebyshev polynomial of order $k$, a parameter updated iteratively during training; $X$ denotes the node features of the second-order transaction network; $T_k(\cdot)$ is the Chebyshev polynomial of order $k$; since $T_k(x) = \cos(k \cdot \arccos(x))$, the diagonal matrix of eigenvalues must be rescaled to lie in $[-1, 1]$, expressed as:
where $\lambda_{\max}$ is obtained by the power iteration method and $L$ is the Laplacian matrix; the advantage of this transformation is that the computation does not require an eigenvector decomposition; since the extracted second-order transaction subgraph is a directed network, the Laplacian matrix is transformed to:
where $A$ is the adjacency matrix of the transaction subgraph; the transformed adjacency matrix is the sum of the adjacency matrix and its transpose, $A + A^{\top}$, and its degree matrix is a diagonal matrix; $\sigma(\cdot)$ is the activation function, here chosen as $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$;
in practice, the recurrence property of the Chebyshev polynomials can be used: $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$, with $T_0(x) = 1$ and $T_1(x) = x$;
two Chebyshev GCN layers are used to aggregate the neighborhood information of the target node, and the transaction-subgraph feature extracted with the target account $u$ as the center is denoted $o_u = g_s$;
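A dependency-free sketch of one Chebyshev convolution layer as described in step S3.2 is given below. Because the formulas themselves are not reproduced in this text, several details are assumptions: the symmetric normalized Laplacian I - D^(-1/2) (A + A^T) D^(-1/2), the rescaling 2L/lambda_max - I, scalar coefficients `betas`, and all function names. The Chebyshev recurrence T_k(x) = 2x T_(k-1)(x) - T_(k-2)(x) is the standard one.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normalized_laplacian(A):
    """Assumed form: L = I - D^(-1/2) (A + A^T) D^(-1/2)."""
    n = len(A)
    S = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]
    scale = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in (sum(r) for r in S)]
    return [[(1.0 if i == j else 0.0) - scale[i] * S[i][j] * scale[j]
             for j in range(n)] for i in range(n)]


def largest_eigenvalue(L, iters=200):
    """Estimate lambda_max of a symmetric PSD matrix by power iteration."""
    v = [[float(i + 1)] for i in range(len(L))]  # non-uniform start vector
    lam = 0.0
    for _ in range(iters):
        w = matmul(L, v)
        norm = math.sqrt(sum(x[0] ** 2 for x in w))
        if norm == 0.0:
            break
        v = [[x[0] / norm] for x in w]
        lam = norm
    return lam


def cheb_conv(A, X, betas):
    """ReLU(sum_k betas[k] * T_k(L_tilde) X), with T_k built by recurrence."""
    n = len(A)
    L = normalized_laplacian(A)
    lam = largest_eigenvalue(L) or 2.0
    # Rescale the spectrum into [-1, 1] without any eigendecomposition.
    Lt = [[2.0 * L[i][j] / lam - (1.0 if i == j else 0.0) for j in range(n)]
          for i in range(n)]
    terms = [X]                          # T_0(L_tilde) X = X
    if len(betas) > 1:
        terms.append(matmul(Lt, X))      # T_1(L_tilde) X = L_tilde X
    for _ in range(2, len(betas)):
        prod = matmul(Lt, terms[-1])
        terms.append([[2.0 * p - q for p, q in zip(pr, qr)]
                      for pr, qr in zip(prod, terms[-2])])
    out = [[0.0] * len(X[0]) for _ in range(n)]
    for beta, term in zip(betas, terms):
        for i, row in enumerate(term):
            for j, value in enumerate(row):
                out[i][j] += beta * value
    return [[max(0.0, x) for x in row] for row in out]
```

Under these assumptions, the two-layer aggregation would be `cheb_conv(A, cheb_conv(A, X, betas_1), betas_2)`, with `betas_1` and `betas_2` learned during training.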
S3.3: the feature information obtained after the two Chebyshev GCN convolution layers in step S3.2 is extracted with a pooling function; the pooling function here is average pooling, and the node features are pooled into a graph feature by an average pooling layer, defined as:
$y_{pooling} = \mathrm{AvgPooling}(o_u)$ (6)
S3.4: a fully connected layer is further trained on these features to distinguish phishing accounts from non-phishing accounts:
where $W$ and $b$ are the trainable weight matrix and bias matrix, respectively, and the output is the probability matrix of the final prediction result;
all of the above trainable parameters are optimized by minimizing the following cross-entropy loss function using gradient descent:
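Steps S3.3 and S3.4 can be sketched as follows. The average pooling follows equation (6); the softmax output and the per-sample cross-entropy form are assumptions, since the corresponding formulas are not reproduced in this text, and the function names are introduced here for illustration only.

```python
import math


def avg_pooling(node_features):
    """Average the rows of an n x d node feature matrix into one d-vector."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(graph_vec, W, b):
    """Fully connected layer followed by softmax (assumed output activation).

    W holds one weight row per class; b holds one bias per class.
    """
    logits = [sum(w * x for w, x in zip(row, graph_vec)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)


def cross_entropy(prob, y):
    """Cross-entropy loss for one graph with true class index y (0 or 1)."""
    return -math.log(prob[y])
```

In training, this loss would be averaged over all labeled subgraphs and its gradient used to update `W`, `b`, and the Chebyshev coefficients.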
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210276108.8A CN114722920A (en) | 2022-03-21 | 2022-03-21 | Deep map convolution model phishing account identification method based on map classification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114722920A (en) | 2022-07-08 |
Family
ID=82237634
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202210276108.8A | Withdrawn | CN114722920A (en) | 2022-03-21 | 2022-03-21 | Deep map convolution model phishing account identification method based on map classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114722920A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115907770A (en) * | 2022-11-18 | 2023-04-04 | 北京理工大学 | Ethereum phishing fraud identification and early warning method based on time sequence feature fusion |
CN115907770B (en) * | 2022-11-18 | 2023-09-29 | 北京理工大学 | Ethereum phishing fraud identification and early warning method based on time sequence feature fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WW01 | Invention patent application withdrawn after publication | Application publication date: 20220708 |