CN115965466A - Ethereum account identity inference method and system based on subgraph contrast

Publication number
CN115965466A
Authority
CN
China
Prior art keywords
graph
transaction
subgraph
account
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202211026856.7A
Other languages
Chinese (zh)
Inventor
宣琦
胡宸恺
徐嘉影
周嘉俊
沈杰
俞山青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An Ethereum account identity inference method based on subgraph contrast comprises the following steps. S1: construct a lightweight interaction graph from Ethereum account transaction data and extract a second-order transaction subgraph of each target account. S2: apply graph data augmentation to each transaction subgraph to generate multi-view augmented graphs. S3: aggregate the transaction information of each account and learn its latent patterns with a graph convolutional neural network, thereby performing account identity inference on the Ethereum platform. A system implementing the method is also provided. The method introduces graph data augmentation and contrastive learning: augmentation generates weakly labeled data, which increases the amount of data available for training and enriches the structural features of the samples, while contrastive learning captures the structural similarity and relevance among samples of the same class.

Description

Ethereum account identity inference method and system based on subgraph contrast
Technical Field
The invention relates to the fields of blockchains and graphs, and in particular to an Ethereum account identity inference method and system based on subgraph contrast.
Background
Blockchain technology is distributed and decentralized. It is widely used in digital currency, supply chain management, financial services, electronic bills, and other industries, and has a profound influence on the real economy and the social environment. As a distributed database technology, the blockchain provides decentralization, encryption, and tamper resistance. Owing to these characteristics, blockchain-based cryptocurrencies, represented by Bitcoin and Ethereum, have developed rapidly. As of August 2021, according to statistics from market analysis websites, there were about 11,000 types of virtual currency with a total market value of $1.9 trillion.
The biggest problem facing security supervision on blockchains is anonymity. On a public chain, a user only needs to create a pseudonymous account to transact, and one user can own multiple accounts to further improve anonymity. In recent years, driven by the need for identity inference on Ethereum, a great deal of work has focused on using the information publicly available on the blockchain to analyze account behavior patterns, mine account-related information, and infer the likely identity of an account. Existing account identity inference methods mainly rely on manual feature engineering, graph modeling, and graph embedding algorithms. They achieve considerable results but have several shortcomings. First, manual feature engineering relies on the prior knowledge of the designer and cannot capture deep information in blockchain data, such as transaction patterns, so feature utilization is low and expressive power is poor. Second, the transaction graph is large in both the number of accounts and the number of transactions, so applying graph embedding algorithms based on random walks or graph neural networks to the large-scale interaction graph consumes considerable memory and time.
Disclosure of Invention
To overcome the above shortcomings of the prior art, the invention provides an Ethereum account identity inference method and system based on subgraph contrast. The invention introduces graph data augmentation and contrastive learning to infer account identities on the Ethereum platform accurately. Data augmentation generates additional, equivalent data by applying certain modifications to the limited existing data while keeping the semantic information or labels of the graph or node unchanged, which improves the generalization ability of the model.
The invention collects and organizes a large amount of transaction and contract-call data from Ethereum, crawls public account label data from related websites, constructs an interaction graph, and designs a strategy for sampling account transaction subgraphs. It aggregates the neighborhood transaction features of each account through a graph convolutional neural network, and finally uses a pooling layer to compress the node-level features of the transaction subgraph to the graph level for account classification. To avoid overfitting and to mitigate the scarcity of labeled data, the invention introduces graph data augmentation and contrastive learning. Augmentation generates weakly labeled data, which increases the amount of data available for training and enriches the structural features of the samples; contrastive learning captures the structural similarity and relevance among samples of the same class. Combining the two yields a new end-to-end framework for the identity inference task. Experiments on Ethereum data show that the proposed method outperforms conventional neural network methods.
In order to achieve the above object, the present invention provides the following technical solutions:
An Ethereum account identity inference method based on subgraph contrast comprises the following steps:
S1: acquire transaction data and label information of accounts on the Ethereum platform, construct a lightweight interaction graph, extract the smart contract call features of the accounts, and extract second-order transaction subgraphs centered on the target account, one according to transaction count and one according to total transaction amount;
S2: apply graph data augmentation to each transaction subgraph to generate multi-view augmented graphs, which improves the quality of the feature vectors extracted by the graph convolution layers and also regularizes the fitting of the model;
S3: aggregate the neighborhood transaction features of each account with a model based on the Graph Convolutional Network (GCN) to learn the latent patterns and feature information hidden in the transaction network; compress the node-level features of the transaction subgraph to the graph level through a pooling layer to obtain the corresponding graph embedding vector; perform contrastive learning on the graph embedding vectors of the target node's transaction subgraph and of its augmented graphs; and classify the graph feature vector with a fully connected layer, thereby performing account identity inference on the Ethereum platform.
Preferably, the step S1 specifically includes:
S1.1: Construct the lightweight interaction graph. Across all Ethereum transaction records, the interactions between each pair of accounts are merged by direction, leaving at most two transaction edges in opposite directions. This yields a directed weighted interaction network $G_1 = (V, E)$, where $V$ is the set of accounts and $E$ is the set of interaction relations; the edge weights are the transaction count and the transaction amount.
S1.2: Sample the interaction network to obtain the second-order transaction subgraph of the target account node. Sampling strategies are designed according to the weight information retained in the interaction graph: neighbors are ranked either by transaction count or by transaction amount, and the K neighbors with the largest values are sampled. The complete second-order subgraph sampling procedure is implemented with a breadth-first search and comprises: a. sampling the first-order neighborhood of the target node; b. sampling the second-order neighborhood through the first-order neighbor nodes; c. completing the subgraph into the induced subgraph of the sampled node set. The network topology of the target node's transaction subgraph is represented by an adjacency matrix $A$.
S1.3: Extract the smart contract call features of Ethereum accounts. For the smart contracts deployed on Ethereum, count the number of times each is called by external users, ignoring calls between contracts, and keep the N most frequently called smart contracts. From the contract call history, count the calls made to these N smart contracts by every node in the second-order transaction networks sampled in S1.2. The contract call feature matrix of the target node's second-order transaction network can then be expressed as
$$X \in \mathbb{R}^{|V| \times N}$$
where $|V|$ is the number of nodes in the node set.
Preferably, K defaults to 20 in step S1.2, and N = 14885 in step S1.3.
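The merging in step S1.1 can be illustrated with a minimal Python sketch. The input format (a list of `(sender, receiver, amount)` tuples) and the function name `build_interaction_graph` are assumptions made for illustration; the source does not specify a data layout.

```python
from collections import defaultdict


def build_interaction_graph(transactions):
    """Merge raw transactions into a directed, weighted interaction graph.

    `transactions` is an iterable of (sender, receiver, amount) tuples
    (a hypothetical input format). The result maps each directed edge
    (sender, receiver) to a (transaction_count, total_amount) pair, so any
    pair of accounts keeps at most two edges, one per direction.
    """
    edges = defaultdict(lambda: [0, 0.0])
    for sender, receiver, amount in transactions:
        edge = edges[(sender, receiver)]
        edge[0] += 1       # weight 1: number of transactions
        edge[1] += amount  # weight 2: total transaction amount
    return {key: tuple(value) for key, value in edges.items()}


# Four raw transactions collapse into three directed edges.
txs = [("a", "b", 1.0), ("a", "b", 2.5), ("b", "a", 0.5), ("a", "c", 4.0)]
graph = build_interaction_graph(txs)
```

Keeping both weights on every edge lets the same graph serve both sampling strategies of step S1.2 (by count and by amount).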
Preferably, the step S2 specifically includes:
Graph data augmentation: for each sampled subgraph, two new samples are generated, one by Node Dropping (ND) on the adjacency matrix A and one by random Feature Masking (FM) on the feature matrix X. These two samples also serve as the two views for contrastive learning.
Preferably, the step S3 specifically includes:
S3.1: Aggregate the neighborhood transaction features of each account through two graph convolution layers to generate the node-level representation matrix $Z$:
$$Z = \hat{A}\,\mathrm{ReLU}\!\left(\hat{A} X W^{(0)}\right) W^{(1)} \tag{1}$$
where $\hat{A}$ is the normalized $A_v$ or $A_t$; $A_v$ is the subgraph sampled by transaction amount, and $A_t$ is the subgraph sampled by transaction count. $W^{(0)}$ and $W^{(1)}$ are the trainable weight parameters of the two convolution layers. Because the extracted transaction subgraph is weighted and directed, the computation of $\hat{A}$ differs somewhat from the original GCN:
$$A_{sum} = A + A^{\top} \tag{2}$$
$$(D_{sum})_{ii} = \sum_{j} (A_{sum})_{ji} \tag{3}$$
$$\hat{A} = D_{sum}^{-1/2}\, A_{sum}\, D_{sum}^{-1/2} \tag{4}$$
$A_{sum}$ is the adjacency matrix of the undirected graph obtained by adding the sampled directed weighted adjacency matrix $A$ to its transpose. Normalization then uses the diagonal matrix $D_{sum}$ of the column weight sums of this adjacency matrix in place of the degree matrix.
S3.2: Obtain the graph embedding vector. A max pooling layer (Max Pooling) pools the node-level features into a graph-level feature that represents the graph:
$$Z_{pool} = \mathrm{MaxPooling}(Z) \tag{5}$$
The max pooling operation selects the maximum over all nodes in each feature dimension, compressing $Z$ into a single vector $Z_{pool}$, which is the graph embedding vector.
S3.3: Classify the graph embedding vector with a fully connected layer, defined as:
$$Y = \mathrm{softmax}\!\left(Z_{pool} W^{(2)} + b\right) \tag{6}$$
where $W^{(2)}$ and $b$ are the trainable weights and bias of the fully connected layer, respectively.
The complete forward propagation function of the invention is:
$$Y = \mathrm{softmax}\!\left(\mathrm{MaxPooling}\!\left(\hat{A}\,\mathrm{ReLU}\!\left(\hat{A} X W^{(0)}\right) W^{(1)}\right) W^{(2)} + b\right) \tag{7}$$
where ReLU is the activation function and the remaining parameters are as described above.
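The forward pass of steps S3.1 to S3.3 can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes the symmetric normalization of the original GCN with the column weight sums of the symmetrized adjacency matrix in place of the degree matrix, adds no self-loops, and applies ReLU only after the first convolution layer. The function names and the layer sizes in the example are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrize a directed weighted adjacency matrix and normalize it.

    Assumed form: A_sum = A + A^T, then D^{-1/2} A_sum D^{-1/2}, where D
    holds the column weight sums of A_sum.
    """
    adj = np.asarray(adj, dtype=float)
    a_sum = adj + adj.T
    col_sums = a_sum.sum(axis=0)
    inv_sqrt = np.zeros_like(col_sums)
    nonzero = col_sums > 0
    inv_sqrt[nonzero] = col_sums[nonzero] ** -0.5  # isolated nodes stay at zero
    return inv_sqrt[:, None] * a_sum * inv_sqrt[None, :]


def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def forward(adj, feats, w0, w1, w2, b):
    """Two graph convolutions, max pooling over nodes, then a dense softmax layer."""
    a_hat = normalize_adjacency(adj)
    hidden = np.maximum(a_hat @ feats @ w0, 0.0)  # first layer with ReLU
    z = a_hat @ hidden @ w1                       # node-level representations Z
    z_pool = z.max(axis=0)                        # graph embedding: max per feature dimension
    return softmax(z_pool @ w2 + b)               # class probabilities Y


# Toy example: 5 nodes, 8 features, 4 hidden units, 3 embedding dims, 2 classes.
rng = np.random.default_rng(0)
adj = rng.random((5, 5)) * (rng.random((5, 5)) < 0.5)
np.fill_diagonal(adj, 0.0)
feats = rng.random((5, 8))
w0, w1, w2 = rng.normal(size=(8, 4)), rng.normal(size=(4, 3)), rng.normal(size=(3, 2))
probs = forward(adj, feats, w0, w1, w2, np.zeros(2))
```

Max pooling makes the graph embedding independent of node ordering and of the number of nodes in the subgraph, which is why subgraphs of different sizes can share one classifier.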
S3.4: The graph embedding vectors of the two augmented views are further mapped to another high-dimensional space through a fully connected layer with a nonlinear transformation (the projection head), and the contrastive loss between the two views is computed in that space. The contrastive loss for the n-th graph can be expressed as:
Figure SMS_16
where $N$ and $\tau$ are the number of graphs and the temperature parameter, respectively; $Z_{n,1}$ and $Z_{n,2}$ are the graph embedding vectors of the two augmented views of the n-th graph; and $\mathrm{sim}(Z_{n,1}, Z_{n,2})$ is the cosine similarity between the two vectors. Contrastive learning brings samples with the same label relatively close in the embedding space and pushes samples with different labels relatively far apart.
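As an illustration of a contrastive loss built from cosine similarity and a temperature, the sketch below implements one common form (an NT-Xent style loss in which the denominator sums over all graphs in the batch, including the positive pair). Whether the invention uses exactly this denominator is an assumption, as are the function name and the default `tau=0.5`.

```python
import numpy as np


def contrastive_loss(z1, z2, tau=0.5):
    """Mean NT-Xent style loss between two views.

    z1[n] and z2[n] are the projected embeddings of the two augmented views
    of graph n; all other graphs in the batch act as negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                     # sim[n, m] = cos(z1[n], z2[m]) / tau
    positives = np.diag(sim)                  # similarity of each graph with its own other view
    log_denominator = np.log(np.exp(sim).sum(axis=1))
    return float((log_denominator - positives).mean())


# Matching views give a lower loss than mismatched views.
views = np.eye(4)
aligned = contrastive_loss(views, views)
shuffled = contrastive_loss(views, np.roll(views, 1, axis=0))
```

A smaller temperature sharpens the softmax over similarities, so hard negatives contribute more to the gradient.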
The final loss function of the invention can be expressed as:
Figure SMS_17
where
Figure SMS_18
is the cross-entropy loss of the prediction on the original sample;
Figure SMS_19
and
Figure SMS_20
are the cross-entropy losses of the predictions on the two augmented samples;
Figure SMS_21
is the contrastive loss; and
Figure SMS_22
is the L2 regularization constraint on the parameters. The hyperparameters α and β weigh the contrastive loss, the prediction loss, and the regularization constraint introduced by the augmented samples.
A system implementing the Ethereum account identity inference method based on subgraph contrast comprises a network construction module, a data augmentation module, and a subgraph contrast and classification module, connected in sequence;
the network construction module obtains the interaction graph from all Ethereum transaction records through lightweight processing, designs the corresponding sampling strategies according to the weight information, and samples the transactions of the target account on the interaction graph to obtain the corresponding second-order subgraphs;
the data augmentation module generates, for each second-order subgraph, graphs under two augmented views by random feature masking and random node dropping;
the subgraph contrast and classification module uses a GCN-based detection model to learn the smart contract call features and the second-order transaction network topology of the target account, obtains the final embedding vector of the target account through multiple GCN layers, and constructs the loss function using contrastive learning, thereby performing identity inference for Ethereum accounts.
The advantages of the invention are as follows: it makes full use of the topological features of the transaction subgraph and the transaction features of the target account, and combines graph data augmentation with contrastive learning to improve the generalization ability and accuracy of the model. Because it operates on sampled subgraphs rather than on the full interaction graph, it also saves computing and time resources.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings required by the embodiments are briefly described below. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of the method and system of the present invention;
FIG. 2 is a general flow diagram of the method and system of the present invention;
FIG. 3 is a transaction sub-graph extracted for one of the accounts in the exchange dataset by the present invention.
Detailed Description
Reference will now be made in detail to various exemplary embodiments of the invention. The detailed description should not be construed as limiting the invention, but as a more detailed description of certain aspects, features, and embodiments of the invention.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Further, for numerical ranges in this disclosure, it is understood that each intervening value, between the upper and lower limit of that range, is also specifically disclosed. Every smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in a stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although only preferred methods and materials are described herein, any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention. All documents mentioned in this specification are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the documents are cited. In case of conflict with any incorporated document, the present specification will control.
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments of the present disclosure without departing from the scope or spirit of the disclosure. Other embodiments will be apparent to those skilled in the art from consideration of the specification. The specification and examples are exemplary only.
As used herein, the terms "comprising," "including," "having," "containing," and the like are open-ended terms that mean including but not limited to.
The "parts" in the present invention are in parts by mass unless otherwise specified.
The following detailed description of embodiments of the invention is provided in connection with the accompanying drawings.
Taking account 0x2910543af39aba0cd09dbb2d50200b3e800a63d2 as an example, its transaction subgraph is shown in FIG. 3. The network topology of the exchange node's transaction subgraph is represented by an adjacency matrix A.
Referring to FIG. 1 and FIG. 2, an Ethereum account identity inference method and system based on subgraph contrast are provided. In this embodiment, the account data come from the Ethereum exchange dataset (ETH-Exchange, hereinafter ETH-E).
In this embodiment, the Ethereum account identity inference method based on subgraph contrast includes the following steps:
S1: construct a lightweight interaction graph from the acquired information on exchange node accounts, extract the smart contract call features of the accounts, and extract second-order transaction subgraphs centered on the target account, one according to transaction count and one according to total transaction amount;
S2: apply graph data augmentation to each transaction subgraph to generate multi-view augmented graphs, which improves the quality of the feature vectors extracted by the graph convolution layers and also regularizes the fitting of the model;
S3: aggregate the neighborhood transaction features of each account with a model based on the Graph Convolutional Network (GCN) to learn the latent patterns and feature information hidden in the transaction network; compress the node-level features of the transaction subgraph to the graph level through a pooling layer to obtain the corresponding graph embedding vector; perform contrastive learning on the graph embedding vectors of the target node's transaction subgraph and of its augmented graphs; and classify the graph feature vector with a fully connected layer, thereby performing account identity inference on the Ethereum platform.
The step S1 specifically includes:
S1.1: Construct the lightweight interaction graph. Across all Ethereum transaction records, the interactions between each pair of accounts are merged by direction, leaving at most two transaction edges in opposite directions. This yields a directed weighted interaction network $G_1 = (V, E)$, where $V$ is the set of accounts and $E$ is the set of interaction relations; the edge weights are the transaction count and the transaction amount.
S1.2: Sample the interaction network to obtain the second-order transaction subgraph of the target account node. Sampling strategies are designed according to the weight information retained in the interaction graph: neighbors are ranked either by transaction count or by transaction amount, and the K neighbors with the largest values are sampled, where K defaults to 20. The subgraph datasets extracted by transaction count and by transaction amount are denoted ETH-E-T and ETH-E-V, respectively. The two datasets are described in Table 1:
Dataset | Number of subgraphs | Average number of nodes | Average number of edges | Feature dimension
ETH-E-T | 5070 | 59.76 | 130.62 | 14885
ETH-E-V | 5070 | 59.23 | 124.92 | 14885
TABLE 1
S1.3: Extract the smart contract call features of Ethereum accounts. For the smart contracts deployed on Ethereum, count the number of times each is called by external users, ignoring calls between contracts, and keep the 14885 most frequently called smart contracts. From the contract call history, count the calls made to these 14885 smart contracts by every node in the second-order transaction networks sampled in S1.2. The contract call feature matrix of the target node's second-order transaction network can then be expressed as
$$X \in \mathbb{R}^{|V| \times 14885}$$
where $|V|$ is the number of nodes in the node set.
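The top-K second-order sampling of step S1.2 can be sketched as below. The edge dictionary format, the function names, the choice to treat both incoming and outgoing edges as neighbors, and the alphabetical tie-breaking are all assumptions made for illustration.

```python
def top_k_neighbors(edges, node, k, weight_index):
    """Return up to k neighbors of `node` with the largest combined edge weight.

    `edges` maps (src, dst) to (count, amount); `weight_index` selects the
    count (0) or the amount (1). Both edge directions count as neighbors.
    """
    weights = {}
    for (src, dst), w in edges.items():
        if src == node and dst != node:
            weights[dst] = weights.get(dst, 0) + w[weight_index]
        elif dst == node and src != node:
            weights[src] = weights.get(src, 0) + w[weight_index]
    # Largest weight first; ties are broken by node name for determinism.
    ranked = sorted(weights, key=lambda n: (-weights[n], n))
    return ranked[:k]


def sample_second_order_subgraph(edges, target, k=20, weight_index=0):
    """Sample the first- and second-order neighborhoods, then induce the subgraph."""
    first_order = top_k_neighbors(edges, target, k, weight_index)  # step a
    nodes = {target, *first_order}
    for neighbor in first_order:                                   # step b
        nodes.update(top_k_neighbors(edges, neighbor, k, weight_index))
    # Step c: keep every edge whose endpoints were both sampled.
    induced = {e: w for e, w in edges.items() if e[0] in nodes and e[1] in nodes}
    return nodes, induced


edges = {
    ("t", "a"): (5, 1.0), ("t", "b"): (1, 9.0), ("a", "c"): (3, 2.0),
    ("a", "d"): (1, 1.0), ("c", "b"): (2, 1.0), ("d", "e"): (1, 1.0),
}
nodes_by_count, edges_by_count = sample_second_order_subgraph(edges, "t", k=2, weight_index=0)
nodes_by_amount, edges_by_amount = sample_second_order_subgraph(edges, "t", k=1, weight_index=1)
```

Bounding each hop at K neighbors caps the subgraph at roughly 1 + K + K² nodes, which keeps memory use predictable even for very active accounts.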
The step S2 specifically includes:
Graph data augmentation: Node Dropping (ND) and random Feature Masking (FM) are applied to each sampled subgraph to generate multi-view augmented graphs for each transaction subgraph, which improves the expressiveness of the feature vectors extracted by the graph convolution layers and also regularizes the fitting of the model;
Random node dropping works as follows: a portion of the nodes in the sampled subgraph, excluding the central target node, are randomly deleted. The number of deleted nodes is:
$$\mathrm{DropNum} = \max(N/10,\ 1) \tag{10}$$
where N is the number of nodes in the subgraph. Because the data contain no subgraph consisting of an isolated node, at least one node is always deleted. Deleting a node removes its row and column from the adjacency matrix of the subgraph and also deletes its attribute feature vector.
Random feature masking works as follows: a portion of the feature dimensions are randomly masked for all nodes in the sampled subgraph. The number of masked feature dimensions is:
$$\mathrm{MaskNum} = d/100 \tag{11}$$
where d is the total number of feature dimensions of a node.
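The two augmentations can be sketched in NumPy as follows. The sketch assumes that N/10 and d/100 are integer divisions, that the target node sits at index 0, and that masking means setting the selected feature columns to zero; none of these details is stated explicitly in the source, and the function names are hypothetical.

```python
import numpy as np


def drop_nodes(adj, feats, rng, center=0):
    """Randomly delete max(N // 10, 1) nodes other than the central target node.

    Assumes the subgraph has at least two nodes, as the source states that
    no isolated-node subgraphs occur in the data.
    """
    n = adj.shape[0]
    drop_num = max(n // 10, 1)
    candidates = np.array([i for i in range(n) if i != center])
    dropped = rng.choice(candidates, size=drop_num, replace=False)
    keep = np.setdiff1d(np.arange(n), dropped)
    # Remove the rows and columns of dropped nodes and their feature vectors.
    return adj[np.ix_(keep, keep)], feats[keep]


def mask_features(feats, rng):
    """Zero out d // 100 randomly chosen feature dimensions for all nodes."""
    d = feats.shape[1]
    mask_num = d // 100
    masked = feats.copy()
    cols = rng.choice(d, size=mask_num, replace=False)
    masked[:, cols] = 0.0
    return masked


rng = np.random.default_rng(0)
adj = np.ones((20, 20)) - np.eye(20)
feats = np.ones((20, 200))
adj_nd, feats_nd = drop_nodes(adj, feats, rng)   # view 1: node dropping
feats_fm = mask_features(feats, rng)             # view 2: feature masking
```

With the 14885-dimensional contract features of this embodiment, the masking rule would hide 148 dimensions per view under the integer-division assumption.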
The step S3 specifically includes:
S3.1: Aggregate the neighborhood transaction features of each account through two graph convolution layers to generate the node-level representation matrix $Z$:
$$Z = \hat{A}\,\mathrm{ReLU}\!\left(\hat{A} X W^{(0)}\right) W^{(1)} \tag{1}$$
where $\hat{A}$ is the normalized $A_v$ or $A_t$; $A_v$ is the subgraph sampled by transaction amount, and $A_t$ is the subgraph sampled by transaction count. $W^{(0)}$ and $W^{(1)}$ are the trainable weight parameters of the two convolution layers. Because the extracted transaction subgraph is weighted and directed, the computation of $\hat{A}$ differs somewhat from the original GCN:
$$A_{sum} = A + A^{\top} \tag{2}$$
$$(D_{sum})_{ii} = \sum_{j} (A_{sum})_{ji} \tag{3}$$
$$\hat{A} = D_{sum}^{-1/2}\, A_{sum}\, D_{sum}^{-1/2} \tag{4}$$
$A_{sum}$ is the adjacency matrix of the undirected graph obtained by adding the sampled directed weighted adjacency matrix $A$ to its transpose. Normalization then uses the diagonal matrix $D_{sum}$ of the column weight sums of this adjacency matrix in place of the degree matrix.
S3.2: Obtain the graph embedding vector. A max pooling layer (Max Pooling) pools the node-level features into a graph-level feature that represents the graph:
$$Z_{pool} = \mathrm{MaxPooling}(Z) \tag{5}$$
The max pooling operation selects the maximum over all nodes in each feature dimension, compressing $Z$ into a single vector $Z_{pool}$, which is the graph embedding vector of the exchange node.
S3.3: Classify the graph embedding vector of the exchange node with a fully connected layer, defined as:
$$Y = \mathrm{softmax}\!\left(Z_{pool} W^{(2)} + b\right) \tag{6}$$
where $W^{(2)}$ and $b$ are the trainable weights and bias of the fully connected layer, respectively.
The complete forward propagation function of the invention is:
$$Y = \mathrm{softmax}\!\left(\mathrm{MaxPooling}\!\left(\hat{A}\,\mathrm{ReLU}\!\left(\hat{A} X W^{(0)}\right) W^{(1)}\right) W^{(2)} + b\right) \tag{7}$$
where ReLU is the activation function and the remaining parameters are as described above.
S3.4: The graph embedding vectors of the two augmented views are further mapped to another high-dimensional space through a fully connected layer with a nonlinear transformation (the projection head), and the contrastive loss between the two views is computed in that space. The contrastive loss for the n-th graph can be expressed as:
Figure SMS_38
where $N$ and $\tau$ are the number of graphs and the temperature parameter, respectively; $Z_{n,1}$ and $Z_{n,2}$ are the graph embedding vectors of the two augmented views of the n-th graph; and $\mathrm{sim}(Z_{n,1}, Z_{n,2})$ is the cosine similarity between the two vectors. Contrastive learning brings samples with the same label relatively close in the embedding space and pushes samples with different labels relatively far apart.
The final loss function of the invention can be expressed as:
Figure SMS_39
where
Figure SMS_40
is the cross-entropy loss of the prediction on the original sample;
Figure SMS_41
and
Figure SMS_42
are the cross-entropy losses of the predictions on the two augmented samples;
Figure SMS_43
is the contrastive loss; and
Figure SMS_44
is the L2 regularization constraint on the parameters. The hyperparameters α and β weigh the contrastive loss, the prediction loss, and the regularization constraint introduced by the augmented samples. In this embodiment, the dimension of the projection head is set to 128, α to 0.2, and β to 0.0001.
The invention carries out an account identity inference experiment on a dataset from the real Ethereum platform and uses the F1 score as the evaluation metric. To prevent excessive fluctuation in the training and test results, the dataset is trained and tested with 3-fold cross-validation, and the results are averaged for a more reliable estimate.
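The evaluation protocol can be illustrated with a small pure-Python sketch of k-fold index splitting and the F1 score. The source does not state whether the F1 score is binary or averaged over classes, nor how the folds are drawn; the binary F1 and the unshuffled round-robin split below are assumptions, and the function names are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 score for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k=3):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


splits = list(k_fold_indices(6, k=3))
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

In practice the indices would be shuffled (and often stratified by label) before splitting, and the per-fold F1 scores averaged as described above.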
The above describes an embodiment of subgraph-contrast-based identity inference for exchange nodes among Ethereum accounts. The final test results are shown in Table 2:
Figure SMS_45
Figure SMS_46
TABLE 2
Corresponding to the above method embodiment, the embodiment of the present invention further provides an ethernet house account identity inference system based on sub-graph comparison, and the structural schematic diagram thereof, as shown in fig. 2, includes a network construction module, a data enhancement module, and a sub-graph comparison and classification identification module, which are connected in sequence;
the network construction module 21 is used for obtaining an interactive graph through lightweight processing in all transaction records of the Etherhouse, designing a corresponding sampling strategy according to the weight information, and sampling target account transactions on the interactive graph to obtain a corresponding second-order subgraph; the method specifically comprises the following steps:
s1.1: the specific steps of constructing the lightweight interaction graph are as follows, in all transaction records of the Etherhouse, the interaction between two accounts is directionally combined, and at most two edges in the opposite direction are left, so that a directional authorized interaction network G is constructed 1 Wherein G is 1 =(V,E),VAnd E represents the interaction relation, and the weight information of the interaction relation is the transaction times and the transaction amount.
S1.2: the second-order transaction subgraph of the node is obtained by sampling the interactive network, and the specific construction method is that a corresponding sampling strategy is designed according to weight information reserved by the interactive graph, namely K neighbors with the maximum sampling value are sorted according to transaction times and K neighbors with the maximum sampling sum are sorted according to transaction limits, and the default value of K is 20. A complete second-order subgraph sampling flow is realized through a breadth-first algorithm, and mainly comprises the following steps: a. sampling a first-order neighborhood of a target node; b. sampling a second-order neighborhood through a first-order neighborhood node; c. and perfecting the subgraph into a derived subgraph of the sampled node set. The transaction subgraph network topology characteristics of the target node are represented by an adjacency matrix A.
S1.3: Extract the smart contract call features of Ethereum accounts. Frequently called contracts are screened out from the smart contract call history, contract call features are extracted for the accounts in all second-order networks, and the smart contract call features of the target account's second-order subgraph are represented by a matrix X.
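One way to picture the feature matrix X is as a node-by-contract count table. The sketch below assumes a call log of `(caller, contract)` pairs and uses raw call counts as feature values; the text does not say whether counts or binary indicators are used, so that choice, like the function name, is illustrative only.

```python
from collections import Counter

def contract_call_features(calls, nodes, top_n):
    """Build a |V| x top_n matrix of contract call counts.

    calls: iterable of (caller_account, contract) pairs made by external
           accounts (assumed layout; contract-to-contract calls excluded).
    nodes: ordered list of subgraph nodes, one matrix row each.
    """
    popularity = Counter(contract for _, contract in calls)
    ranked = sorted(popularity.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [contract for contract, _ in ranked[:top_n]]   # most-called contracts
    per_node = Counter(
        (caller, contract) for caller, contract in calls if contract in kept
    )
    X = [[per_node[(node, contract)] for contract in kept] for node in nodes]
    return kept, X

calls = [("a", "dex"), ("a", "dex"), ("b", "dex"),
         ("b", "nft"), ("c", "bridge"), ("a", "nft")]
kept, X = contract_call_features(calls, ["a", "b", "c"], top_n=2)
```

Account `c` only called a contract that fell outside the top two, so its row is all zeros.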
The data enhancement module 22 is used to generate, for each second-order subgraph, graph data under two augmented views by means of random feature masking and random node dropping. Specifically:
S2: Graph data augmentation. For each sampled subgraph, two new samples are generated by performing random Node Dropping (ND) on the adjacency matrix A and random Feature Masking (FM) on the feature matrix X; the new samples serve as the two views for contrastive learning.
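The two augmentations can be sketched on nested-list matrices as below. Several details are assumptions: dropped nodes are zeroed out rather than removed, whole feature columns are masked, the target node (index 0 here) is never dropped, and the ratios are arbitrary example values.

```python
import random

def drop_nodes(A, drop_ratio, rng, keep=0):
    """Node Dropping: zero the rows and columns of randomly chosen nodes.

    `keep` is the index of the target node, which is protected (assumption).
    """
    n = len(A)
    candidates = [i for i in range(n) if i != keep]
    dropped = set(rng.sample(candidates, int(drop_ratio * len(candidates))))
    return [[0 if (i in dropped or j in dropped) else A[i][j] for j in range(n)]
            for i in range(n)]

def mask_features(X, mask_ratio, rng):
    """Feature Masking: zero randomly chosen feature columns for every node."""
    d = len(X[0])
    masked = set(rng.sample(range(d), int(mask_ratio * d)))
    return [[0 if j in masked else v for j, v in enumerate(row)] for row in X]

rng = random.Random(0)                 # fixed seed for reproducibility
A = [[1] * 4 for _ in range(4)]
X = [[1] * 4 for _ in range(2)]
A_view = drop_nodes(A, 0.5, rng)       # topology-perturbed view
X_view = mask_features(X, 0.5, rng)    # feature-perturbed view
```

Each call leaves the original matrices untouched, so the unaugmented sample remains available alongside the two views.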
The subgraph comparison and classification identification module 23 uses a GCN-based detection model to learn the smart contract call features and the second-order transaction network topology features of the target account, aggregates the neighborhood feature information of the target account through a multi-layer GCN structure to obtain the final embedding vector, and finally constructs a loss function based on the idea of contrastive learning, thereby inferring the identity of the Ethereum account.
The method specifically comprises the following steps:
S3.1: Aggregate the neighborhood transaction features of accounts through two graph convolution layers to generate the node-level vectors
Figure SMS_47
The specific formula is as follows:
Figure SMS_48
where
Figure SMS_49
is the normalized A_v or A_t; A_v is the subgraph obtained by sampling according to transaction amount, and A_t is the subgraph obtained by sampling according to the number of transactions.
Figure SMS_50
and
Figure SMS_51
are the trainable weight parameters of the two convolution layers, respectively. In addition, since the extracted transaction subgraph is weighted and directed, the processing of
Figure SMS_52
differs somewhat from the original GCN. The specific formulas are as follows:
Figure SMS_53
Figure SMS_54
Figure SMS_55
Figure SMS_56
is the adjacency matrix of the undirected graph obtained by adding the sampled directed weighted adjacency matrix A to its transpose; finally, normalization uses the column weight-sum diagonal matrix D_sum of this adjacency matrix in place of the degree diagonal matrix.
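The formula images are not reproduced in this text, so the following is only a sketch of how the described normalization and two-layer aggregation might look. It assumes the symmetric D^(-1/2) normalization and added self-loops of the original GCN, neither of which is confirmed here; what it takes from the text is the A + A^T symmetrization and the use of weight sums instead of degrees.

```python
import math

def matmul(P, Q):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def normalize_adjacency(A, add_self_loops=True):
    """Symmetrize a weighted directed adjacency matrix, then normalize it."""
    n = len(A)
    # Undirected weighted adjacency A + A^T (self-loops are an assumption).
    S = [[A[i][j] + A[j][i] + (1.0 if add_self_loops and i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Diagonal of D_sum: column weight sums rather than unweighted degrees.
    d_sum = [sum(S[i][j] for i in range(n)) for j in range(n)]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in d_sum]
    return [[inv_sqrt[i] * S[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

def two_layer_gcn(A_hat, X, W0, W1):
    """Z = A_hat ReLU(A_hat X W0) W1, the standard two-layer GCN form (assumed)."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(A_hat, matmul(X, W0))]
    return matmul(A_hat, matmul(hidden, W1))

A = [[0.0, 2.0], [0.0, 0.0]]           # one directed edge 0 -> 1 with weight 2
A_hat = normalize_adjacency(A)
X = [[1.0, 0.0], [0.0, 1.0]]
W0 = [[1.0, -1.0], [0.5, 0.5]]
W1 = [[1.0], [1.0]]
Z = two_layer_gcn(A_hat, X, W0, W1)    # one embedding row per node
```

Using weight sums means a neighbor reached through many or large transfers contributes more to the aggregate than one reached through a single small transfer.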
S3.2: Obtain the graph embedding vector. Node-level features are pooled into graph-level features through a max pooling layer (MaxPooling) to represent the graph:
Z_pool = MaxPooling(Z) (5)
The max pooling operation selects the maximum over all nodes in each feature dimension, compressing Z to
Figure SMS_57
thereby yielding the graph embedding vector.
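Equation (5) amounts to a column-wise maximum over the node embedding matrix. A minimal sketch, with made-up embedding values:

```python
def max_pool(Z):
    """Column-wise maximum over node embeddings: |V| x d matrix -> length-d vector."""
    return [max(column) for column in zip(*Z)]

Z = [[0.2, -1.0, 3.0],
     [0.5, 0.0, 1.0],
     [-0.1, 2.0, 2.5]]
z_pool = max_pool(Z)
```

The result has a fixed length d regardless of how many nodes the sampled subgraph contains, which is what lets subgraphs of different sizes share one classifier.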
S3.3: Classify the graph embedding vector with a fully connected layer, which is set as follows:
Y = softmax(Z_pool W^(2) + b) (6)
where
Figure SMS_58
and
Figure SMS_59
are the trainable weight and bias of the fully connected layer, respectively.
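Equation (6) can be sketched in plain Python as a linear map followed by a numerically stable softmax. The weights and the two-class setup below are invented for illustration.

```python
import math

def classify(z_pool, W2, b):
    """Y = softmax(z_pool W2 + b) for a single graph embedding vector."""
    logits = [sum(z * w for z, w in zip(z_pool, column)) + bias
              for column, bias in zip(zip(*W2), b)]
    peak = max(logits)                       # subtract the max for stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

W2 = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]   # d = 3 features, 2 classes
b = [0.0, 0.1]
probs = classify([0.5, 2.0, 3.0], W2, b)
```

The logits here are 2.0 and 0.6, so the first class receives the larger probability.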
Finally, the forward propagation function of the invention is as follows:
Figure SMS_60
where ReLU is the activation function and the remaining parameters are as described above.
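Since the formula image is not reproduced in this text, the expression below is one plausible reading rather than the patent's own formula. It composes equations (5) and (6) with a standard two-layer GCN, which is an assumption; the symbols follow the definitions given above.

```latex
Y = \operatorname{softmax}\Big( \operatorname{MaxPooling}\big( \hat{A}\,\operatorname{ReLU}(\hat{A} X W^{(0)})\, W^{(1)} \big)\, W^{(2)} + b \Big)
```

Read from the inside out: the first graph convolution and ReLU produce hidden node features, the second graph convolution produces Z, max pooling gives Z_pool, and the fully connected layer with softmax gives the class probabilities Y.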
S3.4: The graph embedding vectors under the two augmented views are further mapped to another high-dimensional space through a fully connected layer with nonlinear transformation (projection head), and the contrastive loss between the two is computed. The contrastive loss for the n-th graph can be expressed as:
Figure SMS_61
where N and τ are the number of graphs and the temperature parameter, respectively; Z_{n,1} and Z_{n,2} denote the graph embedding vectors of the two augmented views of the n-th graph; and sim(Z_{n,1}, Z_{n,2}) denotes the cosine similarity between the two vectors. The contrastive learning process makes samples with the same label relatively close in the embedding space and samples with different labels relatively far apart.
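The loss formula itself is not reproduced in this text. The sketch below assumes the widely used NT-Xent (InfoNCE) form built from the same ingredients the text names: N graphs, a temperature τ, and cosine similarity between the two views. Whether the positive pair also appears in the denominator is an assumption here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def contrastive_loss(view1, view2, tau=0.5):
    """Mean NT-Xent-style loss over N graphs, each with two projected views."""
    n_graphs = len(view1)
    total = 0.0
    for n in range(n_graphs):
        # Similarity of graph n's first view to every graph's second view.
        scores = [math.exp(cosine(view1[n], view2[m]) / tau) for m in range(n_graphs)]
        total += -math.log(scores[n] / sum(scores))   # positive pair is index n
    return total / n_graphs

view1 = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(view1, [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss(view1, [[0.0, 1.0], [1.0, 0.0]])
```

When each graph's two views agree, the loss is small; when the views are swapped between graphs, it is much larger, which is the pressure that pulls matching views together during training.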
The final loss function of the invention can be expressed as:
Figure SMS_62
where
Figure SMS_63
is the cross-entropy loss of the prediction for the original sample,
Figure SMS_64
and
Figure SMS_65
are the cross-entropy losses of the predictions for the two augmented samples,
Figure SMS_66
is the contrastive loss, and
Figure SMS_67
is the L2 regularization constraint on the parameters. The hyperparameters α and β weigh the contrastive loss, prediction loss, and regularization constraint brought by the augmented samples.
The system implementing the Ethereum account identity inference method based on subgraph comparison comprises a network construction module, a data enhancement module, and a subgraph comparison and classification identification module, which are connected in sequence;
the network construction module is used to obtain an interaction graph from all Ethereum transaction records through lightweight processing, design a corresponding sampling strategy according to the weight information, and sample the transactions of the target account on the interaction graph to obtain the corresponding second-order subgraphs;
the data enhancement module is used to generate, for each second-order subgraph, graph data under two augmented views by means of random feature masking and random node dropping;
the subgraph comparison and classification identification module uses a GCN-based detection model to learn the smart contract call features and the second-order transaction network topology features of the target account, obtains the final embedding vector of the target account through a multi-layer GCN structure, and constructs a loss function based on the idea of contrastive learning, thereby inferring the identity of the Ethereum account.
The embodiments described above merely illustrate preferred embodiments of the present invention and do not limit its scope. Various modifications and improvements made to the technical solution of the present invention by those skilled in the art, without departing from the spirit of the present invention, shall fall within the protection scope defined by the claims of the present invention.

Claims (6)

1. An Ethereum account identity inference method based on subgraph comparison, characterized by comprising the following steps:
S1: acquiring transaction data and label information of accounts on the Ethereum platform, constructing a lightweight interaction graph, extracting the smart contract call features of the accounts, and extracting second-order transaction subgraphs centered on the target account according to the number of transactions and the total transaction amount of the target account, respectively;
S2: performing graph data augmentation on each transaction subgraph to generate multi-view augmented graphs, thereby improving the representation quality of the feature vectors extracted by the graph convolution layers while constraining the fitting state of the model;
S3: aggregating the neighborhood transaction features of accounts with a graph convolutional neural network, learning the latent patterns and feature information hidden in the transaction network with a model based on the graph convolutional network (GCN), compressing the node-level features in the transaction subgraph to the graph level through a pooling layer to obtain the corresponding graph embedding vector, performing contrastive learning on the graph embedding vectors of the target node's transaction subgraph and its augmented graphs, and classifying according to the graph feature vector with a fully connected layer, finally accomplishing the account identity inference task on the Ethereum platform.
2. The Ethereum account identity inference method based on subgraph comparison according to claim 1, wherein step S1 specifically comprises:
S1.1: constructing the lightweight interaction graph: in all Ethereum transaction records, the interactions between two accounts are merged by direction, finally leaving at most two transaction edges in opposite directions, thereby constructing a directed weighted interaction network G_1 = (V, E), wherein V represents the accounts, E represents the interaction relations, and the weight information is the number of transactions and the transaction amount;
S1.2: the second-order transaction subgraph of the target account node is obtained by sampling the interaction network; a sampling strategy is designed according to the weight information retained in the interaction graph, namely sampling the K neighbors with the largest number of transactions, or sampling the K neighbors with the largest transaction amount; the complete second-order subgraph sampling flow is implemented with a breadth-first search algorithm and comprises: a. sampling the first-order neighborhood of the target node; b. sampling the second-order neighborhood through the first-order neighbor nodes; c. completing the subgraph into the induced subgraph of the sampled node set, wherein the network topology of the target node's transaction subgraph is represented by an adjacency matrix A;
S1.3: extracting the smart contract call features of Ethereum accounts: counting, for the smart contracts deployed on Ethereum, the number of times each is called by external users, ignoring calls between contracts, and keeping the N most frequently called smart contracts; counting, in the contract call history, the calls made by the nodes of the second-order transaction network sampled in step S1; the contract call feature matrix of the target node's second-order transaction network can be expressed as
Figure FDA0003815861490000021
where |V| represents the number of nodes in the node set.
3. The Ethereum account identity inference method based on subgraph comparison according to claim 2, characterized in that the default value of K in step S1.2 is 20, and N = 14885 in step S1.3.
4. The Ethereum account identity inference method based on subgraph comparison according to claim 1 or 2, wherein step S2 specifically comprises:
graph data augmentation: for each sampled subgraph, two new samples are generated by random node dropping on the adjacency matrix A and random feature masking on the feature matrix X, and the new samples are used as the two views for contrastive learning.
5. The Ethereum account identity inference method based on subgraph comparison according to claim 4, wherein step S3 specifically comprises:
S3.1: aggregating the neighborhood transaction features of accounts through two graph convolution layers to generate the node-level vectors
Figure FDA0003815861490000031
The specific formula is as follows:
Figure FDA0003815861490000032
wherein
Figure FDA0003815861490000033
is the normalized A_v or A_t; A_v is the subgraph obtained by sampling according to the transaction amount, and A_t is the subgraph obtained by sampling according to the number of transactions;
Figure FDA0003815861490000034
and
Figure FDA0003815861490000035
are the trainable weight parameters of the two convolution layers, respectively; in addition, since the extracted transaction subgraph is weighted and directed, the processing of
Figure FDA0003815861490000036
differs somewhat from the original GCN, and the specific formulas are as follows:
Figure FDA0003815861490000037
Figure FDA0003815861490000038
Figure FDA0003815861490000039
Figure FDA00038158614900000310
is the adjacency matrix of the undirected graph obtained by adding the sampled directed weighted adjacency matrix A to its transpose; finally, normalization uses the column weight-sum diagonal matrix D_sum of this adjacency matrix in place of the degree diagonal matrix;
S3.2: obtaining the graph embedding vector: node-level features are pooled into graph-level features through a max pooling layer (MaxPooling) to represent the graph:
Z_pool = MaxPooling(Z) (5)
the max pooling operation selects the maximum over all nodes in each feature dimension, compressing Z to
Figure FDA00038158614900000311
thereby obtaining the graph embedding vector;
S3.3: classifying the graph embedding vector with a fully connected layer, which is set as follows:
Y = softmax(Z_pool W^(2) + b) (6)
wherein
Figure FDA0003815861490000041
and
Figure FDA0003815861490000042
are the trainable weight and bias of the fully connected layer, respectively;
the final forward propagation function of the invention is as follows:
Figure FDA0003815861490000043
wherein ReLU is the activation function, and the remaining parameters are as described above;
S3.4: the graph embedding vectors under the two augmented views are further mapped to another high-dimensional space through a fully connected layer with nonlinear transformation (projection head), and the contrastive loss between the two is computed; the contrastive loss for the n-th graph can be expressed as:
Figure FDA0003815861490000044
where N and τ are the number of graphs and the temperature parameter, respectively; Z_{n,1} and Z_{n,2} denote the graph embedding vectors of the two augmented views of the n-th graph; and sim(Z_{n,1}, Z_{n,2}) denotes the cosine similarity between the two vectors; the contrastive learning process makes samples with the same label relatively close in the embedding space and samples with different labels relatively far apart;
the final loss function of the invention can be expressed as:
Figure FDA0003815861490000045
wherein
Figure FDA0003815861490000046
is the cross-entropy loss of the prediction for the original sample,
Figure FDA0003815861490000047
and
Figure FDA0003815861490000048
are the cross-entropy losses of the predictions for the two augmented samples,
Figure FDA0003815861490000049
is the contrastive loss, and
Figure FDA00038158614900000410
is the L2 regularization constraint on the parameters; the hyperparameters α and β weigh the contrastive loss, prediction loss, and regularization constraint brought by the augmented samples.
6. A system for implementing the Ethereum account identity inference method based on subgraph comparison of claim 5, characterized in that: the system comprises a network construction module, a data enhancement module, and a subgraph comparison and classification identification module, which are connected in sequence;
the network construction module is used to obtain an interaction graph from all Ethereum transaction records through lightweight processing, design a corresponding sampling strategy according to the weight information, and sample the transactions of the target account on the interaction graph to obtain the corresponding second-order subgraphs;
the data enhancement module is used to generate, for each second-order subgraph, graph data under two augmented views by means of random feature masking and random node dropping;
the subgraph comparison and classification identification module uses a GCN-based detection model to learn the smart contract call features and the second-order transaction network topology features of the target account, aggregates the neighborhood feature information of the target account through a multi-layer GCN structure to obtain the final embedding vector, and finally constructs a loss function based on the idea of contrastive learning, thereby inferring the identity of the Ethereum account.
CN202211026856.7A 2022-08-25 2022-08-25 Sub-graph comparison-based Ethernet room account identity inference method and system Withdrawn CN115965466A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211026856.7A CN115965466A (en) 2022-08-25 2022-08-25 Sub-graph comparison-based Ethernet room account identity inference method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211026856.7A CN115965466A (en) 2022-08-25 2022-08-25 Sub-graph comparison-based Ethernet room account identity inference method and system

Publications (1)

Publication Number Publication Date
CN115965466A true CN115965466A (en) 2023-04-14

Family

ID=87360556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211026856.7A Withdrawn CN115965466A (en) 2022-08-25 2022-08-25 Sub-graph comparison-based Ethernet room account identity inference method and system

Country Status (1)

Country Link
CN (1) CN115965466A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117371540A (en) * 2023-12-07 2024-01-09 南京信息工程大学 Depth map neural network-based blockchain address identity inference method and system
CN117371540B (en) * 2023-12-07 2024-03-15 南京信息工程大学 Depth map neural network-based blockchain address identity inference method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230414
