CN111861756A - Group partner detection method based on financial transaction network and implementation device thereof - Google Patents


Publication number
CN111861756A
Authority
CN
China
Prior art keywords
node
transaction
matrix
user
graph
Legal status
Granted
Application number
CN202010777629.2A
Other languages
Chinese (zh)
Other versions
CN111861756B (en)
Inventor
朱滕威
王巍
黄俊恒
王佰玲
辛国栋
刘扬
Current Assignee
Weihai Tianzhiwei Network Space Safety Technology Co ltd
Harbin Institute of Technology Weihai
Original Assignee
Weihai Tianzhiwei Network Space Safety Technology Co ltd
Harbin Institute of Technology Weihai
Application filed by Weihai Tianzhiwei Network Space Safety Technology Co ltd and Harbin Institute of Technology Weihai
Priority to CN202010777629.2A
Publication of CN111861756A
Application granted
Publication of CN111861756B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Finance (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Accounting & Taxation (AREA)
  • Technology Law (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to a group partner detection method based on a financial transaction network and a device implementing it. The detection method comprises the following steps:
(1) data preprocessing;
(2) user feature vector generation: a user temporal feature vector is obtained with a sequence model, and a user spatial feature vector is obtained with a GAE model. The two vectors are normalized separately and concatenated to form a node representation vector;
(3) group detection: the group to which each node belongs is calculated, and the node's group label is output.
The method works from raw financial transaction flow records. It first extracts temporal features and spatial structural features. It then uses the concatenated features to calculate the distance between every pair of nodes as an edge weight. Finally, a group detection algorithm based on modularity optimization assigns each user to a potential group.

Description

Group partner detection method based on financial transaction network and implementation device thereof
Technical Field
The invention relates to a group partner detection method based on a financial transaction network and an implementation device thereof, belonging to the technical field of data mining.
Background
Group detection means detecting sets of nodes that share the same characteristics in graph data; in the field of complex networks it is also called community detection. Group detection has a broad range of applications. Decision-support tools for financial crime investigation require high accuracy and interpretability, so mining potential suspects from massive transaction flow records has wide research and application value. At present this work still generally relies on manual data mining and analysis. That approach requires a deep understanding of both the data and criminal behavior, demands in-depth analysis, and places high demands on human experience. With the explosive growth in the volume of transaction data of all kinds, it also poses huge new challenges for both machine hardware and analysts.
Community detection is one of many techniques for complex networks. Earlier research achieved good results mainly on networks in specific domains, such as scientist collaboration networks, power grids, and protein interaction networks. The field has progressed from the early GN algorithm, to label propagation algorithms applicable to large-scale data sets, to the later MMSB generative model based on probabilistic statistics. Even so, most existing community detection algorithms use only the spatial structure information of the graph, such as neighbor information. Effective methods for detecting financial criminal gangs are still lacking. Most existing algorithms rely on the spatial topology of the graph, so their results have low accuracy and they do not extend easily to other transaction data. Existing methods also ignore the temporal characteristics of transaction sequences, even though features that characterize a node are also hidden in its time series. No current method makes full use of both the temporal and the spatial characteristics of financial transaction data to detect criminal gangs.
Chinese patent document CN104867055A discloses a method for tracking and identifying suspicious funds in a financial network, comprising:
(1) constructing a financial transaction network topology graph: the original financial transaction flows are processed and visualized, and the resulting graph contains all transaction relationships and fund flow directions in those flows;
(2) a fund flow direction analysis process: after the topology graph is constructed, fund flow direction analysis is performed on it to track the specific flow of one or more funds;
(3) a transaction relationship analysis process: after the topology graph is constructed, transaction relationship analysis is performed on it to uncover the fund flow relationships of suspected gangs and obtain high-volume, highly suspicious fund paths.
However, that patent uses only the topological relationships of the original transaction flows and does not mine the temporal and spatial sequence features of the transaction sequences in depth. Moreover, many steps require manual input and adjustment, so its degree of automation is low.
Chinese patent document CN110348978A provides a graph-computation-based risk group identification method, apparatus, device, and storage medium. The method includes the following steps:
- receiving a service request that contains a service type and user attribute information;
- performing social network analysis on the service type, the user attribute information, and the historical service data corresponding to the request, to generate a corresponding social network;
- segmenting the sub-network corresponding to the service request from the social network according to the degree of aggregation;
- inputting the adjacency matrix of the sub-network into a preset prediction model to obtain the risk group identification result for the request.
The embodiments of that specification can identify and detect risk groups in financial services. However, that patent has three problems:
1) Aggregation-based cluster identification usually yields clusters whose members are closely adjacent on the graph, so members that are not adjacent cannot be identified.
2) In agglomerative methods, two nodes cannot be separated once they have been merged, so the error rate is high.
3) Transaction sequences are used only as simple topological relationships, so the results are unreliable.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a group partner detection method based on a financial transaction network. The method draws on basic features in the original financial transaction flow records, such as the transaction account number, the counterparty account number, and the transaction time. A sequence model and a GAE model adaptively extract temporal features and spatial structural features from these records. The concatenated features are then used to calculate the distance between every pair of nodes as the network weight, so that each user can be assigned to a potential group. The method reduces manual feature engineering and determines the number of groups automatically. It can also effectively improve on the accuracy and interpretability of existing methods.
The invention also provides a device that implements the group partner detection method based on a financial transaction network.
Interpretation of terms:
1. Skip-gram model: a neural network model for training word vectors.
2. GAE model: the graph autoencoder model, a neural network model that learns an efficient representation of input graph data through unsupervised learning.
3. High-frequency word sampling: when training word vectors, high-frequency words are deleted with a certain probability to reduce their influence.
4. Negative sampling: a technique for speeding up neural network training. Instead of updating all parameters, only a small number of neuron parameters are updated for each sample.
5. GCN: graph convolutional network, a neural network that performs convolution on graph-structured data.
The technical scheme of the invention is as follows:
A group detection method based on a financial transaction network, comprising:
(1) data preprocessing: cleaning the transaction data, extracting the transaction sequence of each user, and constructing graph data;
(2) user feature vector generation: obtaining a user temporal feature vector with a sequence model and a user spatial feature vector with a GAE model, then normalizing the two vectors separately and concatenating them to generate the node representation vector $R_i$, in which $d'_1, \ldots, d'_m$ denote the normalized user temporal feature vector and the remaining components denote the normalized spatial feature vector;
(3) group detection: calculating the group to which each node belongs and outputting the node's group label.
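The normalize-and-concatenate operation of step (2) can be sketched as follows. The text does not say which normalization is used, so row-wise L2 normalization is an assumption here. The function names are hypothetical, and so is the requirement that row i of both matrices refers to the same node.

```python
import numpy as np


def l2_normalize_rows(M: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length; all-zero rows are left as zeros."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.where(norms == 0, 1.0, norms)


def build_node_vectors(temporal: np.ndarray, spatial: np.ndarray) -> np.ndarray:
    """Return R, where R[i] = [normalized temporal row i | normalized spatial row i].

    Assumes row i of `temporal` (skip-gram features) and row i of `spatial`
    (GAE embedding Z) describe the same node.
    """
    if temporal.shape[0] != spatial.shape[0]:
        raise ValueError("temporal and spatial matrices must have one row per node")
    return np.hstack([l2_normalize_rows(temporal), l2_normalize_rows(spatial)])


R = build_node_vectors(np.array([[3.0, 4.0]]), np.array([[0.0, 2.0]]))
# R is [[0.6, 0.8, 0.0, 1.0]]
```

Normalizing each part separately before concatenation keeps one feature family from dominating the distances computed in step (3) merely because its values are larger.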
According to the invention, in step (1), the transaction data include the user, the counterparty account, the transaction time, and the transaction amount. Cleaning the transaction data specifically comprises:
1-1. Missing value filling: if any of the user, counterparty account, or transaction time fields of a transaction record is missing, the record is discarded.
If only the transaction amount of a record is missing, it is filled by mean imputation: the mean of all transaction amounts of the current user is calculated and used as the missing amount.
1-2. Handling data inconsistency: when dates are written in different forms, they are formatted with Python's datetime library, and all time formats are unified into a year-month-day form. For example, if the transaction time field contains the dates "2019/01/07" and "07/01/2019", both are formatted with the datetime library into "20190107".
1-3. Feature encoding: a mapping converts the user and counterparty account numbers, which are more than 15 digits long, into label-encoded values, finally yielding the transaction sample set. For example, 100 account numbers are numbered 0-99 after mapping. Transaction account numbers include bank card numbers and account numbers. The transaction sample set consists of the transaction records preprocessed by steps 1-1 to 1-3.
Many fields in the original transaction data contain missing and abnormal values. The main purpose of data cleaning is to turn this dirty data into input data that is usable as a first pass.
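Steps 1-1 to 1-3 map naturally onto pandas. The sketch below is one possible reading, built on several assumptions that the text does not state:
- the column names user, counterparty, time and amount;
- the list of accepted date formats, which reads "07/01/2019" day-first so as to reproduce the example above;
- encoding users and counterparties in a single shared code space.

```python
from datetime import datetime

import pandas as pd

# Hypothetical column names; the text lists the fields but not their names.
REQUIRED = ["user", "counterparty", "time"]
# Assumed input formats; "07/01/2019" is read day-first to match the example.
DATE_FORMATS = ["%Y/%m/%d", "%d/%m/%Y", "%Y%m%d"]


def normalize_date(raw: str) -> str:
    """Step 1-2: return the date as YYYYMMDD, trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y%m%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    # Step 1-1: drop records missing user, counterparty or time.
    df = df.dropna(subset=REQUIRED).copy()
    # Step 1-1: fill a missing amount with the mean amount of the same user.
    user_mean = df.groupby("user")["amount"].transform("mean")
    df["amount"] = df["amount"].fillna(user_mean)
    # Step 1-2: unify all dates to YYYYMMDD.
    df["time"] = df["time"].astype(str).map(normalize_date)
    # Step 1-3: label-encode account numbers; users and counterparties
    # share one code space so that every account becomes one integer id.
    accounts = pd.unique(pd.concat([df["user"], df["counterparty"]]))
    code = {acc: i for i, acc in enumerate(accounts)}
    df["user"] = df["user"].map(code)
    df["counterparty"] = df["counterparty"].map(code)
    return df
```

Account numbers are best kept as strings when loading. Long card numbers read as integers or floats can lose digits before the encoding step.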
Preferably, in step (1), extracting the transaction sequence of each user and constructing the graph data specifically comprises:
a. Generating a transaction sequence for each user in chronological order. The user set $U = \{u_i \mid i = 1, \ldots, n\}$ is obtained with the unique function of the Pandas library, where n is the total number of users and $u_i$ ($i = 1, \ldots, n$) denotes the i-th transaction user. Similarly, m is the total number of counterparty accounts, and the j-th element of the counterparty account set ($j = 1, \ldots, m$) denotes the j-th counterparty account.
For user $u_i$, all of its counterparty accounts are retrieved from the transaction sample set, sorted in ascending order of transaction time, and assembled into the transaction sequence $L_i$.
With user $u_i$ as the key and the transaction sequence $L_i$ as the value, a key-value set $S = \{s_i \mid i = 1, \ldots, n\}$ is built, where $s_i = (u_i, L_i)$. The keys of S are users and the values are transaction sequences, so a user's transaction sequence can be looked up by the user.
b. Constructing the graph data. The graph data consist of the adjacency matrix A and the feature matrix X of the graph nodes.
First, from each transaction record in the transaction sample set, user $u_i$ and its counterparty account are extracted to form a sequence pair (user, counterparty account), $i = 1, \ldots, n$, $j = 1, \ldots, m$.
All sequence pairs are then deduplicated, and the set of deduplicated pairs is taken as the edge set E of graph G, $E = \{e_i \mid i = 1, \ldots, m\}$. All users U are taken as the node set V, $V = \{v_i \mid i = 1, \ldots, n\}$. From E and V, the node adjacency matrix $A \in \mathbb{R}^{n \times n}$ is generated with the NetworkX library. The adjacency matrix encodes the topology of the graph by recording whether each pair of nodes is connected. Users and counterparty accounts both serve as nodes, and an edge is added between any two nodes that have a transaction between them.
The feature matrix X of the graph nodes is the degree matrix D of the nodes. D is a diagonal matrix whose diagonal elements are the node degrees: the degree $d_i$ of node i is the number of edges incident to node $v_i$, and $D_i = [d_i]$ is the feature of node $v_i$. The resulting adjacency matrix A and node feature matrix X serve as training data for the GAE model.
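Steps a and b can be sketched with pandas and NetworkX. The sketch starts from a cleaned DataFrame with the hypothetical columns user, counterparty and time. Time is held as YYYYMMDD strings, so string order equals chronological order. Following the text, both users and counterparties become graph nodes, and X is the diagonal degree matrix. The function names are illustrative only.

```python
import networkx as nx
import numpy as np
import pandas as pd


def build_sequences(df: pd.DataFrame) -> dict:
    """Step a: S = {u_i: L_i}, with L_i the counterparties of u_i in ascending time order."""
    ordered = df.sort_values(["user", "time"])
    return {user: group["counterparty"].tolist() for user, group in ordered.groupby("user")}


def build_graph(df):
    """Step b: return (node list, adjacency matrix A, degree-based feature matrix X)."""
    # Deduplicated (user, counterparty) pairs form the edge set E.
    edges = set(zip(df["user"], df["counterparty"]))
    G = nx.Graph()
    G.add_edges_from(edges)
    nodes = sorted(G.nodes())                  # users and counterparties are both nodes
    A = nx.to_numpy_array(G, nodelist=nodes)   # A[i, j] = 1.0 if nodes i and j transacted
    X = np.diag(A.sum(axis=1))                 # X = D, the diagonal degree matrix
    return nodes, A, X
```

A dense matrix is fine for illustration. For millions of accounts, `nx.to_scipy_sparse_array` would be the more realistic choice.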
Preferably, in step (2), the user temporal feature vector is obtained with the sequence model. The transaction sequence $L_i$ is treated as a sentence, and the parameters of each layer are optimized by maximizing the probability of the context nodes appearing given the central node. The specific steps are as follows:
2-1. Preparing training data. First, the node list is vectorized with the OneHotEncoder of the sklearn library to obtain high-dimensional one-hot node vectors, whose dimension equals the number of words (nodes).
Then a window and a skip step size are set, and the training data are generated from the transaction sequence $L_i = \{L_i^{(1)}, \ldots, L_i^{(k)}\}$. Here $L_i$ is the transaction sequence of user $u_i$, the superscript denotes the position in the sequence, and $L_i$ contains k counterparty accounts. With a given node as the central node, a training set of (input, output) pairs is constructed, where input is the central node and output is a context node.
For example, assume that both the window and the step size are 2, and start from $L_i^{(2)}$ as the central node. Up to two nodes on each side are selected as window nodes. Constructing (input, output) pairs then yields three training samples: $(L_i^{(2)}, L_i^{(1)})$, $(L_i^{(2)}, L_i^{(3)})$, and $(L_i^{(2)}, L_i^{(4)})$.
2-2. Constructing a Skip-gram model to obtain node vectors. The Skip-gram model comprises an input layer, a hidden layer, and an output layer connected in sequence:
- the input layer takes the one-hot node vector;
- the hidden-layer dimension, i.e. the number of hidden neurons, is set according to user requirements;
- the output layer is a softmax classifier that outputs the probability of each node.
The cross-entropy loss is calculated and the model weights are updated by gradient descent. Finally, the weight matrix from the input layer to the hidden layer is used as the temporal feature of the nodes, $R_{\text{sequence}} = \{d'_1, \ldots, d'_m\}$.
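Steps 2-1 and 2-2 can be sketched in plain NumPy under several assumptions:
- the "skip step size" is read as the stride between successive central nodes, and `start` is the 0-based index of the first center;
- the hyperparameter defaults are chosen for illustration only;
- the trainer is a minimal full-softmax skip-gram, not the exact model of FIG. 2.

With window 2, stride 2 and the first center at position 2 (index 1), the pair generator reproduces the three samples listed above.

```python
import numpy as np


def skipgram_pairs(seq, window=2, step=2, start=0):
    """(input=center, output=context) pairs from one transaction sequence."""
    pairs = []
    for c in range(start, len(seq), step):
        lo, hi = max(0, c - window), min(len(seq), c + window + 1)
        pairs.extend((seq[c], seq[j]) for j in range(lo, hi) if j != c)
    return pairs


def train_skipgram(pairs, vocab_size, dim=8, lr=0.1, epochs=100, seed=0):
    """Full-softmax skip-gram trained with SGD on the cross-entropy loss.

    `pairs` holds integer node ids. Returns (W_in, W_out); row i of W_in
    (input -> hidden weights) is node i's temporal feature vector.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.normal(scale=0.1, size=(vocab_size, dim))    # input -> hidden
    W_out = rng.normal(scale=0.1, size=(dim, vocab_size))   # hidden -> softmax output
    for _ in range(epochs):
        for center, context in pairs:
            h = W_in[center].copy()          # a one-hot input selects one row of W_in
            scores = h @ W_out
            p = np.exp(scores - scores.max())
            p /= p.sum()                     # softmax probability of every node
            grad_scores = p.copy()
            grad_scores[context] -= 1.0      # gradient of cross-entropy w.r.t. scores
            grad_h = W_out @ grad_scores
            W_out -= lr * np.outer(h, grad_scores)
            W_in[center] -= lr * grad_h
    return W_in, W_out


pairs = skipgram_pairs(["L1", "L2", "L3", "L4"], window=2, step=2, start=1)
# pairs[:3] == [("L2", "L1"), ("L2", "L3"), ("L2", "L4")]
```

Multiplying a one-hot vector by the input weight matrix simply selects one row. That is why the code indexes `W_in[center]` instead of building one-hot vectors explicitly.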
Preferably, when the training set is generated in step 2-1, a high-frequency word sampling technique subsamples the (input, output) pairs in the training samples. This reduces the number of training samples and addresses the excessive size of both the weight matrix and the training set.
A negative sampling technique is also used, so that training on each sample updates only part of the model weights, which reduces the computational load.
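The two speed-ups can be sketched as follows, with several parts assumed:
- Subsampling: the keep rule min(1, sqrt(t / f(w))) is the one proposed with word2vec, and the threshold t is assumed.
- Negative sampling: how negatives are drawn is left to the caller; word2vec draws them from the unigram distribution raised to the 3/4 power.
- Layout: W_out is stored here with one row per node.
- Naming: the function names are hypothetical.

```python
from collections import Counter

import numpy as np


def keep_probabilities(tokens, t=1e-3):
    """Subsampling of frequent nodes: keep w with probability min(1, sqrt(t / f(w)))."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # f(w) = count / total, so t / f(w) = t * total / count
    return {w: min(1.0, float(np.sqrt(t * total / c))) for w, c in counts.items()}


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def negative_sampling_step(W_in, W_out, center, context, negatives, lr=0.05):
    """One in-place SGD step on -log s(w_o . h) - sum_k log s(-w_k . h).

    Only the output rows of the true context node and of the sampled
    negatives are updated, instead of the whole output layer.
    """
    h = W_in[center].copy()
    grad_h = np.zeros_like(h)
    for idx, label in [(context, 1.0)] + [(k, 0.0) for k in negatives]:
        g = sigmoid(W_out[idx] @ h) - label   # d loss / d score for this output row
        grad_h += g * W_out[idx]
        W_out[idx] -= lr * g * h
    W_in[center] -= lr * grad_h
```

With K negatives, each step touches K + 1 output rows rather than all of them. This is where the reduction in computation described above comes from.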
Preferably, in step (2), the user spatial feature vector is obtained with a GAE model comprising an encoder and a decoder. The encoder consists of two GCN layers. The decoder calculates the probability that an edge exists between any two nodes and then generates edges to form a reconstructed graph. The specific steps are as follows:
a. The adjacency matrix A and the node feature matrix X are fed to the input layer of the GAE model.
b. The two GCN layers of the encoder extract features from the adjacency matrix A and the node feature matrix X to obtain the node embedding vectors Z. Each input adjacency matrix A is assumed to follow a Gaussian distribution. The two GCN layers extract features from A and X to determine its mean and variance, i.e. the distribution function of the Gaussian distribution, from which the reconstructed adjacency matrix $\hat{A}$ is obtained. The node embedding vectors Z satisfy
$$Z = \mathrm{GCN}(X, A) \tag{I}$$
where GCN denotes the graph convolutional network model, X is the node feature matrix, and A is the adjacency matrix.
c. The node embedding vectors Z are fed to the decoder, which generates the edge connection probabilities and reconstructs the graph. Finally, the output layer outputs the reconstructed adjacency matrix $\hat{A}$, calculated as
$$\hat{A} = \sigma\left(Z Z^{\mathsf{T}}\right) \tag{II}$$
where the superscript T denotes transposition, $\sigma(\cdot)$ is the sigmoid function (the output activation function of the neurons, a standard notation in neural networks), and $\hat{A}$ is the reconstructed adjacency matrix.
A loss function L measures the difference between the reconstructed graph and the original graph, and minimizing L makes the reconstructed graph as close as possible to the original.
In summary, the GAE proceeds as follows:
- The adjacency matrix A and the node feature matrix X are input.
- The two-layer GCN encoder extracts features from A and X.
- The decoder computes the probability of an edge between every pair of nodes to generate a graph.
- The loss function L measures the difference between the input graph and the graph generated by the GAE.
- The weights $W_0, W_1$ are optimized to minimize L, so that the reconstructed graph is as close as possible to the original.
This yields the node embedding matrix Z, which captures the spatial features of the graph. Z has n rows, one row vector per node.
Further preferably, in step b, the two-layer GCN is defined as
$$\mathrm{GCN}(X, A) = \tilde{A}\,\mathrm{ReLU}\left(\tilde{A} X W_0\right) W_1 \tag{III}$$
where:
- $\mathrm{ReLU}(\cdot)$ is the rectified linear unit;
- $\tilde{A} = D^{-1/2} A D^{-1/2}$ is the symmetrically normalized adjacency matrix;
- D is the degree matrix, and the superscript $-1/2$ denotes exponentiation;
- $W_0$ is the first weight matrix and $W_1$ is the second weight matrix.
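Formula (III) can be sketched in NumPy as a forward pass only; training of $W_0, W_1$ is omitted. Adding self-loops ($A + I$) before normalizing is common GAE practice, but it is an assumption here, since the text normalizes A directly. The function names are hypothetical.

```python
import numpy as np


def normalize_adjacency(A: np.ndarray, add_self_loops: bool = True) -> np.ndarray:
    """Return D^{-1/2} A D^{-1/2}, optionally computed on A + I instead of A."""
    A = np.asarray(A, dtype=float)
    if add_self_loops:
        A = A + np.eye(A.shape[0])
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros(d.shape, dtype=float)
    np.divide(1.0, np.sqrt(d), out=d_inv_sqrt, where=d > 0)   # isolated nodes stay 0
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_encoder(X, A, W0, W1):
    """Two-layer GCN of formula (III): Z = A_norm . ReLU(A_norm X W0) . W1."""
    A_norm = normalize_adjacency(A)
    H = np.maximum(A_norm @ X @ W0, 0.0)   # first GCN layer with ReLU
    return A_norm @ H @ W1                  # second GCN layer, no activation
```

Each layer mixes a node's features with those of its neighbors. After two layers, a row of Z therefore reflects the node's two-hop neighborhood, which is the spatial information the text refers to.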
Further preferably, in step c, the decoder reconstructs the graph, i.e. the adjacency matrix, by calculating the connection probability between nodes:
$$p\left(A_{ij} = 1 \mid z_i, z_j\right) = \sigma\left(z_i^{\mathsf{T}} z_j\right) \tag{IV}$$
where:
- $A_{ij}$ is the element in row i, column j of the adjacency matrix;
- $z_i$ and $z_j$ are rows i and j of the node embedding matrix Z;
- $p(A_{ij} = 1 \mid z_i, z_j)$ is the probability, reconstructed from the known embedding matrix Z, that nodes i and j are connected;
- $\sigma(\cdot)$ is the sigmoid activation function, which maps its input to the interval (0, 1).
If the probability exceeds a threshold, the two nodes are considered connected, and the corresponding element $\hat{A}_{ij}$ of the reconstructed adjacency matrix is set to 1, finally yielding $\hat{A}$. The factorized form $p(A \mid Z) = \prod_{i}\prod_{j} p(A_{ij} \mid z_i, z_j)$ is the decomposed representation of A.
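The inner-product decoder of formulas (II) and (IV) can be sketched as follows. The threshold value 0.5 is an assumption, since the text only says "a threshold". The diagonal of $\hat{A}$ (self-links) is produced as well and may need to be ignored downstream.

```python
import numpy as np


def decode(Z: np.ndarray, threshold: float = 0.5):
    """Inner-product decoder: P[i, j] = sigmoid(z_i . z_j); A_hat[i, j] = 1 where P > threshold."""
    P = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
    A_hat = (P > threshold).astype(int)
    return P, A_hat
```

Because $Z Z^{\mathsf{T}}$ is symmetric, the reconstructed graph is undirected, matching the undirected transaction graph built in step (1).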
The loss function measures the distance between the graph reconstructed by the encoder-decoder structure and the original graph:
$$L = \mathbb{E}_{q(Z \mid X, A)}\left[\log p(A \mid Z)\right] \tag{V}$$
where L denotes the loss function and $\mathbb{E}_{q(Z \mid X, A)}[\cdot]$ denotes the expectation with respect to the distribution $q(Z \mid X, A)$.
The GAE is trained with stochastic gradient descent until the loss converges, finally yielding the low-dimensional node embedding matrix Z. Optimizing $W_0, W_1$ to minimize the loss makes the reconstructed graph as close as possible to the original graph, so that Z carries the spatial features of the graph.
Minimizing L with respect to W requires the gradient of L with respect to W; L is then minimized by gradient descent.
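A sketch of the reconstruction term follows. It assumes a plain (deterministic) GAE, for which the expectation in formula (V) reduces to a single evaluation at Z. Formula (V) is a log-likelihood, which is maximized, so the sketch returns its negative, a binary cross-entropy; minimizing that value then matches the text. A variational GAE would additionally sample Z and add a KL term. Gradients with respect to $W_0, W_1$ would normally come from an autodiff framework and are not shown.

```python
import numpy as np


def reconstruction_loss(A: np.ndarray, Z: np.ndarray, eps: float = 1e-9) -> float:
    """Mean binary cross-entropy between A and sigmoid(Z Z^T), i.e. -log p(A | Z) / n^2.

    Minimizing this maximizes log p(A | Z) under the factorized
    Bernoulli decoder of formula (IV).
    """
    P = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
    P = np.clip(P, eps, 1.0 - eps)            # avoid log(0)
    ll = A * np.log(P) + (1.0 - A) * np.log(1.0 - P)
    return float(-ll.mean())
```

Real transaction graphs are sparse, so implementations often up-weight the positive (edge) entries. Otherwise the loss is dominated by the many zeros in A.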
In the data preprocessing stage, the adjacency matrix A of the transaction graph and the node feature matrix X are generated, and X contains the degree information of the nodes. This module encodes each node's spatial representation vector with the GAE model, so the vector captures both the node's own features and those of its neighbors.
Preferably, in step (3), the distance between every pair of nodes is calculated and used as the edge weight to obtain the group of each node, and the node's group label is output. The specific steps are as follows:
3-1. First, the node representation vectors $R_i$ generated in step (2) are used to calculate the distance between any two nodes in the graph data structure. This distance is taken as the weight of the corresponding edge, so a larger weight means the two nodes are farther apart. Each node in the graph is then assigned to its own group, and the nodes of the network are traversed repeatedly. For each node, the change in modularity caused by moving it into a neighboring group is compared, and the node joins the group that increases compactness the most. The modularity Q is defined as
$$Q = \frac{1}{2m}\sum_{i,j}\left[W_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j) \tag{VI}$$
where:
- Q is the modularity;
- m is the sum of the weights of all edges;
- $W_{ij}$ is the weight between nodes i and j;
- $k_i$ and $k_j$ are the sums of the weights of the edges incident to nodes i and j, respectively;
- $c_i$ and $c_j$ are the groups to which nodes i and j belong;
- $\delta(c_i, c_j)$ is an indicator function equal to 1 if $c_i$ and $c_j$ are the same group and 0 otherwise.
3-2. All nodes belonging to the same group are merged into a single new node to build an aggregated graph (supergraph).
3-3. Steps 3-1 and 3-2 are repeated until the final grouping is obtained, and the group labels $(u_i, c_i)$ are generated, where $c_i$ is the group to which node i belongs.
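Steps 3-1 to 3-3 follow the Louvain scheme: local node moves that maximize the modularity gain, followed by aggregating each group into one node. NetworkX implements this scheme as `louvain_communities`. The sketch below computes formula (VI) directly and delegates the optimization to NetworkX. Weighting only the existing transaction edges, rather than every pair of nodes, is an assumption. R is assumed to be indexable by node id, and the function names are hypothetical.

```python
import networkx as nx
import numpy as np


def modularity(W: np.ndarray, labels) -> float:
    """Formula (VI): Q = 1/(2m) * sum_ij [W_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    W = np.asarray(W, dtype=float)
    k = W.sum(axis=1)                              # weighted degree k_i
    two_m = W.sum()                                # 2m: each edge appears twice in a symmetric W
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]      # delta(c_i, c_j)
    return float(((W - np.outer(k, k) / two_m) * same).sum() / two_m)


def detect_groups(edges, R, seed=0):
    """Steps 3-1 to 3-3 via NetworkX's Louvain implementation; returns {node: group id}.

    Each existing edge (u, v) is weighted by the Euclidean distance
    between R[u] and R[v], as the text describes.
    """
    G = nx.Graph()
    for u, v in edges:
        G.add_edge(u, v, weight=float(np.linalg.norm(R[u] - R[v])))
    groups = nx.community.louvain_communities(G, weight="weight", seed=seed)
    return {node: gid for gid, members in enumerate(groups) for node in members}
```

Because the text uses the distance itself as the weight, pairs that are farther apart receive heavier edges. If a similarity such as an inverse distance were preferred instead, only the weight expression in `detect_groups` would change.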
The device implementing the group partner detection method based on a financial transaction network comprises:
- a data preprocessing module, configured to clean the transaction data, extract the transaction sequence of each user, and construct the graph data, i.e. to execute step (1);
- a user feature vector generation module, configured to obtain the user temporal feature vector with the sequence model and the user spatial feature vector with the GAE model, normalize the two vectors separately, and concatenate them, i.e. to execute step (2);
- a group detection module, configured to calculate the group to which each node belongs and output the node's group label, i.e. to execute step (3).
The beneficial effects of the invention are as follows:
1. The invention provides a group detection method that combines the temporal features and the spatial structural features of nodes. The method draws on basic features in the original financial transaction flow records, such as users, counterparty accounts, and transaction times. A sequence model and a GAE model adaptively extract temporal and spatial structural features. The concatenated features are then used to calculate the distance between every pair of nodes as the edge weight, and a modularity-optimization-based group detection algorithm assigns each user to a potential group.
2. The invention mainly aims to provide a decision-support system for case investigators. The Skip-gram and GAE models extract features automatically, which greatly reduces manual effort, and the generated group labels can also be used to track potential suspects.
3. The pipeline of the group partner detection method provided by the invention is fully automatic. Anyone can obtain the desired result by inputting raw data with only a few fields, which improves working efficiency and saves a large amount of time. As the volume of input transaction data grows, more training data are constructed automatically, which further improves the accuracy of the model.
Drawings
FIG. 1 is a data flow diagram of the group partner detection method based on a financial transaction network according to the invention.
FIG. 2 is a schematic diagram of the structure of the sequence model.
FIG. 3 is a schematic diagram of the structure of the GAE model.
FIG. 4 is a flowchart of the group partner detection method based on a financial transaction network according to the invention.
Detailed Description
The invention is further described below with reference to the embodiments and the accompanying drawings, but is not limited thereto.
Example 1
A group detection method based on a financial transaction network, as shown in FIGS. 1 and 4, comprises:
(1) data preprocessing: cleaning the transaction data, extracting the transaction sequence of each user, and constructing graph data.
In step (1), each transaction record contains the user, the counterparty account, the transaction time and the transaction amount. Data cleaning consists of the following steps:
1-1. Missing-value handling: if the user, counterparty account or transaction time field of a record is missing, the record is discarded.
If only the transaction amount of a record is missing, it is filled by mean imputation: the mean of all transaction amounts of the current user is computed and used as the missing amount.
1-2. Inconsistent-data handling: when dates are written in different formats, they are formatted with Python's datetime library and unified into a year-month-day form. For example, the transaction times "2019/01/07" and "07/01/2019" are both formatted as "20190107".
1-3. Feature encoding: user and counterparty account numbers, which are longer than 15 digits, are mapped to integer labels (label encoding) through a mapping table, and the result is the transaction sample set. For example, 100 account numbers are mapped to the labels 0-99. The accounts include both bank card numbers and account numbers. The transaction sample set consists of the transaction records preprocessed by steps 1-1 to 1-3.
Many fields of the raw transaction data contain missing or abnormal values. The main purpose of data cleaning is to turn this dirty data into usable input data.
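The cleaning steps 1-1 to 1-3 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation, and it uses only the standard library rather than Pandas. Several details are hypothetical: the record field names (`user`, `counterparty`, `time`, `amount`), the list of accepted date formats, and the helper names `normalize_date` and `clean_transactions`. Users and counterparties share one label map, on the assumption that both later become graph nodes.

```python
from datetime import datetime
from statistics import mean

# Hypothetical record layout: dicts with keys "user", "counterparty", "time",
# "amount"; a missing field is None. The accepted date formats are assumptions.
DATE_FORMATS = ("%Y/%m/%d", "%d/%m/%Y", "%Y-%m-%d", "%Y%m%d")


def normalize_date(raw):
    """Step 1-2: parse a date in any accepted format and return it as YYYYMMDD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y%m%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {raw!r}")


def clean_transactions(records):
    """Steps 1-1 to 1-3; returns (transaction sample set, account -> label map)."""
    required = ("user", "counterparty", "time")
    # 1-1: discard records whose user, counterparty or time is missing
    kept = [dict(r) for r in records if all(r.get(k) is not None for k in required)]

    # 1-1: fill a missing amount with the mean of that user's known amounts
    known = {}
    for r in kept:
        if r.get("amount") is not None:
            known.setdefault(r["user"], []).append(r["amount"])
    for r in kept:
        if r.get("amount") is None and r["user"] in known:
            r["amount"] = mean(known[r["user"]])

    # 1-2: unify every transaction time to YYYYMMDD
    for r in kept:
        r["time"] = normalize_date(r["time"])

    # 1-3: one shared label encoder, since users and counterparties are both graph nodes
    codes = {}
    for r in kept:
        r["user"] = codes.setdefault(r["user"], len(codes))
        r["counterparty"] = codes.setdefault(r["counterparty"], len(codes))
    return kept, codes
```

The format list is tried in order, so "2019/01/07" matches the first pattern. "07/01/2019" falls through to the day/month/year pattern, which follows the text's example.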
In step (1), the transaction sequence of each user is extracted and the graph data are constructed as follows:
a. Generating each user's transaction sequence in chronological order: the user set $U = \{u_i \mid i = 1, \ldots, n\}$ is obtained with the `unique` function of the Pandas library, where n is the total number of users and $u_i$ denotes the i-th transaction user. m is the total number of counterparty accounts, and $a_j$, $j = 1, \ldots, m$, denotes the j-th counterparty account.
For user $u_i$, all of its counterparty accounts are retrieved from the transaction sample set, sorted in ascending order of transaction time, and recombined into the transaction sequence $L_i$.
A key-value set $S = \{s_i \mid i = 1, \ldots, n\}$ is then built with user $u_i$ as the key and transaction sequence $L_i$ as the value, where $s_i = (u_i, L_i)$. The keys of S are users and the values are transaction sequences, so a user's transaction sequence can be looked up by the user.
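Step a amounts to grouping the records by user and then sorting by time. The sketch below is minimal and rests on two assumptions: cleaned records are dicts with the hypothetical keys `user`, `counterparty` and `time`, and times are already in YYYYMMDD form. `build_sequences` is an illustrative name.

```python
from collections import defaultdict


def build_sequences(samples):
    """Step a: return S = {u_i: L_i}, each user's counterparties in ascending time order."""
    rows = defaultdict(list)
    for s in samples:
        rows[s["user"]].append((s["time"], s["counterparty"]))
    # YYYYMMDD strings sort chronologically; sorted() is stable, so same-day
    # transactions keep their original order
    return {
        user: [cp for _, cp in sorted(pairs, key=lambda p: p[0])]
        for user, pairs in rows.items()
    }
```

Because the time strings are fixed-width YYYYMMDD, plain string comparison matches chronological order, so no date parsing is needed at this stage.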
b. Constructing the graph data: the graph data consist of the adjacency matrix A and the feature matrix X of the graph nodes.
First, for each transaction record in the transaction sample set, the user $u_i$ and the counterparty account $a_j$ (the j-th counterparty account) are extracted to form an ordered pair $(u_i, a_j)$, $i = 1, \ldots, n$, $j = 1, \ldots, m$.
All ordered pairs are then deduplicated, and the deduplicated set of pairs is used as the edge set E of the graph G, $E = \{e_i \mid i = 1, \ldots, m\}$. The set of all users U is used as the node set V, $V = \{v_i \mid i = 1, \ldots, n\}$. The adjacency matrix $A \in \mathbb{R}^{n \times n}$ of the nodes is generated from E and V with the networkx library. It encodes the topology of the graph by recording whether each pair of nodes is connected. Users and counterparty accounts are both nodes, and an edge is added whenever a transaction exists between two nodes.
The feature matrix X of the graph nodes is the degree matrix D of the nodes. D is a diagonal matrix whose i-th diagonal element $D_{ii} = d_i$ is the degree of node $v_i$, i.e. the number of edges incident to $v_i$.
The obtained adjacency matrix A and the feature matrix X of the graph nodes are used as training data of the GAE model.
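Step b can be sketched in NumPy as below. The sketch assumes that nodes are already label-encoded as integers 0..num_nodes-1 and that records are dicts with the hypothetical keys `user` and `counterparty`; treating self-transfers as non-edges is also an assumption. The text builds A with networkx. A dense NumPy array is used here only to keep the example dependency-light; with networkx, `nx.to_numpy_array(G, nodelist=range(n))` yields the same kind of matrix. As the text specifies, the degree matrix is returned as the feature matrix X.

```python
import numpy as np


def build_graph(samples, num_nodes):
    """Step b: return the adjacency matrix A and the feature matrix X = D."""
    # a set removes duplicate (user, counterparty) pairs
    pairs = {(s["user"], s["counterparty"]) for s in samples}
    A = np.zeros((num_nodes, num_nodes))
    for u, v in pairs:
        if u != v:                       # self-transfers are ignored (assumption)
            A[u, v] = A[v, u] = 1.0      # undirected edge: a transaction links u and v
    X = np.diag(A.sum(axis=1))           # degree matrix D, used as node features
    return A, X
```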
(2) Generating user feature vectors: the user time-series feature vector is obtained with a sequence model, and the user spatial feature vector with a GAE model. The two vectors are normalized separately and concatenated to generate the node representation vector $R_i = [\,d'_1, \ldots, d'_m \,\Vert\, z'_i\,]$. Here $d'_1, \ldots, d'_m$ denote the normalized user time-series features, and $z'_i$ denotes the normalized spatial feature vector of the node.
In step (2), the user time-series feature vector is obtained with the sequence model. Each transaction sequence $L_i$ is treated as a sentence, and the parameters of each layer are optimized by maximizing the probability of the context nodes given the central node:
$$\frac{1}{k}\sum_{t=1}^{k}\ \sum_{-c \le j \le c,\ j \neq 0} \log p\left(L_i^{(t+j)} \mid L_i^{(t)}\right)$$
Here c is the window size, and terms whose index falls outside 1, ..., k are omitted. The specific steps are:
2-1. Preparing training data: the node list is first vectorized with `OneHotEncoder` from the sklearn library. This gives high-dimensional one-hot node vectors whose dimension equals the number of words (distinct nodes).
A window size and a skip step are then set, and training data are constructed from each transaction sequence $L_i = \{L_i^{(1)}, \ldots, L_i^{(k)}\}$. $L_i$ is the transaction sequence of user $u_i$, the superscripts 1, ..., k index its elements, and k is the number of counterparty accounts in $L_i$. Taking a given node as the central node, a training set of (input, output) pairs is constructed.
For example, suppose both the window and the step are 2 and $L_i^{(2)}$ is the central node. Up to two nodes on each side are selected as window nodes, which produces three (input, output) training pairs: $(L_i^{(2)}, L_i^{(1)})$, $(L_i^{(2)}, L_i^{(3)})$ and $(L_i^{(2)}, L_i^{(4)})$.
While the training set is generated in step 2-1, frequent-word subsampling is applied to the (input, output) pairs. This reduces the number of training samples and keeps the weight matrix and the training set from becoming too large.
In addition, negative sampling is used, so that each training sample updates only a small part of the model weights, which reduces the computational load.
2-2. Building a Skip-gram model to obtain node vectors: the Skip-gram model consists of an input layer, a hidden layer and an output layer connected in sequence.
The input layer receives the one-hot node vector. The dimension of the hidden layer, i.e. the number of hidden neurons, is set according to user requirements. The output layer is a softmax classifier that outputs the probability of each node.
The cross-entropy loss is computed and the model weights are updated by gradient descent. Finally, the input-to-hidden weight matrix is used as the time-series feature of the nodes, $R_{\text{sequence}} = \{d'_1, \ldots, d'_m\}$.
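Step 2-2 can be illustrated with a small full-softmax Skip-gram written directly in NumPy. This sketch makes several assumptions: nodes are integer indices, training uses plain SGD with a full softmax instead of the negative-sampling objective, and the dimensions and learning rate are arbitrary. Multiplying a one-hot vector by the input-to-hidden matrix simply selects a row, so the code indexes `W_in` directly. After training, the rows of `W_in` play the role of $R_{\text{sequence}}$. In practice, a library implementation such as gensim's `Word2Vec` with `sg=1` would normally be used.

```python
import numpy as np


def train_skipgram(pairs, vocab_size, dim=8, lr=0.1, epochs=100, seed=0):
    """Full-softmax Skip-gram trained by SGD on (center, context) index pairs.

    Returns W_in, whose row i is the embedding (time-series feature) of node i,
    and the cross-entropy loss summed over each epoch.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.normal(scale=0.1, size=(vocab_size, dim))    # input -> hidden
    W_out = rng.normal(scale=0.1, size=(dim, vocab_size))   # hidden -> output
    losses = []
    for _ in range(epochs):
        total = 0.0
        for center, context in pairs:
            h = W_in[center].copy()          # one-hot input @ W_in is a row lookup
            scores = h @ W_out
            exp = np.exp(scores - scores.max())
            probs = exp / exp.sum()          # softmax over all nodes
            total -= np.log(probs[context])  # cross-entropy of the true context node
            grad_scores = probs.copy()
            grad_scores[context] -= 1.0      # d(loss)/d(scores) for softmax + CE
            grad_h = W_out @ grad_scores
            W_out -= lr * np.outer(h, grad_scores)
            W_in[center] -= lr * grad_h
        losses.append(total)
    return W_in, losses
```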
In step (2), the user spatial feature vector is obtained with a GAE model consisting of an encoder and a decoder. The encoder contains two GCN layers. The decoder computes the probability that an edge exists between any two nodes and then generates edges to form a reconstructed graph. The specific steps are:
a. The adjacency matrix A and the node feature matrix X are fed to the input layer of the GAE model.
b. The two GCN layers of the encoder extract features from A and X to obtain the node embedding matrix Z. The encoding of each input sample is assumed to follow a Gaussian distribution. The two GCN layers determine its mean and variance, i.e. the Gaussian distribution function, from which the reconstructed adjacency matrix $\hat{A}$ is obtained.
The node embedding matrix Z satisfies:
$$Z = \mathrm{GCN}(X, A) \tag{I}$$
where GCN denotes the graph convolutional network model, X is the node feature matrix and A is the adjacency matrix.
c. The node embedding matrix Z is fed to the decoder, which produces the connection probabilities of the edges and reconstructs the graph. The output layer then outputs the reconstructed adjacency matrix $\hat{A}$, computed as:
$$\hat{A} = \sigma(Z Z^{T}) \tag{II}$$
where the superscript T denotes transposition, σ(·) is the sigmoid function (the output activation function of the neurons, a common notation in neural networks), and $\hat{A}$ is the reconstructed adjacency matrix.
A loss function L measures the difference between the reconstructed graph and the original graph, and minimizing L makes the reconstructed graph as close as possible to the original.
In summary, the adjacency matrix A and the node feature matrix X are input, and the two-layer GCN encoder extracts features from them. The decoder computes the probability of an edge between any two nodes to generate a graph. L measures the difference between the input graph and the graph generated by the GAE, and $W_0, W_1$ are optimized to minimize L so that the reconstructed graph is as close as possible to the original. The result is the node embedding matrix $Z = [z_1, \ldots, z_n]^{T}$, which carries the spatial characteristics of the graph. Z has n rows, and each row vector corresponds to one node.
Further, in step b, the two GCN layers are defined as:
$$\mathrm{GCN}(X, A) = \tilde{A}\,\mathrm{ReLU}\left(\tilde{A} X W_0\right) W_1 \tag{III}$$
where ReLU(·) is the rectified linear unit and $\tilde{A} = D^{-1/2} A D^{-1/2}$ is the symmetrically normalized adjacency matrix. D is the degree matrix, the superscript -1/2 denotes the matrix power, $W_0$ is the first weight matrix and $W_1$ is the second weight matrix.
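The two-layer GCN encoder of formula (III) can be sketched in NumPy as follows. The sketch follows the formula as written and normalizes A without self-loops. Many GAE implementations use $A + I$ instead, a design choice the text does not address. The weight matrices are random here, so the example shows only the forward pass, not training. The function names are illustrative.

```python
import numpy as np


def normalize_adjacency(A):
    """A_tilde = D^{-1/2} A D^{-1/2}; rows/columns of isolated nodes stay zero."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    inv_sqrt[nz] = deg[nz] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def gcn_encoder(X, A, W0, W1):
    """Formula (III): Z = A_tilde ReLU(A_tilde X W0) W1."""
    A_t = normalize_adjacency(A)
    H = np.maximum(A_t @ X @ W0, 0.0)    # first GCN layer with ReLU
    return A_t @ H @ W1                  # second GCN layer (linear) -> embeddings Z
```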
Further, in step c, the decoder reconstructs the graph by computing the probability of a link between nodes, i.e. it reconstructs the adjacency matrix:
$$p(A_{ij} = 1 \mid z_i, z_j) = \mathrm{sigmoid}(z_i^{T} z_j) \tag{IV}$$
where sigmoid(·) is an activation function that maps its argument into (0, 1). If the probability exceeds a threshold, $A_{ij}$ is set to 1, meaning the two nodes are connected; this finally yields the adjacency matrix $\hat{A}$. $A_{ij}$ is the element in row i, column j of the adjacency matrix, and $z_i$ and $z_j$ are rows i and j of the node embedding matrix Z. $p(A_{ij} = 1 \mid z_i, z_j)$ is the probability, reconstructed from the known embedding matrix Z, that nodes i and j are connected. When it exceeds the threshold, the corresponding element of $\hat{A}$ is set to 1. $Z Z^{T}$ is a factorized representation of the matrix A.
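The loss function measures the distance between the graph reconstructed by the encoder-decoder structure and the original graph:
$$L = -\mathbb{E}_{q(Z \mid X, A)}\left[\log p(A \mid Z)\right] \tag{V}$$
where L is the loss function and $\mathbb{E}_{q(Z \mid X, A)}[\cdot]$ denotes the expectation under the encoder distribution q(Z | X, A). L is therefore the negative expected log-likelihood of A, and minimizing L maximizes the likelihood of the observed edges.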
The GAE is trained with stochastic gradient descent until the loss converges, which yields the low-dimensional node embedding matrix Z.
By optimizing $W_0$ and $W_1$ to minimize L, the reconstructed graph is made as close as possible to the original graph, and the resulting low-dimensional embedding matrix Z carries the spatial characteristics of the graph.
Minimizing L with respect to W requires the gradient of L with respect to W, and L is then minimized by gradient descent.
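The inner-product decoder of formulas (II) and (IV) and a reconstruction loss can be sketched as below. The sketch assumes a deterministic embedding Z (a plain GAE), so the expectation in formula (V) reduces to the log-likelihood of A under independent Bernoulli edges. The 0.5 threshold and the averaging over matrix entries are assumptions. A variational GAE would also add a KL term, which the text does not mention.

```python
import numpy as np


def decode(Z, threshold=0.5):
    """Formulas (II)/(IV): edge probabilities sigmoid(z_i^T z_j) and thresholded A_hat."""
    probs = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
    return probs, (probs > threshold).astype(int)


def reconstruction_loss(A, probs, eps=1e-9):
    """Mean negative log-likelihood of A under independent Bernoulli edges."""
    p = np.clip(probs, eps, 1.0 - eps)
    return -np.mean(A * np.log(p) + (1.0 - A) * np.log(1.0 - p))
```

In a full training loop, this loss would be differentiated with respect to $W_0$ and $W_1$ through the encoder, typically with an automatic-differentiation framework.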
In the data preprocessing stage, the adjacency matrix A of the transaction graph and the node feature matrix X are generated; X contains the degree information of the nodes. This module encodes a spatial representation vector for each node with the GAE model, so the representation captures both the node's own features and those of its neighbours.
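The normalization and concatenation that produce $R_i$ might look like the sketch below. The text does not say which normalization is used, so row-wise L2 normalization is assumed here. Both matrices are also assumed to have one row per node, in the same node order. The function names are illustrative.

```python
import numpy as np


def l2_normalize(M, eps=1e-12):
    """Scale every row to unit Euclidean length; all-zero rows stay zero."""
    return M / np.maximum(np.linalg.norm(M, axis=1, keepdims=True), eps)


def node_representation(R_seq, Z):
    """R_i = [normalized time-series features || normalized spatial features]."""
    return np.hstack([l2_normalize(R_seq), l2_normalize(Z)])
```

Normalizing each part separately before concatenation keeps one feature group from dominating the Euclidean distances used in step (3) merely because its values are larger.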
(3) Group detection: the group to which each node belongs is computed, and the node's group label is output.
This step can use existing clustering and community detection algorithms based on distances in the feature space, such as K-means and KNN.
In this example, step (3) computes the Euclidean distance between every two nodes and uses it as the weight of the edge between them to determine the group of each node. The specific steps are:
3-1. First, the node representation vectors $R_i$ generated in step (2) are used to compute the distance between any two nodes in the graph, and this distance is used as the edge weight; a larger weight means the two nodes are farther apart. Each node is then placed in its own group, and the nodes of the network are traversed repeatedly. For each node, the change in modularity caused by moving it into a neighbouring group is compared, and the node joins the group that increases the modularity the most.
The modularity Q is defined as:
$$Q = \frac{1}{2m}\sum_{i,j}\left[W_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j) \tag{VI}$$
where Q is the modularity and m is the sum of the weights of all edges. $W_{ij}$ is the weight between nodes i and j, and $k_i$ and $k_j$ are the sums of the weights of the edges incident to nodes i and j. $c_i$ and $c_j$ are the groups to which nodes i and j belong, and $\delta(c_i, c_j)$ is an indicator function equal to 1 if $c_i$ and $c_j$ are the same group and 0 otherwise.
3-2. All nodes belonging to the same group are merged into a new node to build an aggregated graph (supergraph).
3-3. Steps 3-1 and 3-2 are repeated to obtain the final grouping, and the group label $(u_i, c_i)$ is generated, where $c_i$ is the group to which node i belongs.
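Step 3-1 and formula (VI) can be sketched in NumPy as follows. This is an illustration only. It recomputes Q from scratch for every candidate move, which costs quadratic time per evaluation. It also implements only the local-moving phase of step 3-1, without the aggregation of steps 3-2 and 3-3. For real data, a library routine such as `networkx.community.louvain_communities(G, weight="weight")` covers the full procedure. Modularity treats larger weights as stronger ties, so using raw distances as weights, as the text does and as `distance_weights` does here, tends to favour grouping distant nodes. A similarity such as 1 / (1 + d) is a common alternative.

```python
import numpy as np


def distance_weights(R):
    """Step 3-1: Euclidean distance between every two node vectors, used as edge weights."""
    diff = R[:, None, :] - R[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))   # diagonal is 0, so no self-loops


def modularity(W, labels):
    """Formula (VI): Q = 1/(2m) * sum_ij [W_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    k = W.sum(axis=1)                 # k_i: total weight of edges at node i
    two_m = W.sum()                   # symmetric W counts every edge twice -> 2m
    same = labels[:, None] == labels[None, :]
    return ((W - np.outer(k, k) / two_m) * same).sum() / two_m


def local_moving(W, max_passes=10):
    """Step 3-1 only: greedily move each node into the neighbouring group that raises Q most."""
    labels = np.arange(len(W))        # every node starts in its own group
    for _ in range(max_passes):
        moved = False
        for i in range(len(W)):
            best, best_q = labels[i], modularity(W, labels)
            for c in np.unique(labels[W[i] > 0]):   # groups of i's neighbours
                trial = labels.copy()
                trial[i] = c
                q = modularity(W, trial)
                if q > best_q + 1e-12:
                    best, best_q = c, q
            if best != labels[i]:
                labels[i], moved = best, True
        if not moved:
            break
    return np.unique(labels, return_inverse=True)[1]   # relabel groups as 0..K-1
```

As a check of formula (VI), consider two disjoint triangles with unit weights, split into their natural two groups. Here 2m = 12, the intra-group weight sum is 12, and the expected term is 2 × 6² / 12 = 6, so Q = (12 - 6) / 12 = 0.5.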
The invention thus provides a group detection method that combines the time-series and spatial structure features of nodes. It uses basic fields of the raw financial transaction flow data, such as the transaction account, the counterparty account and the transaction time. Time-series features are extracted adaptively with a Skip-gram sequence model and spatial structure features with a GAE model. The distance between every two nodes is computed from the concatenated features and used as an edge weight, and each user is assigned to a potential group with a group detection algorithm based on modularity optimization. The method reduces manual feature engineering and makes full use of the temporal and spatial features of the transaction graph.
Example 2
An implementation apparatus for the financial-transaction-network-based group detection method of Example 1 comprises:
a data preprocessing module, which cleans the transaction data, extracts each user's transaction sequence and constructs the graph data, i.e. performs step (1);
a user feature vector generation module, which obtains the user time-series feature vector with the sequence model and the user spatial feature vector with the GAE model, normalizes both, and concatenates them, i.e. performs step (2);
and a group detection module, which computes the group of each node and outputs its group label, i.e. performs step (3).

Claims (10)

1. A group partner detection method based on a financial transaction network, comprising:
(1) data preprocessing: performing data cleaning on the transaction data, extracting a transaction sequence of each user and constructing graph data;
(2) generating user feature vectors: obtaining the user time-series feature vector with a sequence model and the user spatial feature vector with a GAE model; normalizing the two vectors separately and concatenating them to generate the node representation vector $R_i = [\,d'_1, \ldots, d'_m \,\Vert\, z'_i\,]$, where $d'_1, \ldots, d'_m$ denote the normalized user time-series features and $z'_i$ denotes the normalized spatial feature vector of the node;
(3) group detection: computing the group to which each node belongs and outputting the group label of the node.
2. The group partner detection method based on a financial transaction network as claimed in claim 1, wherein in step (1) the transaction data include the user, the counterparty account, the transaction time and the transaction amount, and data cleaning of the transaction data comprises:
1-1. missing-value handling: if the user, counterparty account or transaction time field of a record is missing, discarding the record;
if only the transaction amount of a record is missing, filling it by mean imputation, i.e. computing the mean of all transaction amounts of the current user and using it as the missing amount;
1-2. inconsistent-data handling: when dates are written in different formats, formatting them with Python's datetime library and unifying all time formats into a year-month-day form;
1-3. feature encoding: mapping the user and counterparty account numbers, which are longer than 15 digits, to integer labels (label encoding) through a mapping table, finally obtaining the transaction sample set.
3. The group detection method based on a financial transaction network as claimed in claim 1, wherein in step (1) extracting each user's transaction sequence and constructing the graph data comprise:
a. generating each user's transaction sequence in chronological order: obtaining the user set $U = \{u_i \mid i = 1, \ldots, n\}$ with the `unique` function of the Pandas library, where n is the total number of users and $u_i$ denotes the i-th transaction user; m is the total number of counterparty accounts, and $a_j$, $j = 1, \ldots, m$, denotes the j-th counterparty account;
for user $u_i$, retrieving all of its counterparty accounts from the transaction sample set, sorting them in ascending order of transaction time and recombining them into the transaction sequence $L_i$;
building the key-value set $S = \{s_i \mid i = 1, \ldots, n\}$ with user $u_i$ as the key and transaction sequence $L_i$ as the value, where $s_i = (u_i, L_i)$; the keys of S are users and the values are transaction sequences, so that a user's transaction sequence can be looked up by the user;
b. constructing the graph data, the graph data comprising the adjacency matrix A and the feature matrix X of the graph nodes:
first, for each transaction record in the transaction sample set, extracting the user $u_i$ and the counterparty account $a_j$ (the j-th counterparty account) to form an ordered pair $(u_i, a_j)$;
then deduplicating all ordered pairs and using the deduplicated set of pairs as the edge set E of the graph G, $E = \{e_i \mid i = 1, \ldots, m\}$; using the set of all users U as the node set V, $V = \{v_i \mid i = 1, \ldots, n\}$; generating the adjacency matrix $A \in \mathbb{R}^{n \times n}$ of the nodes from E and V with the networkx library, the adjacency matrix encoding the topology of the graph by recording whether each pair of nodes is connected; users and counterparty accounts are both nodes, and an edge is added whenever a transaction exists between two nodes;
the feature matrix X of the graph nodes is the degree matrix D of the nodes, a diagonal matrix whose i-th diagonal element $D_{ii} = d_i$ is the degree of node $v_i$, i.e. the number of edges incident to $v_i$; the resulting adjacency matrix A and node feature matrix X are used as the training data of the GAE model.
4. The method as claimed in claim 1, wherein in step (2) the user time-series feature vector is obtained with a sequence model by treating each transaction sequence $L_i$ as a sentence and optimizing the parameters of each layer by maximizing the probability of the context nodes given the central node,
$$\frac{1}{k}\sum_{t=1}^{k}\ \sum_{-c \le j \le c,\ j \neq 0} \log p\left(L_i^{(t+j)} \mid L_i^{(t)}\right),$$
where c is the window size, specifically comprising:
2-1. preparing training data: first vectorizing the node list with `OneHotEncoder` from the sklearn library to obtain high-dimensional one-hot node vectors whose dimension equals the number of words (distinct nodes);
then setting a window size and a skip step and constructing training data from each transaction sequence $L_i = \{L_i^{(1)}, \ldots, L_i^{(k)}\}$, where $L_i$ is the transaction sequence of user $u_i$, the superscripts 1, ..., k index its elements, and k is the number of counterparty accounts in $L_i$; taking a given node as the central node and constructing a training set of (input, output) pairs to obtain the training data;
2-2. building a Skip-gram model to obtain node vectors, the Skip-gram model comprising an input layer, a hidden layer and an output layer connected in sequence:
the input layer receives the one-hot node vector; the dimension of the hidden layer is the number of hidden neurons; the output layer is a softmax classifier that outputs the probability of each node;
computing the cross-entropy loss, updating the model weights by gradient descent, and finally using the input-to-hidden weight matrix as the time-series feature of the nodes, $R_{\text{sequence}} = \{d'_1, \ldots, d'_m\}$.
5. The group detection method based on a financial transaction network as claimed in claim 4, wherein in step 2-1 the (input, output) pairs in the training samples are subsampled with frequent-word subsampling, and with negative sampling only part of the model weights is updated when each sample is trained.
6. The financial-transaction-network-based group partner detection method as claimed in claim 1, wherein in step (2) the user spatial feature vector is obtained with a GAE model comprising an encoder and a decoder, the encoder comprising two GCN layers and the decoder computing the probability that an edge exists between any two nodes and then generating edges to form a reconstructed graph, specifically comprising:
a. inputting the adjacency matrix A and the node feature matrix X at the input layer of the GAE model;
b. extracting features from A and X with the two GCN layers of the encoder to obtain the node embedding matrix Z, which satisfies:
$$Z = \mathrm{GCN}(X, A) \tag{I}$$
where GCN denotes the graph convolutional network model, X is the node feature matrix and A is the adjacency matrix;
c. inputting Z to the decoder and reconstructing the graph from the edge connection probabilities generated by the decoder; finally, outputting the reconstructed adjacency matrix $\hat{A}$ at the output layer, computed as:
$$\hat{A} = \sigma(Z Z^{T}) \tag{II}$$
where the superscript T denotes transposition and σ(·) is the sigmoid function, i.e. the output activation function of the neurons;
measuring the difference between the reconstructed graph and the original graph with a loss function L, and minimizing L so that the reconstructed graph is as close as possible to the original graph;
that is, inputting the adjacency matrix A and the node feature matrix X, extracting features from them with the two-layer GCN encoder, computing the probability of an edge between any two nodes with the decoder to generate a graph, measuring the difference between the input graph and the graph generated by the GAE with L, and optimizing $W_0, W_1$ to minimize L, thereby obtaining the node embedding matrix $Z = [z_1, \ldots, z_n]^{T}$, which carries the spatial characteristics of the graph; Z has n rows, each row vector corresponding to one node.
7. The method as claimed in claim 6, wherein in step b the two GCN layers are defined as:
$$\mathrm{GCN}(X, A) = \tilde{A}\,\mathrm{ReLU}\left(\tilde{A} X W_0\right) W_1 \tag{III}$$
where ReLU(·) is the rectified linear unit, $\tilde{A} = D^{-1/2} A D^{-1/2}$, D is the degree matrix, the superscript -1/2 denotes the matrix power, $W_0$ is the first weight matrix and $W_1$ is the second weight matrix.
8. The method as claimed in claim 6, wherein the decoder reconstructs the graph by computing the probability of a link between nodes, i.e. it reconstructs the adjacency matrix:
$$p(A_{ij} = 1 \mid z_i, z_j) = \mathrm{sigmoid}(z_i^{T} z_j) \tag{IV}$$
where sigmoid(·) is an activation function mapping its argument into (0, 1); if the probability exceeds a threshold, $A_{ij}$ is 1, meaning the two nodes are connected, which yields the adjacency matrix $\hat{A}$; $A_{ij}$ is the element in row i, column j of the adjacency matrix, and $z_i$ and $z_j$ are rows i and j of the node embedding matrix Z; $p(A_{ij} = 1 \mid z_i, z_j)$ is the probability, reconstructed from the known embedding matrix Z, that nodes i and j are connected, and when it exceeds the threshold the corresponding element of $\hat{A}$ is set to 1; $Z Z^{T}$ is a factorized representation of the matrix A;
the loss function measures the distance between the graph reconstructed by the encoder-decoder structure and the original graph:
$$L = -\mathbb{E}_{q(Z \mid X, A)}\left[\log p(A \mid Z)\right] \tag{V}$$
where L is the loss function and $\mathbb{E}_{q(Z \mid X, A)}[\cdot]$ denotes the expectation under the encoder distribution q(Z | X, A), so that L is the negative expected log-likelihood of A;
the GAE is trained with stochastic gradient descent until the loss converges, finally obtaining the low-dimensional node embedding matrix Z;
by optimizing $W_0, W_1$ to minimize L, the reconstructed graph is made as close as possible to the original graph, and the resulting low-dimensional node embedding matrix Z carries the spatial characteristics of the graph.
9. The group detection method based on a financial transaction network as claimed in claim 1, wherein in step (3) the distance between every two nodes is computed and used as the edge weight to obtain the group to which each node belongs, and the group label of the node is output, specifically comprising:
3-1. first, using the node representation vectors $R_i$ generated in step (2), computing the distance between any two nodes in the graph and using it as the edge weight; then assigning each node to its own group, repeatedly traversing the nodes of the network, comparing the change in modularity caused by moving a node into a neighbouring group, and moving the node into the group that increases the modularity the most,
the modularity Q being defined as:
$$Q = \frac{1}{2m}\sum_{i,j}\left[W_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j) \tag{VI}$$
where Q is the modularity, m is the sum of the weights of all edges, $W_{ij}$ is the weight between nodes i and j, $k_i$ and $k_j$ are the sums of the weights of the edges incident to nodes i and j, $c_i$ and $c_j$ are the groups to which nodes i and j belong, and $\delta(c_i, c_j)$ is an indicator function equal to 1 if $c_i$ and $c_j$ are the same group and 0 otherwise;
3-2. merging all nodes belonging to the same group into a new node to construct an aggregated graph (supergraph);
3-3. repeating steps 3-1 and 3-2 to obtain the final grouping and generating the group label $(u_i, c_i)$, where $c_i$ is the group to which node i belongs.
10. An apparatus for implementing the group partner detection method of a financial transaction network as claimed in any one of claims 1 to 9, comprising:
a data preprocessing module, configured to clean the transaction data, extract each user's transaction sequence and construct the graph data, i.e. to perform step (1);
a user feature vector generation module, configured to obtain the user time-series feature vector with the sequence model and the user spatial feature vector with the GAE model, normalize both and concatenate them, i.e. to perform step (2);
and a group detection module, configured to compute the group of each node and output its group label, i.e. to perform step (3).
CN202010777629.2A 2020-08-05 2020-08-05 Group partner detection method based on financial transaction network and realization device thereof Active CN111861756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010777629.2A CN111861756B (en) 2020-08-05 2020-08-05 Group partner detection method based on financial transaction network and realization device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010777629.2A CN111861756B (en) 2020-08-05 2020-08-05 Group partner detection method based on financial transaction network and realization device thereof

Publications (2)

Publication Number Publication Date
CN111861756A 2020-10-30
CN111861756B 2024-05-03

Family

ID=72971259

Country Status (1)

Country | Publication
CN (1) | CN111861756B

Citations (7)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN103164533A * | 2013-04-09 | 2013-06-19 | Harbin Institute of Technology | Complex network community detection method based on information theory
CN104867055A * | 2015-06-16 | 2015-08-26 | Xianning Public Security Bureau | Method for tracking and identifying suspicious funds in a financial network
CN109165950A * | 2018-08-10 | 2019-01-08 | Harbin Institute of Technology (Weihai) | Abnormal transaction identification method, device and readable storage medium based on financial time-series features
CN110263227A * | 2019-05-15 | 2019-09-20 | Alibaba Group Holding Limited | Clique discovery method and system based on graph neural network
CN110413707A * | 2019-07-22 | 2019-11-05 | Bairong Yunchuang Technology Co., Ltd. | Method and system for mining and investigating internet fraud clique relationships
CN110956547A * | 2019-11-28 | 2020-04-03 | Guangzhou Jibaozi Information Technology Consulting Service Co., Ltd. | Search engine-based method and system for identifying cheating groups in real time
US20200202219A1 * | 2017-12-15 | 2020-06-25 | Alibaba Group Holding Limited | Graphical structure model-based transaction risk control

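Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Zhang Hao; Huang Wei; Hu Guochao: "Application of improved random forest to role identification in money-laundering transactions", Computer and Modernization, no. 02 *
Peng Xinyu: "Community detection method based on DeepWalk", Computer Knowledge and Technology, no. 04 *
Qi Qi; Shen Runye; Wang Jingyu: "GAD: Topology-aware time series anomaly detection", Journal on Communications, no. 06 *
Li Peng; Li Yingle; Wang Kai; He Zanyuan; Li Xing; Chang Zhenchao: "Community detection in social networks based on interaction behavior and connection analysis", Computer Science, no. 07 *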

Cited By (6)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN112667863B * | 2021-01-16 | 2024-02-02 | Beijing University of Technology | Financial fraud group identification method based on hypergraph segmentation
CN113011979A * | 2021-03-29 | 2021-06-22 | China UnionPay Co., Ltd. | Transaction detection method, model training method and device, and computer-readable storage medium
CN113362071A * | 2021-06-21 | 2021-09-07 | Zhejiang University of Technology | Ponzi scheme fraudster identification method and system for the Ethereum platform
CN114925243A * | 2022-05-06 | 2022-08-19 | Alipay (Hangzhou) Information Technology Co., Ltd. | Method and device for predicting node attributes in a graph network
CN114741433A * | 2022-06-09 | 2022-07-12 | Beijing Xindun Shidai Technology Co., Ltd. | Community mining method, device, equipment and storage medium
CN114741433B * | 2022-06-09 | 2022-09-23 | Beijing Xindun Shidai Technology Co., Ltd. | Community mining method, device, equipment and storage medium

Legal Events

Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
CB03 | Change of inventor or designer information | Inventor after: Wang Wei, Zhu Tengwei, Huang Junheng, Wang Bailing, Xin Guodong, Liu Yang. Inventor before: Zhu Tengwei, Wang Wei, Huang Junheng, Wang Bailing, Xin Guodong, Liu Yang.
GR01 | Patent grant