Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method and a system for discovering abnormal key accounts based on a graph neural network;
targeting the abnormal financial transaction behavior of abnormal organizations and based on the characteristics of account transactions and fund flows, the method applies machine learning, deep learning, complex network analysis and combinatorial optimization to analyze and mine information such as abnormal financial accounts and an organization's key fund flows from a complex financial transaction network. The invention can be used for: 1) feature mining of abnormal key accounts based on abnormal transaction flow data; and 2) key account discovery based on abnormal transaction flow data.
The invention also provides computer equipment and a storage medium.
Interpretation of terms:
1. Transaction account: the bank account that actively initiates a fund transfer-out;
2. Counter-party account: the bank account that receives the funds.
The technical scheme of the invention is as follows:
an abnormal key account discovery method based on a graph neural network comprises the following steps:
(1) data preprocessing: data cleaning, key data item extraction and construction of intra-organization account transaction relationships are performed in sequence on the historical transaction records of the abnormal financial accounts;
data cleaning means: removing all transaction data involving normal accounts and keeping only the historical transaction records in which both transaction parties are abnormal financial accounts;
key data item extraction means: extracting the transaction account, counter-party account and in/out flag data items from the historical transaction records of the abnormal financial accounts;
construction of intra-organization account transaction relationships means:
two data items are created for the new data, i.e., the intra-organization account transaction relationships: a source account and a target account. The source account is the account from which a certain amount of money is transferred out in the current transaction, and the target account is the account that receives that amount. For each historical transaction record of an abnormal financial account, if the in/out flag is 'out', the source account is the transaction account and the target account is the counter-party account; if the in/out flag is 'in', the source account is the counter-party account and the target account is the transaction account;
if the resulting record containing the source account and the target account does not already exist in the new data, it is added to the new data; meanwhile, each abnormal financial account is re-encoded, mapping the abnormal financial accounts to consecutive integer codes starting from 0, one code per abnormal account;
(2) constructing the abnormal organization financial transaction network graph;
the abnormal organization financial transaction network graph is constructed from the intra-organization account transaction relationships built in step (1). In this graph, each node represents the code of an abnormal financial account, a directed edge between two nodes indicates that a transfer took place between the two abnormal financial accounts, and the arrow direction indicates the direction of the fund flow;
(3) abnormal organization key account discovery: the key accounts of the abnormal organization are discovered with the trained TRGA model.
Preferably, in step (1), data cleaning uses a threshold method, specifically: if the absolute difference between the number of fund inflows and the number of fund outflows for the current transaction record is smaller than a given threshold, the record is considered to belong to a normal account and is removed; otherwise, it is kept.
Preferably, step (3) is implemented as follows:
3.1 constructing and training a TRGA model;
3.2 extracting account transaction topology features and discovering the key accounts of the abnormal organization.
Preferably, the TRGA model comprises an input layer, a three-way graph neural network layer, a multi-head attention mechanism layer, a linear layer and a Softmax layer connected in sequence;
the inputs to the input layer of the TRGA model are the abnormal organization financial transaction network graph Graph and the one-hot feature X of the account nodes;
the three paths of the graph neural network layer each aggregate features of the nodes of the abnormal organization financial transaction network graph from a different perspective to update the node features; the node features obtained from the different layers are then concatenated, and the multi-head attention mechanism layer weights the information in the three resulting node features, so that the TRGA model focuses on more effective node topology information. The TRGA model integrates the output of the multi-head attention mechanism layer, i.e., it combines the node feature vectors through the linear layer and reduces their dimensionality, and finally outputs a vector of length 2, on the basis of which the abnormal organization's key accounts are discovered.
Preferably, each graph neural network path of the TRGA model aggregates the neighbor information of the nodes independently.
According to the invention, in the TRGA model, the abnormal organization financial transaction network graph Graph and the one-hot feature X of the account nodes are input into each of the three graph neural network paths; within each path, the input of every subsequent layer is the output of the previous layer in that path; each path finally produces a node feature matrix of the same dimension;
preferably, the first graph neural network path of the TRGA model extracts account node features from the abnormal organization financial transaction network graph through a multi-head graph attention layer, specifically: the multi-head graph attention layer uses several groups of independent attention mechanisms to discover multiple relevant features between a central node and all of its adjacent nodes, assigns different attention weights to the adjacent nodes of the central node, and learns multiple relevant features between the central node and its adjacent nodes.
Further preferably, let the central node be $v_i$. The complete attention weight between the central node $v_i$ and its adjacent node $v_j$ is calculated as shown in formula (I):

$$\alpha_{ij}^{(k)}=\frac{\exp\left(\mathrm{LeakyReLU}\left(a^{T}\left[W h_i^{(k)} \,\Vert\, W h_j^{(k)}\right]\right)\right)}{\sum_{v_l \in N(v_i)} \exp\left(\mathrm{LeakyReLU}\left(a^{T}\left[W h_i^{(k)} \,\Vert\, W h_l^{(k)}\right]\right)\right)} \qquad \text{(I)}$$

In formula (I), $\alpha_{ij}^{(k)}$ is the attention weight coefficient between the central node $v_i$ and its adjacent node $v_j$ at the k-th layer; $h_i^{(k)}$ is the feature vector of node $v_i$ at the k-th layer and $h_j^{(k)}$ is that of node $v_j$; $W$ is a weight matrix; $a^{T}$ is a weight parameter; the activation function is LeakyReLU(·); $N(v_i)$ is the set of adjacent nodes of $v_i$; and $v_j$ denotes an adjacent node of $v_i$;
the feature vector of node $v_i$ at the (k+1)-th layer is given by formula (II). In formula (II), $w^{(k)}$ is the weight parameter of the k-th layer node feature transformation, σ(·) is the sigmoid activation function, and ‖ denotes the concatenation (splicing) operation. The final feature embedding of the first graph neural network path is obtained by aggregating the node feature vectors of that path.
Further preferably, the multi-head attention mechanism layer comprises several independent and identically distributed groups of self-attention mechanisms, and the self-attention mechanism is given by formula (III):

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \qquad \text{(III)}$$

In formula (III), Q, K and V are the matrices obtained by multiplying the input node feature vectors by weight matrices, and $d_k$ is the feature vector dimension. The self-attention mechanism computes the association between each node feature vector and the other node feature vectors, so that contextual features are taken into account.
Preferably, in the other two graph neural network paths of the TRGA model, the forward edges and reverse edges of the abnormal organization financial transaction network graph are treated as two different edge types: in the second path, a graph convolution layer aggregates the features of the central node and the neighbor nodes connected to it by forward edges to update the central node's features; in the third path, a graph convolution layer aggregates the features of the central node and the neighbor nodes connected to it by reverse edges to update the central node's features.
More preferably, the graph convolution layer is computed as shown in formula (IV). In formula (IV), the neighbor set of node $v_i$ is, in the second graph neural network path, the set of nodes connected to $v_i$ by outgoing edges and, in the third path, the set of nodes connected to $v_i$ by incoming edges; the weight term is the weight of the k-th layer graph neural network; and $c_{i,r}$ is the total number of nodes connected to $v_i$;
information weighting is applied to the three resulting node features through the multi-head attention mechanism layer, so that the TRGA model focuses on more effective node topology information. The multi-head attention layer is computed as shown in formulas (V) and (VI):

$$\mathrm{head}_i=\mathrm{Att}\left(QW_i^{Q},\,KW_i^{K},\,VW_i^{V}\right) \qquad \text{(V)}$$

$$h=\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}\left(\mathrm{head}_1,\dots,\mathrm{head}_h\right)W^{O} \qquad \text{(VI)}$$

In formulas (V) and (VI), $\mathrm{head}_i$ is the feature embedding of the i-th head; Att(·) performs the attention calculation; $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$ and $W^{O}$ are neural network weight coefficients; Concat(·) denotes vector concatenation; and h is the final output vector of the TRGA model.
According to the invention, the discovery of the key account of the abnormal organization is realized based on the output vector, which specifically includes: when the value of the first element of the output vector is larger than the value of the second element, the account represented by the node is considered to be a non-key account, otherwise, the account represented by the node is considered to be a key account.
An abnormal key account discovery system based on a graph neural network comprises a data preprocessing module, an abnormal organization financial transaction network graph construction module and an abnormal organization key account discovery module;
the data preprocessing module is used for performing, in sequence, data cleaning, key data item extraction and construction of intra-organization account transaction relationships on the historical transaction records of the abnormal financial accounts; the abnormal organization financial transaction network graph construction module is used for constructing the abnormal organization financial transaction network graph from the intra-organization account transaction relationships built by the data preprocessing module; the abnormal organization key account discovery module is used for discovering the key accounts of the abnormal organization with the trained TRGA model.
A computer device comprises a memory and a processor, the memory storing a computer program, and the processor implementing the steps of the graph neural network based abnormal key account discovery method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the graph neural network-based abnormal key account discovery method.
The invention has the beneficial effects that:
1. Targeting the abnormal financial transaction behavior of abnormal organizations, the graph neural network based abnormal key account discovery method applies machine learning, deep learning, complex network analysis and combinatorial optimization, according to the characteristics of account transactions and fund flows, to analyze and mine information such as abnormal financial accounts and an organization's key fund flows from a complex financial transaction network.
2. Based on the proposed TRGA neural network model, the abnormal key account discovery method learns the transaction relationship features of the account nodes in the abnormal organization's financial transaction network and classifies the account nodes, thereby discovering the abnormal organization's key accounts while reducing, to some extent, the need for labor-intensive feature engineering. Good key account discovery performance can be achieved using only a single type of data, namely abnormal transaction flow data, and relatively few features. The method can provide auxiliary analysis and judgment information to investigators, improving work efficiency and saving time. As more labeled abnormal data is discovered, the classification model can be further improved, and the accuracy of detection and identification results is expected to increase.
3. The scope of application of the invention includes feature mining of abnormal key accounts based on abnormal transaction flow data and key account discovery based on abnormal transaction flow data, giving it broad application prospects.
Detailed Description
The invention is further described below with reference to the drawings and embodiments, but is not limited thereto.
Example 1
A graph neural network based abnormal key account discovery method, as shown in Figs. 1 and 5, comprises the following steps:
(1) data preprocessing: data cleaning, key data item extraction, construction of intra-organization account transaction relationships and other operations are performed in sequence on the historical transaction records of the abnormal financial accounts;
data cleaning means: removing all transaction data involving normal accounts and keeping only the historical transaction records in which both transaction parties are abnormal financial accounts;
the invention focuses on the financial transaction network inside an abnormal organization and aims to discover the organization's key accounts from the transaction behavior among its internal financial accounts. Key accounts include the organization's high-level accounts (organizers and leaders), purchase-application accounts responsible for absorbing funds, and rebate accounts responsible for paying funds out. Once a financial account participates in the abnormal organization's purchase-rebate activity, it becomes an abnormal financial account. The invention therefore assumes that abnormal financial transactions flow only within the abnormal organization.
Key data item extraction means: extracting the transaction account, counter-party account and in/out flag data items from the historical transaction records of the abnormal financial accounts;
key data items are extracted as follows: first, data denoising removes the transaction records in which individual data items are missing. Next, each historical transaction record is split into structured fields using manually defined rules, and data such as the transaction account and counter-party account fields are extracted. Finally, an in/out flag item is added according to the direction of fund flow between the transaction account and the counter-party account in the record.
The inbound and outbound transaction information of an abnormal account is critical to key account discovery. For a single account, the transaction characteristics formed by its incoming and outgoing transactions are reflected in its transaction relationships with other accounts. The key to building the abnormal organization financial transaction network is therefore whether transactions exist between accounts and in which direction the transaction amounts flow. Accordingly, the invention extracts the transaction account, counter-party account and in/out flag data items.
Construction of intra-organization account transaction relationships means:
to facilitate subsequent construction of the transaction network graph, the invention restructures the data according to the in/out flag of each transaction record. Two data items are created for the new data, i.e., the intra-organization account transaction relationships: a source account and a target account, both defined with respect to the direction of fund flow in the transaction. The source account is the account from which a certain amount of money is transferred out in the current transaction, and the target account is the account that receives that amount. For each historical transaction record of an abnormal financial account, if the in/out flag is 'out', the source account is the transaction account and the target account is the counter-party account; if the in/out flag is 'in', the source account is the counter-party account and the target account is the transaction account;
if the resulting record containing the source account and the target account does not already exist in the new data, it is added to the new data; meanwhile, each abnormal financial account is re-encoded, mapping the abnormal financial accounts to consecutive integer codes starting from 0, one code per abnormal account;
part of the finally constructed intra-organization account transaction relationship data is shown in Table 1:
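The denoising and extraction steps can be sketched in Python as follows. This is a minimal illustration, not the invention's implementation. The raw record fields `account`, `counterparty` and `amount` are hypothetical. Deriving the in/out flag from the sign of the amount is also an assumption, because the source does not say how the fund flow direction is encoded in the raw records.

```python
def extract_key_items(raw_records):
    """Return (transaction_account, counterparty_account, flag) tuples.

    Hypothetical schema: each raw record is a dict with 'account',
    'counterparty' and a signed 'amount'.
    """
    items = []
    for rec in raw_records:
        acct = rec.get("account")
        opp = rec.get("counterparty")
        amount = rec.get("amount")
        # Denoising: drop records in which any key data item is missing or empty.
        if not acct or not opp or amount is None:
            continue
        # Assumption: a negative amount means funds left the transaction account.
        flag = "out" if amount < 0 else "in"
        items.append((acct, opp, flag))
    return items


records = [
    {"account": "A", "counterparty": "B", "amount": -100.0},
    {"account": "B", "counterparty": "A", "amount": 100.0},
    {"account": "C", "counterparty": "", "amount": 5.0},  # missing counter-party
    {"account": "D", "counterparty": "E"},                # missing amount
]
print(extract_key_items(records))  # [('A', 'B', 'out'), ('B', 'A', 'in')]
```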
TABLE 1
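The source/target construction, de-duplication and re-encoding steps can be sketched in Python as follows. This is a hedged sketch that assumes the (transaction account, counter-party account, flag) tuples produced by key data item extraction. Assigning codes in first-seen order is an assumption, since the source only states that the codes start from 0.

```python
def build_relations(items):
    """Turn (transaction_account, counterparty_account, flag) tuples into
    de-duplicated (source_code, target_code) pairs plus the account-to-code map."""
    edges, seen = [], set()
    for acct, opp, flag in items:
        if flag == "out":      # funds leave the transaction account
            pair = (acct, opp)
        elif flag == "in":     # funds arrive at the transaction account
            pair = (opp, acct)
        else:
            raise ValueError(f"unknown in/out flag: {flag!r}")
        if pair not in seen:   # add the relation only if it is not already present
            seen.add(pair)
            edges.append(pair)
    # Re-encode accounts as consecutive integers from 0 (first-seen order is assumed).
    codes = {}
    for src, dst in edges:
        for acct in (src, dst):
            codes.setdefault(acct, len(codes))
    return [(codes[s], codes[d]) for s, d in edges], codes


items = [("A", "B", "out"), ("B", "A", "in"), ("C", "A", "in")]
encoded, codes = build_relations(items)
print(encoded, codes)  # [(0, 1), (0, 2)] {'A': 0, 'B': 1, 'C': 2}
```

The second record describes the same A-to-B transfer as the first, seen from B's side, so only one relation is kept.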
(2) Constructing the abnormal organization financial transaction network graph;
the abnormal organization financial transaction network graph is constructed from the intra-organization account transaction relationships built in step (1). In this graph, each node represents the code of an abnormal financial account, a directed edge between two nodes indicates that a transfer took place between the two abnormal financial accounts, and the arrow direction indicates the direction of the fund flow;
specifically: a random number seed is set according to each account node's code, rectangular coordinates of the node are generated from that seed, and normalization is then applied to obtain the graph coordinates of the account node. Generating graph coordinates for every node yields the abnormal organization financial transaction network graph.
(3) Abnormal organization key account discovery: the key accounts of the abnormal organization are discovered with the trained TRGA (Three-Route Graph Attention Network) model.
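The graph construction and coordinate generation described above can be sketched in Python. This sketch is not the patented procedure. Seeding `random.Random` with the node code and using min-max scaling as the "normalization processing" are both assumptions. The outgoing and incoming adjacency lists are kept separately because the second and third graph neural network paths use them.

```python
import random


def build_graph(num_nodes, edges):
    """Adjacency lists for a directed graph whose nodes are account codes."""
    out_nbrs = {i: [] for i in range(num_nodes)}
    in_nbrs = {i: [] for i in range(num_nodes)}
    for src, dst in edges:
        out_nbrs[src].append(dst)  # the arrow points in the direction funds flow
        in_nbrs[dst].append(src)
    return out_nbrs, in_nbrs


def _min_max(values):
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.5 for v in values]


def layout(num_nodes):
    """Seed an RNG with each node code, draw (x, y), then min-max normalize (assumed)."""
    raw = []
    for code in range(num_nodes):
        rng = random.Random(code)  # same code gives the same coordinates on every run
        raw.append((rng.random(), rng.random()))
    xs = _min_max([p[0] for p in raw])
    ys = _min_max([p[1] for p in raw])
    return {code: (xs[code], ys[code]) for code in range(num_nodes)}


out_nbrs, in_nbrs = build_graph(3, [(0, 1), (0, 2)])
coords = layout(3)
```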
Example 2
The graph neural network based abnormal key account discovery method of this example comprises the following:
the method is suitable for a computing platform with an Intel i5 or better CPU, more than 4 GB of memory and a Linux operating system configured with the TensorFlow and Keras frameworks. Under this configuration, a platform with strong GPU computing power is the preferred choice for running the method.
In step (1), data cleaning uses a threshold method, specifically: if the absolute difference between the number of fund inflows and the number of fund outflows for the current transaction record is smaller than a given threshold, the record is considered to belong to a normal account and is removed; otherwise, it is kept.
The invention uses a threshold method for data cleaning. Key accounts and normal accounts in an abnormal organization differ greatly in their numbers of fund inflows and outflows: key accounts have markedly more fund outflows than normal accounts. Thus, if the number of fund outflows for the current transaction record is below a given threshold, it is considered a normal account and is removed from the data set. Statistical analysis of the data also shows that key accounts have markedly fewer fund inflows than outflows; therefore, if the absolute difference between the numbers of fund inflows and outflows for the current transaction record is smaller than a given threshold, the record is considered to belong to a normal account and is removed from the data set.
Step (3) is implemented as follows:
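The inflow/outflow threshold rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions: inflow and outflow counts are computed per transaction account over all of its records, and the threshold is a free parameter because the source gives no value. Only the absolute-difference rule is shown, not the separate outflow-count rule.

```python
from collections import Counter


def clean_by_threshold(items, threshold):
    """Keep records whose transaction account has |inflows - outflows| >= threshold.

    items: (transaction_account, counterparty_account, flag) tuples, flag in {'in', 'out'}.
    """
    inflow, outflow = Counter(), Counter()
    for acct, _, flag in items:
        if flag == "in":
            inflow[acct] += 1
        else:
            outflow[acct] += 1
    # Accounts with nearly balanced in/out counts are treated as normal and removed.
    return [it for it in items if abs(inflow[it[0]] - outflow[it[0]]) >= threshold]


items = [("A", "B", "out")] * 3 + [("B", "A", "in"), ("B", "C", "out")]
print(clean_by_threshold(items, 2))  # [('A', 'B', 'out'), ('A', 'B', 'out'), ('A', 'B', 'out')]
```

Account A has 0 inflows and 3 outflows and is kept. Account B has 1 inflow and 1 outflow, so it is treated as a normal account and its records are removed.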
3.1 constructing and training a TRGA model;
for the account nodes in the abnormal organization's financial transaction network, the invention abstracts the problem of discovering the abnormal organization's key accounts into a graph node classification problem;
since the TRGA model is a deep learning model, its neural network is built on the PyTorch deep learning framework. Hyper-parameters such as the learning rate, training step size and number of training epochs are passed to the network, the optimizer's gradient descent strategy is set to stochastic gradient descent, the loss function is set to cross-entropy, and training of the TRGA model begins.
3.2 extracting account transaction topology features and discovering the key accounts of the abnormal organization.
The TRGA model comprises an input layer, a three-way graph neural network layer, a multi-head attention mechanism layer, a linear layer and a Softmax layer connected in sequence;
the three-way graph neural network layer consists of three graph neural network paths. The first path treats Graph as an undirected graph and iteratively weights the adjacent nodes of each central node, after which each of its layers processes the node feature vectors; in the second path, a graph convolution layer aggregates the features of the central node and the neighbor nodes connected to it by forward edges to update the central node's features; in the third path, a graph convolution layer aggregates the features of the central node and the neighbor nodes connected to it by reverse edges to update the central node's features. The multi-head attention mechanism layer consists of 8 independent and identically distributed self-attention layers, whose output feature embeddings are concatenated into the final output vector h.
Figs. 3 and 6 are neural network structure diagrams of the TRGA model: the directed financial transaction network graph and the one-hot node features are provided independently to each path of the three-way graph neural network layer. That is, the first layers of the three paths receive identical inputs. Within each path, the input of every subsequent layer is the output of the previous layer in that path. Finally, each path produces a node feature matrix of the same dimension.
At the level of the overall architecture, the inputs to the input layer of the TRGA model are the abnormal organization financial transaction network graph Graph and the one-hot feature X of the account nodes;
the one-hot feature X of the account nodes is obtained as follows: one-hot encoding uses an N-bit status register to encode N states; each state has its own independent register bit, and only one bit is active at any time. For example, the one-hot features of nodes A, B and C are [1,0,0], [0,1,0] and [0,0,1], respectively.
The core of the TRGA model is divided into three paths: the three paths of the graph neural network layer each aggregate features of the nodes of the abnormal organization financial transaction network graph from a different perspective to update the node features; the node features obtained from the different layers are then concatenated, and the multi-head attention mechanism layer weights the information in the three resulting node features, so that the TRGA model focuses on more effective node topology information. The TRGA model integrates the output of the multi-head attention mechanism layer, i.e., it combines the node feature vectors through the linear layer and reduces their dimensionality, and finally outputs a vector of length 2, on the basis of which the abnormal organization's key accounts are discovered.
Each graph neural network path of the TRGA model aggregates the neighbor information of the nodes independently.
In the TRGA model, the abnormal organization financial transaction network graph Graph and the one-hot feature X of the account nodes are input into each of the three graph neural network paths; that is, the first layers of the three paths receive identical inputs. Within each path, the input of every subsequent layer is the output of the previous layer in that path; each path finally produces a node feature matrix of the same dimension;
the first graph neural network path of the TRGA model extracts account node features from the abnormal organization financial transaction network graph through a multi-head graph attention layer, specifically: the multi-head graph attention layer uses several groups of independent attention mechanisms to discover multiple relevant features between a central node and all of its adjacent nodes, assigns different attention weights to the adjacent nodes of the central node, and learns multiple relevant features between the central node and its adjacent nodes.
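The training objective (cross-entropy) and the stochastic gradient descent update can be illustrated with a pure-Python stand-in for the final linear layer. The source builds TRGA in PyTorch, and this sketch is not that model. It only shows the arithmetic of one SGD step on a hypothetical feature vector `x` with a hypothetical learning rate. It relies on the standard result that the gradient of cross-entropy with respect to the logits is softmax minus the one-hot label.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def sgd_step(weights, bias, x, label, lr):
    """One SGD update of a linear layer mapping features x to 2 logits; returns the pre-update loss."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    # d(cross-entropy)/d(logit_c) = softmax_c - 1[c == label]
    grads = [p - (1.0 if c == label else 0.0) for c, p in enumerate(probs)]
    new_weights = [[w - lr * g * xi for w, xi in zip(row, x)] for row, g in zip(weights, grads)]
    new_bias = [b - lr * g for b, g in zip(bias, grads)]
    return new_weights, new_bias, cross_entropy(logits, label)


w, b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
x = [1.0, 0.5]  # hypothetical node feature vector
w, b, loss0 = sgd_step(w, b, x, label=1, lr=0.5)
_, _, loss1 = sgd_step(w, b, x, label=1, lr=0.5)
print(round(loss0, 4), loss1 < loss0)  # 0.6931 True
```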
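Because the accounts are already encoded as 0 to N-1, the one-hot feature matrix X is simply an N×N identity matrix. The short sketch below assumes that row i belongs to the account with code i.

```python
def one_hot_features(num_nodes):
    """Row i is the one-hot vector of the account node with code i (an identity matrix)."""
    return [[1 if j == i else 0 for j in range(num_nodes)] for i in range(num_nodes)]


print(one_hot_features(3))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```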
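Let the central node be $v_i$. The complete attention weight between the central node $v_i$ and its adjacent node $v_j$ is calculated as shown in formula (I):

$$\alpha_{ij}^{(k)}=\frac{\exp\left(\mathrm{LeakyReLU}\left(a^{T}\left[W h_i^{(k)} \,\Vert\, W h_j^{(k)}\right]\right)\right)}{\sum_{v_l \in N(v_i)} \exp\left(\mathrm{LeakyReLU}\left(a^{T}\left[W h_i^{(k)} \,\Vert\, W h_l^{(k)}\right]\right)\right)} \qquad \text{(I)}$$

In formula (I), $\alpha_{ij}^{(k)}$ is the attention weight coefficient between the central node $v_i$ and its adjacent node $v_j$ at the k-th layer; $h_i^{(k)}$ is the feature vector of node $v_i$ at the k-th layer and $h_j^{(k)}$ is that of node $v_j$; $W$ is a weight matrix; $a^{T}$ is a weight parameter; the activation function is LeakyReLU(·); $N(v_i)$ is the set of adjacent nodes of $v_i$; and $v_j$ denotes an adjacent node of $v_i$;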
the feature vector of node $v_i$ at the (k+1)-th layer is given by formula (II). In formula (II), $w^{(k)}$ is the weight parameter of the k-th layer node feature transformation, σ(·) is the sigmoid activation function, and ‖ denotes the concatenation (splicing) operation. The final feature embedding of the first graph neural network path is obtained by aggregating the node feature vectors of that path.
For example, in Fig. 2, the central node 548 has 4 neighbor nodes: 466, 497, 457 and 454. The multi-head graph attention layer in Fig. 4 uses 8 independent attention mechanisms, shown as 8 dotted lines in the figure, to learn the correlation features between node 548 and its 4 neighbor nodes. In each attention mechanism, the correlation between the central node and each of its neighbor nodes is first calculated and then activated with the LeakyReLU function. To distribute the weights better, the correlations between the central node and all of its neighbors are jointly normalized with softmax.
The lower half of Fig. 4 illustrates the calculation of the attention weight between node 548 and its neighbor node 454.
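The attention computation of formulas (I) and (II) can be sketched in plain Python using node 548 and its four neighbors. This is a minimal sketch under stated assumptions, not the TRGA implementation. The 2-dimensional node features and the randomly drawn weights are hypothetical, and the LeakyReLU slope of 0.2 is an assumed value. The sketch also assumes the same matrix W for scoring and aggregation. Concatenating the sigmoid outputs across the 8 heads is one reading of the ‖ operator in formula (II).

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    return [sum(w * v for w, v in zip(row, h)) for row in W]


def attention_weights(h, W, a, center, neighbors):
    """alpha_ij = softmax over j in N(v_i) of LeakyReLU(a^T [W h_i || W h_j])."""
    wh_i = matvec(W, h[center])
    scores = []
    for j in neighbors:
        concat = wh_i + matvec(W, h[j])  # list concatenation realizes [W h_i || W h_j]
        scores.append(leaky_relu(sum(ak * ck for ak, ck in zip(a, concat))))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(neighbors, exps)}


def multi_head_gat(h, heads, center, neighbors):
    """For each head, sigmoid(sum_j alpha_ij W h_j); results concatenated across heads."""
    out = []
    for W, a in heads:
        alpha = attention_weights(h, W, a, center, neighbors)
        agg = [0.0] * len(W)
        for j, weight in alpha.items():
            for d, v in enumerate(matvec(W, h[j])):
                agg[d] += weight * v
        out.extend(1.0 / (1.0 + math.exp(-v)) for v in agg)  # sigmoid, then ||
    return out


rng = random.Random(0)
h = {548: [1.0, 0.0], 466: [0.0, 1.0], 497: [1.0, 1.0], 457: [0.5, 0.5], 454: [0.2, 0.8]}
heads = [([[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)],
          [rng.uniform(-1, 1) for _ in range(4)]) for _ in range(8)]
alpha = attention_weights(h, *heads[0], 548, [466, 497, 457, 454])
emb = multi_head_gat(h, heads, 548, [466, 497, 457, 454])  # 8 heads x 2 dims = 16 values
```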
The multi-head attention mechanism layer comprises several independent and identically distributed groups of self-attention mechanisms, and the self-attention mechanism is given by formula (III):

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \qquad \text{(III)}$$

In formula (III), Q, K and V are the matrices obtained by multiplying the input node feature vectors by weight matrices, and $d_k$ is the feature vector dimension. The self-attention mechanism computes the association between each node feature vector and the other node feature vectors, so that contextual features are taken into account, thereby enhancing the learning ability of the system.
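Formula (III) is scaled dot-product attention. The plain-Python sketch below uses row-vector lists for Q, K and V. The example values are hypothetical and were chosen so the result is easy to check: identical keys produce uniform weights, so every output row equals the mean of the rows of V.

```python
import math


def self_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for row-vector lists Q, K, V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # one row of the attention matrix
        out.append([sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))])
    return out


print(self_attention([[1, 0], [0, 1]], [[1, 1], [1, 1]], [[2, 0], [0, 4]]))  # [[1.0, 2.0], [1.0, 2.0]]
```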
In the other two graph neural network paths of the TRGA model, the forward edges and reverse edges of the abnormal organization financial transaction network graph are treated as two different edge types: in the second path, a graph convolution layer aggregates the features of the central node and the neighbor nodes connected to it by forward edges to update the central node's features; in the third path, a graph convolution layer aggregates the features of the central node and the neighbor nodes connected to it by reverse edges to update the central node's features.
The graph convolution layer is computed as shown in formula (IV). In formula (IV), the neighbor set of node $v_i$ is, in the second graph neural network path, the set of nodes connected to $v_i$ by outgoing edges and, in the third path, the set of nodes connected to $v_i$ by incoming edges; the weight term is the weight of the k-th layer graph neural network; and $c_{i,r}$ is the total number of nodes connected to $v_i$;
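Because the symbols of formula (IV) are not recoverable, the Python sketch below shows only one plausible edge-type-specific graph convolution consistent with the definitions above. Each node averages the transformed features of its neighbors of one edge type, dividing by $c_{i,r}$. The ReLU activation and the absence of a self-loop term are assumptions, not details from the source. The second path would pass the outgoing-neighbor lists, and the third path would pass the incoming-neighbor lists.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def edge_type_gcn_layer(h, W, nbrs, activation=relu):
    """h_i' = activation(sum_{j in nbrs[i]} (1 / c_i) * W h_j), with c_i = len(nbrs[i]).

    Assumed form: no self-loop term; ReLU activation by default.
    """
    dim = len(W)
    new_h = {}
    for i, js in nbrs.items():
        agg = [0.0] * dim
        for j in js:
            for d, x in enumerate(matvec(W, h[j])):
                agg[d] += x / len(js)  # normalization by c_{i,r}
        new_h[i] = activation(agg)
    return new_h


h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
W = [[1, 0], [0, 1]]
out_nbrs = {0: [1, 2], 1: [], 2: []}  # forward (outgoing) edges 0->1, 0->2
in_nbrs = {0: [], 1: [0], 2: [0]}     # the same edges viewed as incoming
second = edge_type_gcn_layer(h, W, out_nbrs)  # second[0] == [0.5, 1.0]
third = edge_type_gcn_layer(h, W, in_nbrs)    # third[1] == [1.0, 0.0]
```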
in each graph neural network path, every additional stacked layer aggregates information from higher-order neighbor nodes. After an account node in the abnormal organization's financial transaction network passes through the three graph neural network paths of TRGA, three feature vectors Q, K and V of the node are obtained. Information weighting is applied to these three node features through the multi-head attention mechanism layer, so that the TRGA model focuses on more effective node topology information. The multi-head attention layer is computed as shown in formulas (V) and (VI):

$$\mathrm{head}_i=\mathrm{Att}\left(QW_i^{Q},\,KW_i^{K},\,VW_i^{V}\right) \qquad \text{(V)}$$

$$h=\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}\left(\mathrm{head}_1,\dots,\mathrm{head}_h\right)W^{O} \qquad \text{(VI)}$$

In formulas (V) and (VI), $\mathrm{head}_i$ is the feature embedding of the i-th head; Att(·) performs the attention calculation; $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$ and $W^{O}$ are neural network weight coefficients; Concat(·) denotes vector concatenation; and h is the final output vector of the TRGA model.
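Formulas (V) and (VI) can be sketched in plain Python as follows. This sketch treats Q, K and V as node-feature matrices (one row per node) taken from the outputs of the three paths, which is an assumption about how the per-node vectors are stacked. The example uses 2 heads with identity projections for readability, whereas the source uses 8 heads with learned weights. All numeric values are hypothetical.

```python
import math
from itertools import chain


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def att(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    weights = []
    for row in matmul(Q, [list(c) for c in zip(*K)]):  # rows of Q K^T
        scaled = [s / math.sqrt(d_k) for s in row]
        m = max(scaled)
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return matmul(weights, V)


def multi_head(Q, K, V, head_weights, W_O):
    """head_i = Att(Q W_i^Q, K W_i^K, V W_i^V); output = Concat(head_1, ...) W^O."""
    heads = [att(matmul(Q, Wq), matmul(K, Wk), matmul(V, Wv)) for Wq, Wk, Wv in head_weights]
    concat = [list(chain.from_iterable(head[r] for head in heads)) for r in range(len(Q))]
    return matmul(concat, W_O)


I = [[1, 0], [0, 1]]
Q = [[1, 0], [0, 1], [1, 1]]  # hypothetical output of the first path (3 nodes)
K = [[1, 1], [1, 1], [1, 1]]  # hypothetical output of the second path
V = [[3, 0], [0, 3], [0, 0]]  # hypothetical output of the third path
W_O = [[1, 0], [0, 1], [0, 0], [0, 0]]  # keeps only the first head's 2 dimensions
out = multi_head(Q, K, V, [(I, I, I), (I, I, I)], W_O)
```

Because all keys are identical, each head averages the rows of V, so every row of `out` is approximately [1, 1].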
Based on the output vector, the discovery of the key account of the abnormal organization is realized, which specifically comprises the following steps: when the value of the first element of the output vector is larger than the value of the second element, the account represented by the node is considered to be a non-key account, otherwise, the account represented by the node is considered to be a key account.
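The decision rule above can be written as a small Python function. The label strings are illustrative. As the rule states, a tie between the two elements falls under "otherwise" and yields a key account.

```python
def classify(output_vector):
    """Apply the decision rule to the length-2 TRGA output for one node."""
    first, second = output_vector
    # A strictly larger first element means non-key; otherwise (including ties) key.
    return "non-key" if first > second else "key"


print(classify([0.7, 0.3]))  # non-key
print(classify([0.2, 0.8]))  # key
```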
Example 3
A graph neural network based abnormal key account discovery system, used to implement the graph neural network based abnormal key account discovery method of Example 1 or 2, comprises a data preprocessing module, an abnormal organization financial transaction network graph construction module and an abnormal organization key account discovery module;
after data cleaning and other preprocessing, the historical transaction records of the abnormal financial accounts yield the intra-organization account transaction relationships. The data preprocessing module is used for performing, in sequence, data cleaning, key data item extraction, construction of intra-organization account transaction relationships and other operations on the historical transaction records of the abnormal financial accounts; the abnormal organization financial transaction network graph construction module is used for constructing the abnormal organization financial transaction network graph from the intra-organization account transaction relationships built by the data preprocessing module; the abnormal organization key account discovery module is used for discovering the key accounts of the abnormal organization with the trained TRGA (Three-Route Graph Attention Network) model.
Example 4
A computer device comprising a memory storing a computer program and a processor implementing the steps of the graph neural network based abnormal key account discovery method of embodiments 1 or 2 when the processor executes the computer program.
Example 5
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the graph neural network-based abnormal key account discovery method of embodiment 1 or 2.