CN113362160A - Federated learning method and device for credit card anti-fraud - Google Patents

Federated learning method and device for credit card anti-fraud

Info

Publication number
CN113362160A
CN113362160A (application CN202110635863.6A)
Authority
CN
China
Prior art keywords
neural network
convolutional neural
network model
graph
federal learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110635863.6A
Other languages
Chinese (zh)
Other versions
CN113362160B (en)
Inventor
胡凯
吴佳胜
陆美霞
李姚根
徐露娟
夏旻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN202110635863.6A
Publication of CN113362160A
Application granted
Publication of CN113362160B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q 40/03: Credit; Loans; Processing thereof
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Probability & Statistics with Applications (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a federated learning method and device for credit card anti-fraud. The method comprises the following steps: building local graph convolutional neural network models corresponding to K federated learning participants with different fraud categories; carrying out federated learning training with the local graph convolutional neural network models, where an attention mechanism improves the aggregation of the federated learning parameters so that each local graph convolutional neural network model is aggregated with a weight adapted to it; and outputting a global graph convolutional neural network model, which is used for processing imported user data and identifying the corresponding fraud category. Addressing the shortcomings of existing credit card fraud assessment methods and of the classical federated learning algorithm, the invention provides a federated learning algorithm suited to non-Euclidean data and to the personalized characteristics of the participants, in order to process financial data and make credit card anti-fraud judgments.

Description

Federated learning method and device for credit card anti-fraud
Technical Field
The invention relates to the technical field of credit card anti-fraud, and in particular to a federated learning method and device for credit card anti-fraud.
Background
With the rapid development of the internet, financial technology based on artificial intelligence has profoundly influenced consumer behavior. But financial data often involve privacy, and the data of different financial institutions such as banks and loan agencies cannot be shared directly, forming data silos. Yet if an artificial intelligence algorithm is to reach higher precision, it needs the support of large amounts of data; a model trained locally and independently by a single data owner cannot accurately assess whether a credit card transaction is fraudulent.
Federated learning, as a distributed machine learning/deep learning framework that protects data privacy, offers a good solution to data silos, severe data fragmentation, data heterogeneity, unbalanced data distribution, and similar problems. Machine learning and deep learning have already succeeded in many fields, laying the groundwork for federated learning models to achieve better performance. The existing federated learning algorithm, however, only averages the parameters of the local models. First, it does not account for the personalization of each local model (for an anti-fraud analysis system, for example, the user data characteristics of different financial institutions differ, and total loan amounts vary with regional economic levels), so it cannot cope with the different sample emphases induced by each client's environment. Second, it ignores that most real-world data live in non-Euclidean spaces, such as associations between users or financial knowledge graphs; because the evaluation standards of non-Euclidean data are inconsistent and the data structure is irregular, it is difficult to jointly train a high-performance model on such data. These problems make the learning insufficiently accurate.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention provides a federated learning method for credit card anti-fraud: a federated learning algorithm suited to non-Euclidean data and to the personalized characteristics of the participants, used to process financial data and make credit card anti-fraud judgments, addressing the problems of existing credit card fraud assessment methods and of the classical federated learning algorithm.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, an embodiment of the present invention provides a federated learning method for credit card anti-fraud, comprising the following steps:
S1, building local graph convolutional neural network models corresponding to K federated learning participants with different fraud categories; the local undirected-graph structure data owned by each participant is G_i(V, E, A), i ∈ K, where the nodes of the graph structure are v_i ∈ V, the feature on node v_i is x_i ∈ X, each node carries several kinds of key feature information including user information, loan amount, deposit amount and credit investigation data, and the edge set between nodes is e_{i,j} = (v_i, v_j) ∈ E; A denotes the adjacency matrix and defines the interconnection relationships among nodes; the fraud categories comprise three classes: stolen-card fraud, false-application fraud and non-fraud;
S2, carrying out federated learning training with the local graph convolutional neural network models; an attention mechanism is adopted to improve the aggregation of the federated learning parameters, so that each local graph convolutional neural network model is aggregated with a weight adapted to it;
and S3, outputting a global graph convolutional neural network model, wherein the global graph convolutional neural network model is used for processing imported user data and identifying the corresponding fraud category.
Optionally, the adjacency matrix comprises two types: a region adjacency matrix expressing regional distance or kinship, and an attribute adjacency matrix expressing whether the user-related information is similar.
Optionally, in step S1, building the local graph convolutional neural network models corresponding to the K federated learning participants with different fraud categories comprises the following steps:
S11, normalizing the node data with the following formula to obtain the data to be processed:

$\tilde{x}_i = \frac{x_i - \bar{x}}{\mu}$    (1)

where $x_i$ is the raw feature data of each node, $\bar{x}$ is the feature mean and $\mu$ is the variance;
S12, building an embedding layer, and embedding each node in the graph by a graph embedding method according to the following formula:
Figure BDA0003105675530000023
wherein N is the number of graph nodes, the superscript 0 represents the 0 th layer, namely the input layer, h represents the characteristic vector of each node, and omega represents the trainable weight matrix of the corresponding layer;
S13, building a graph convolution layer, aggregating neighboring node features according to the following formula, and updating the node feature vectors:

$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$    (3)

where the superscript l denotes the layer number, $\tilde{A} = A + \lambda I$ is the new adjacency matrix obtained by adding an identity matrix to the original adjacency matrix so that the node itself is included (when λ = 1 the self node is fully included), $\tilde{D}$ is the degree matrix obtained from the new adjacency matrix, and σ denotes the activation function;
S14, adding an attention mechanism module; during aggregation, different nodes are assigned different weights according to the following formula:

$e_{ij}^{(l)} = \mathrm{att}\left(W^{l} h_i^{(l)},\ W^{l} h_j^{(l)}\right) = \vec{a}^{T} \cdot \left[\, W^{l} h_i^{(l)} \,\Vert\, W^{l} h_j^{(l)} \,\right]$    (4)

where $h_i^{(l)}$ denotes the feature vector of node i in the l-th layer, $h_j^{(l)}$ denotes that of a neighboring node of i, $W^{l}$ denotes a feature-vector dimension transformation matrix, and att() denotes the attention-coefficient calculation function, i.e., it computes a correlation coefficient; the transformed features of the two neighboring nodes, $W^{l} h_i^{(l)}$ and $W^{l} h_j^{(l)}$, are spliced horizontally and dot-multiplied with the trainable parameter $\vec{a}$, which forms a single-layer perceptron whose hidden layer has only one neuron: the input is the spliced node features and the output is the similarity between the two nodes; the symbol · denotes the dot product and the symbol ‖ denotes horizontal matrix splicing;
S15, normalizing the attention coefficients with the softmax function:

$\alpha_{ij}^{(l)} = \mathrm{softmax}\left(e_{ij}^{(l)}\right) = \frac{\exp\left(e_{ij}^{(l)}\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(e_{ik}^{(l)}\right)}$    (5)

where $k \in \mathcal{N}_i$ runs over the serial numbers of the neighbor nodes of node i;
S16, assigning the attention weights: formula (4) is applied multiple times, i.e., the attention coefficients are calculated several times, accumulated and averaged to obtain the final attention coefficients, which are added into the graph convolution network, and the feature-vector update formula is modified to:

$h_i^{(l+1)} = \sigma\left(\frac{1}{\mathrm{num\_att}} \sum_{m=1}^{\mathrm{num\_att}} \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{m,(l)}\, W^{(l)} h_j^{(l)}\right)$    (6)

where num_att denotes the number of attention-coefficient calculations.
Optionally, in step S14, the attention mechanism operates only within first-order neighbor nodes, i.e., only node pairs connected by a direct edge are considered.
Optionally, in step S2, performing federated learning training with the local graph convolutional neural network models comprises the following steps:
S21, initializing the model parameters of the global graph convolutional neural network model;
S22, randomly selecting the federated learning participants according to the following formula:

$\mathrm{num\_fed} = \max\left(\lfloor C \cdot K \rfloor,\ 1\right)$    (7)

where num_fed is the number of selected federated learning participants, K is the total number of federated learning participants, C is the fraction of participants taking part in each round of computation, the symbol ⌊·⌋ denotes rounding down, and max() takes the maximum;
S23, the local graph convolutional neural network models download the to-be-learned parameters initialized by the global graph convolutional neural network model;
S24, with the downloaded to-be-learned parameters, the local graph convolutional neural network models begin training:
S241, setting a loss function and updating the model parameters of the local graph convolutional neural network model by batch stochastic gradient descent according to the following formula:

$W \leftarrow W - \eta\, \frac{\partial \mathcal{L}(y, \hat{y})}{\partial W}$    (8)

where W denotes the to-be-learned parameters in each local graph convolutional neural network, η denotes the learning rate, $\mathcal{L}$ denotes the loss function used to calculate the difference between the predicted value $\hat{y}$ output by the graph neural network and the true label y, ∂ denotes the partial derivative, and $\hat{y}$ is the output of the last layer of the graph convolutional neural network;
S242, carrying out the federated attention mechanism calculation for all local graph convolutional neural network models; for the kth local graph convolutional neural network model, the attention mechanism calculation is performed and the calculated attention weight coefficient is uploaded to the global graph convolutional neural network model to be aggregated with those of the other local models:

$\alpha_k^{l} = \mathrm{att}\left(w_k^{l},\ w^{l}\right)$    (9)

where $\alpha_k^{l}$ denotes the attention weight coefficient of the kth local graph convolutional neural network model at the l-th layer, att() denotes the attention mechanism calculation function, which calculates the correlation between the local and the global graph convolutional neural network models, $w_k^{l}$ denotes the trainable parameters of the kth local model at the l-th layer, and $w^{l}$ denotes the trainable parameters of the global graph convolutional neural network model at the l-th layer;
S25, updating the model parameters of the global graph convolutional neural network model: the calculated attention weight coefficient of each local graph convolutional neural network model, together with the calculated parameters of that local model, is uploaded to the global graph convolutional neural network model for aggregation:

$w^{l,t+1} = \sum_{k=1}^{\mathrm{num\_fed}} \alpha_k^{t}\, w_k^{l,t}$    (10)

where $\alpha_k^{t}$ denotes the attention weight coefficient assigned to the kth participant model at time t, and $w^{l,t+1}$ denotes the l-th-layer parameters of the global graph convolutional neural network model after aggregation at time t + 1.
In a second aspect, an embodiment of the present invention provides a federated learning apparatus for credit card anti-fraud, the apparatus comprising:
a local model building module for building local graph convolutional neural network models corresponding to K federated learning participants with different fraud categories, wherein the local undirected-graph structure data owned by each participant is G_i(V, E, A), i ∈ K, the nodes of the graph structure are v_i ∈ V, the feature on node v_i is x_i ∈ X, each node carries several kinds of key feature information including user information, loan amount, deposit amount and credit investigation data, the edge set between nodes is e_{i,j} = (v_i, v_j) ∈ E, A denotes the adjacency matrix and defines the interconnection relationships among nodes, and the fraud categories comprise three classes: stolen-card fraud, false-application fraud and non-fraud;
a federated learning training module for performing federated learning training with the local graph convolutional neural network models, wherein an attention mechanism is adopted to improve the aggregation of the federated learning parameters, so that each local graph convolutional neural network model is aggregated with a weight adapted to it;
and a global graph convolutional neural network model for processing imported user data and identifying the corresponding fraud category.
The invention has the following beneficial effects:
the invention takes the individuation of each participant into consideration, introduces the attention mechanism, and in deep learning, the attention mechanism can emphasize and highlight the characteristics of the participant sample, thereby taking the individuation problem of the participants into consideration. According to the method, the data of the non-European space can be considered, the relevance among the user data is fully utilized, the training precision of the federal learning model can be well improved, and the financial fraud assessment accuracy is further improved.
Drawings
Fig. 1 is a schematic diagram of a federated learning framework based on a graph convolutional neural network according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a graph convolution neural network according to an embodiment of the present invention.
Fig. 3 is a schematic diagram illustrating an attention mechanism of the neural network according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a neural network aggregation with attention mechanism according to an embodiment of the present invention.
Fig. 5 is a flowchart of a federated learning method for credit card anti-fraud according to an embodiment of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
It should be noted that terms such as "upper", "lower", "left", "right", "front" and "back" used in the present invention are for clarity of description only and are not intended to limit the practicable scope of the invention; changes or adjustments of their relative relationships, without substantive change to the technical content, are likewise regarded as within the practicable scope of the invention.
Example one
Fig. 5 is a flowchart of a federated learning method for credit card anti-fraud according to an embodiment of the present invention. The present embodiment is applicable to cases where the credit card fraud category is identified by a device such as a server; the method may be performed by a federated learning apparatus for credit card anti-fraud, which may be implemented in software and/or hardware and may be integrated in an electronic device, for example an integrated server device.
The federated learning method is mainly used to identify the two most common types of credit card fraud. Type 1 is stolen-card fraud: transacting with a lost credit card that has been picked up, or with an embezzled one. Type 2 is false-application fraud: the applicant uses false information to apply for a credit card and evade the card issuer's review. The objective of the federated learning method proposed in this embodiment is therefore to improve the accuracy of the three-class credit card fraud task (type 1, type 2, non-fraud) with limited samples and while protecting the private data of all parties.
Assume the federated learning has K participants; the local undirected-graph structure data owned by each participant is G_i(V, E, A), i ∈ K, where the nodes of the graph structure are v_i ∈ V, the feature on node v_i is x_i ∈ X, and each node carries 20 kinds of key feature information such as user information, loan amount, deposit amount and credit investigation data; the edge set between nodes is e_{i,j} = (v_i, v_j) ∈ E. For example, if the user at node i is named Zhang San, the nodes connected to node i by edges have some correlation with it, such as related companies or the same contact information. A denotes the adjacency matrix and defines the interconnection relationships between nodes; according to the type of connection it is divided into two kinds of adjacency matrices, as follows.
Region adjacency matrix:

$GD_{ij} = \begin{cases} 1, & \text{nodes } i \text{ and } j \text{ are adjacent} \\ 0, & \text{otherwise} \end{cases}$

where GD is the English acronym for Geographic Distance. Adjacency is decided from the residential addresses provided by user nodes i and j: if the straight-line distance between the places of residence is less than M kilometers, the nodes are judged adjacent, since different users who live close to each other require particular consideration. Likewise, if prior knowledge shows a kinship relation between the users, they are also judged to be in an adjacency relationship.
Attribute adjacency matrix:

$FD_{ij} = \begin{cases} 1, & \text{the related information of users } i \text{ and } j \text{ is similar} \\ 0, & \text{otherwise} \end{cases}$

where FD is the English acronym for Feature Distance. Adjacency is decided according to whether the user-related information provided by user nodes i and j is similar, for example different users sharing the same contact telephone number or the same employer.
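As a minimal illustration of the two adjacency criteria above, the following sketch builds both matrices from per-user records; the field names ('location', 'relatives', 'phone', 'employer') and the default threshold M are hypothetical, since the patent specifies only the adjacency rules themselves:

```python
import numpy as np

def build_adjacency_matrices(users, M=30.0):
    # users: list of dicts with hypothetical keys 'location' ((x, y) in km),
    # 'relatives' (set of user indices), 'phone' and 'employer'.
    n = len(users)
    A_gd = np.zeros((n, n))  # region adjacency matrix (Geographic Distance)
    A_fd = np.zeros((n, n))  # attribute adjacency matrix (Feature Distance)
    for i in range(n):
        for j in range(i + 1, n):
            # GD rule: straight-line residence distance below M km, or known kinship.
            dist = np.linalg.norm(np.subtract(users[i]['location'], users[j]['location']))
            if dist < M or j in users[i]['relatives']:
                A_gd[i, j] = A_gd[j, i] = 1.0
            # FD rule: identical contact telephone or identical employer.
            if (users[i]['phone'] == users[j]['phone']
                    or users[i]['employer'] == users[j]['employer']):
                A_fd[i, j] = A_fd[j, i] = 1.0
    return A_gd, A_fd
```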
During training, the fraction of participants taking part in each round of computation is C, the number of complete passes each participant makes over its local data in each round is Epoch, and the participant models are updated with a minimum batch size of B.
Referring to fig. 5, step 1: build the local graph convolutional neural network (GCN) models of the federated learning participants. The innovation of step 1 is to use a graph convolutional neural network as the local model of federated learning: because a graph convolutional network handles irregular data well, this solves the problem that conventional deep learning networks cannot properly exploit the many kinds of non-Euclidean data found in real life. The specific steps are steps 1.2 to 1.4.
Step 1.1: normalize the node data to obtain the data to be processed. Each node v_i has its own features x_i, but the ranges of the raw values can differ greatly; in many such cases the objective function would not work properly, and normalization also accelerates the convergence of gradient descent. The invention normalizes the raw data by mean-variance normalization:

$\tilde{x}_i = \frac{x_i - \bar{x}}{\mu}$    (1)

where $x_i$ is the raw feature data of each node, $\bar{x}$ is the feature mean and μ is the variance.
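A one-function sketch of the mean-variance normalization of formula (1). The text labels the denominator the variance μ; the sketch uses the standard deviation, which is the conventional choice, and the small epsilon is our addition to guard against division by zero:

```python
import numpy as np

def normalize_features(X):
    # X: (num_nodes, num_features) raw node features x_i.
    x_bar = X.mean(axis=0)       # feature mean (x-bar in formula (1))
    mu = X.std(axis=0) + 1e-8    # spread; the text calls this the variance mu,
                                 # though the standard deviation is used here
    return (X - x_bar) / mu
```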
Step 1.2: build an embedding layer and embed each node of the graph by a graph embedding method, as shown in formula (2):

$h_i^{(0)} = \omega^{(0)} x_i, \quad i = 1, 2, \dots, N$    (2)

where N is the number of graph nodes, the superscript 0 denotes layer 0 (the input layer), h denotes the feature vector of each node and ω denotes the trainable weight matrix of the corresponding layer.
Step 1.3: build a graph convolution layer, as shown in formula (3); neighboring node features are aggregated and the node feature vectors updated, so as to extract features:

$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$    (3)

where the superscript l denotes the layer number, $\tilde{A} = A + \lambda I$ is the new adjacency matrix obtained by adding an identity matrix to the original one so that the node itself is included (when λ = 1 the self node is fully included), $\tilde{D}$ is the degree matrix obtained from the new adjacency matrix, and σ denotes the ReLU activation function. Fig. 2 is a schematic structural diagram of a graph convolutional neural network according to an embodiment of the present invention.
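A minimal dense-matrix sketch of the propagation rule of formula (3), assuming the ReLU activation stated above; a sparse implementation would be preferred at scale:

```python
import numpy as np

def gcn_layer(H, A, W, lam=1.0):
    # H: (N, d_in) node features; A: (N, N) adjacency; W: (d_in, d_out).
    N = A.shape[0]
    A_tilde = A + lam * np.eye(N)              # add self-loops (lambda = 1)
    d = A_tilde.sum(axis=1)                    # node degrees of A-tilde
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # D-tilde^(-1/2)
    H_next = D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W
    return np.maximum(H_next, 0.0)             # ReLU activation
```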
Step 1.4: add an attention mechanism module to focus on the neighboring nodes with large influence. Different nodes are assigned different weights when aggregated, as shown in formula (4):

$e_{ij}^{(l)} = \mathrm{att}\left(W^{l} h_i^{(l)},\ W^{l} h_j^{(l)}\right) = \vec{a}^{T} \cdot \left[\, W^{l} h_i^{(l)} \,\Vert\, W^{l} h_j^{(l)} \,\right]$    (4)

where $h_i^{(l)}$ denotes the feature vector of node i in the l-th layer, $h_j^{(l)}$ denotes that of a neighboring node of i, $W^{l}$ denotes a feature-vector dimension transformation matrix, and att() denotes the attention-coefficient calculation function, i.e., it computes a correlation coefficient. First the transformed features of the two neighboring nodes, $W^{l} h_i^{(l)}$ and $W^{l} h_j^{(l)}$, are spliced horizontally; the result is dot-multiplied with the trainable parameter $\vec{a}$, forming a single-layer perceptron whose hidden layer has only one neuron: its input is the spliced node features and its output is the similarity between the two nodes. The symbol · denotes the dot product, and the symbol ‖ denotes horizontal matrix splicing. To reduce the amount of computation, the attention mechanism operates only within first-order neighbors, i.e., only node pairs connected by a direct edge are considered.
Step 1.4.1: normalize the attention coefficients so that they can be applied across the nodes, as shown in formula (5); normalization uses the softmax function:

$\alpha_{ij}^{(l)} = \mathrm{softmax}\left(e_{ij}^{(l)}\right) = \frac{\exp\left(e_{ij}^{(l)}\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(e_{ik}^{(l)}\right)}$    (5)

where $k \in \mathcal{N}_i$ runs over the serial numbers of the neighbor nodes of node i. The softmax function is well known in the art and is not described further here. Fig. 3 is a schematic diagram of the attention mechanism of the neural network according to an embodiment of the present invention.
Step 1.4.2: assign the attention weights. Formula (4) is applied multiple times, i.e., the attention coefficients are calculated several times and averaged; the accumulated and averaged coefficients are added into the graph convolution network, and the feature-vector update formula (3) is modified as shown in formula (6), and as illustrated by the sketch after this paragraph:

$h_i^{(l+1)} = \sigma\left(\frac{1}{\mathrm{num\_att}} \sum_{m=1}^{\mathrm{num\_att}} \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{m,(l)}\, W^{(l)} h_j^{(l)}\right)$    (6)

Formula (6) differs from formula (3) in that formula (3) aggregates neighbor features through the adjacency matrix, assigning the same, unchangeable weight coefficient, whereas formula (6) assigns different weight coefficients to different neighbors, so the correlations among nodes are better taken into account. FIG. 4 is a schematic diagram of neural network aggregation with the attention mechanism according to an embodiment of the present invention.
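The sketch below strings formulas (4) to (6) together for one layer: per-head attention scores over first-order neighbors, softmax normalization, averaging over num_att heads, then the weighted aggregation. It is a plain-numpy illustration of the description above, not the patent's reference implementation:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def attention_gcn_update(H, A, W, a_heads):
    # H: (N, d_in); A: (N, N) binary adjacency; W: (d_in, d_out);
    # a_heads: num_att trainable vectors of length 2 * d_out (the parameter a).
    HW = H @ W                                   # transformed features W^l h^l
    H_next = np.zeros((H.shape[0], W.shape[1]))
    for i in range(H.shape[0]):
        nbrs = np.nonzero(A[i])[0]               # first-order neighbors only
        if len(nbrs) == 0:
            continue
        alpha = np.zeros(len(nbrs))
        for a in a_heads:                        # num_att attention computations
            e = np.array([a @ np.concatenate([HW[i], HW[j]]) for j in nbrs])  # (4)
            alpha += softmax(e)                                               # (5)
        alpha /= len(a_heads)                    # averaged coefficients of (6)
        H_next[i] = alpha @ HW[nbrs]             # attention-weighted aggregation
    return np.maximum(H_next, 0.0)               # ReLU
```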
Step 2: perform federated learning training with the graph convolutional neural network (GCN). Fig. 1 is a schematic diagram of the federated learning framework based on a graph convolutional neural network according to an embodiment of the present invention. The innovation of step 2 is this: the traditional federated averaging algorithm aggregates the local models of the federated learning participants into the global model with equal weights, which hinders model personalization, i.e., the local models cannot adapt to their respective fields. This patent improves the aggregation of the federated learning parameters through an attention mechanism, so that each local model is aggregated with a weight suited to it, thereby reducing the influence of data noise and increasing the degree of personalization. The specific procedure is given in step 2.4 and the following steps.
Step 2.1: initialize the global graph convolutional neural network GCN_G model parameters.
Step 2.2: randomly select the federated learning participants, as shown in formula (7):

$\mathrm{num\_fed} = \max\left(\lfloor C \cdot K \rfloor,\ 1\right)$    (7)

where num_fed is the number of selected federated learning participants, K is the total number of participants, and C is the fraction of participants taking part in each round of computation. The symbol ⌊·⌋ denotes rounding down; for example, if C · K = 2.99, then ⌊2.99⌋ = 2. max() takes the maximum, e.g., max(2, 1) = 2.
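Formula (7) and the random selection of step 2.2 amount to a two-line routine; the uniform draw below is an assumption consistent with "randomly selected":

```python
import math
import random

def select_participants(K, C):
    num_fed = max(math.floor(C * K), 1)    # formula (7)
    return random.sample(range(K), num_fed)

# e.g. with K = 10 participants and C = 0.299, C * K = 2.99,
# so num_fed = max(2, 1) = 2 participants are drawn this round.
```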
Step 2.3: the local models download the to-be-learned parameters of the initialized global model (the global model has the same structure as the graph convolutional neural network built in step 1).
Step 2.4: train the local graph convolutional neural network GCN_L models.
Step 2.4.1: set a loss function and update the parameters by batch stochastic gradient descent, as shown in formula (8):

$W \leftarrow W - \eta\, \frac{\partial \mathcal{L}(y, \hat{y})}{\partial W}$    (8)

where W denotes the to-be-learned parameters in each local graph convolutional neural network, η denotes the learning rate, $\mathcal{L}$ denotes the loss function calculating the difference between the predicted value $\hat{y}$ output by the graph neural network and the true label y, and ∂ denotes the partial derivative. Here $\hat{y}$ is the output of the last layer of the graph convolutional neural network.
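A skeletal view of the mini-batch update of formula (8); grad_fn stands in for whatever computes the loss gradients (e.g., an autodiff framework) and is an assumption of this sketch:

```python
def local_train_epoch(params, batches, grad_fn, eta):
    # params: list of per-layer weight arrays W of the local GCN;
    # batches: mini-batches of size B; grad_fn(params, batch) -> dL/dW per layer.
    for batch in batches:
        grads = grad_fn(params, batch)
        params = [W - eta * g for W, g in zip(params, grads)]  # formula (8)
    return params
```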
Step 2.4.2: federated attention mechanism calculation. For the kth local graph convolutional neural network model GCN_L_k, the attention mechanism operation is performed and the attention weight coefficient calculated, so as to build a personalized model for uploading to the global model and aggregating with the other local models, as shown in formula (9):

$\alpha_k^{l} = \mathrm{att}\left(w_k^{l},\ w^{l}\right)$    (9)

where $\alpha_k^{l}$ denotes the attention weight coefficient of the kth local graph convolutional network at the l-th layer, att() denotes the attention mechanism calculation function, which calculates the correlation between the local GCN_L and the global GCN_G, $w_k^{l}$ denotes the trainable parameters of the kth local model GCN_L_k at the l-th layer, and $w^{l}$ denotes the trainable parameters of the global GCN_G at the l-th layer. As in formula (4), the symbol · denotes the dot product and the symbol ‖ denotes horizontal matrix splicing.
Step 2.5: update the global GCN_G model parameters. The attention weight coefficient of each local model GCN_L_k obtained in step 2.4.2, together with the GCN_L_k model parameters calculated in step 2.4.1, is uploaded to the global GCN_G model for aggregation, as shown in formula (10):

$w^{l,t+1} = \sum_{k=1}^{\mathrm{num\_fed}} \alpha_k^{t}\, w_k^{l,t}$    (10)

where $\alpha_k^{t}$ denotes the attention weight coefficient assigned to the kth participant model at time t, and $w^{l,t+1}$ denotes the l-th-layer parameters of the global model after aggregation at time t + 1.
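Finally, a hedged sketch of the layer-wise attention aggregation of formulas (9) and (10). The patent leaves att() abstract; here a softmax over negative parameter distances stands in for the local-to-global correlation, which is our assumption:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def aggregate_with_attention(global_params, local_params):
    # global_params: list of per-layer arrays w^l of GCN_G;
    # local_params: one such list per selected participant (w_k^l of GCN_L_k).
    new_global = []
    for l, w_l in enumerate(global_params):
        scores = np.array([-np.linalg.norm(lp[l] - w_l)   # att(w_k^l, w^l), assumed
                           for lp in local_params])
        alpha = softmax(scores)                           # formula (9)
        new_global.append(sum(a * lp[l]                   # formula (10)
                              for a, lp in zip(alpha, local_params)))
    return new_global
```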
The invention still adopts the framework of the classical horizontal federated learning model; its innovative content falls into two modules. The first module improves the model's ability to handle non-Euclidean data: a graph convolutional neural network can mine the associations between data, so each participant adopts a graph convolutional neural network as its local model for data modeling, with an attention mechanism module inside the network to reduce the influence of data noise. The second module provides an improved attention mechanism algorithm to mitigate the lack of personalization caused by average aggregation and the noise present in irregular data: it gives each participant model a suitable attention weight, applied to the parameters of every layer, so as to increase the personalization of the model parameters, make the participant models better suited to their respective fields, and further reduce, to a certain degree, the noise caused by irregular data structure.
Example two
An embodiment of the invention provides a federated learning apparatus for credit card anti-fraud, comprising a local model building module, a federated learning training module, and a global graph convolutional neural network model.
The local model building module is used to build local graph convolutional neural network models corresponding to K federated learning participants with different fraud categories. The local undirected-graph structure data owned by each participant is G_i(V, E, A), i ∈ K, where the nodes of the graph structure are v_i ∈ V, the feature on node v_i is x_i ∈ X, each node carries several kinds of key feature information including user information, loan amount, deposit amount and credit investigation data, and the edge set between nodes is e_{i,j} = (v_i, v_j) ∈ E; A denotes the adjacency matrix and defines the interconnection relationships among nodes. The fraud categories comprise stolen-card fraud, false-application fraud and non-fraud.
The federated learning training module is used to perform federated learning training with the local graph convolutional neural network models; an attention mechanism improves the aggregation of the federated learning parameters, so that each local graph convolutional neural network model is aggregated with a weight adapted to it.
The global graph convolutional neural network model is used to process imported user data and identify the corresponding fraud category.
By means of the federated learning apparatus of the second embodiment, the transmission object is determined by establishing the data containment relationships of the whole application, achieving the aim of identifying the credit card fraud category. The federated learning apparatus provided by this embodiment can execute the federated learning method for credit card anti-fraud provided by any embodiment of the invention, and possesses the functional modules and beneficial effects corresponding to the executed method.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments; all technical solutions under the idea of the present invention belong to its protection scope. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the invention are likewise regarded as within its protection scope.

Claims (6)

1. A federated learning method for credit card anti-fraud, comprising the following steps:
S1, building local graph convolutional neural network models corresponding to K federated learning participants with different fraud categories; the local undirected-graph structure data owned by each participant is G_i(V, E, A), i ∈ K, where the nodes of the graph structure are v_i ∈ V, the feature on node v_i is x_i ∈ X, each node carries several kinds of key feature information including user information, loan amount, deposit amount and credit investigation data, and the edge set between nodes is e_{i,j} = (v_i, v_j) ∈ E; A denotes the adjacency matrix and defines the interconnection relationships among nodes; the fraud categories comprise three classes: stolen-card fraud, false-application fraud and non-fraud;
S2, carrying out federated learning training with the local graph convolutional neural network models; an attention mechanism is adopted to improve the aggregation of the federated learning parameters, so that each local graph convolutional neural network model is aggregated with a weight adapted to it;
and S3, outputting a global graph convolutional neural network model, wherein the global graph convolutional neural network model is used for processing imported user data and identifying the corresponding fraud category.
2. The federated learning method for credit card anti-fraud according to claim 1, wherein the adjacency matrix comprises two types: a region adjacency matrix expressing regional distance or kinship, and an attribute adjacency matrix expressing whether the user-related information is similar.
3. The federated learning method for credit card anti-fraud according to claim 1, wherein in step S1 the building of the local graph convolutional neural network models corresponding to the K federated learning participants with different fraud categories comprises the following steps:
S11, normalizing the node data with the following formula to obtain the data to be processed:

$\tilde{x}_i = \frac{x_i - \bar{x}}{\mu}$    (1)

where $x_i$ is the raw feature data of each node, $\bar{x}$ is the feature mean and $\mu$ is the variance;
S12, building an embedding layer, and embedding each node in the graph by a graph embedding method according to the following formula:
Figure FDA0003105675520000013
wherein N is the number of graph nodes, the superscript 0 represents the 0 th layer, namely the input layer, h represents the characteristic vector of each node, and omega represents the trainable weight matrix of the corresponding layer;
S13, building a graph convolution layer, aggregating neighboring node features according to the following formula, and updating the node feature vectors:

$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$    (3)

where the superscript l denotes the layer number, $\tilde{A} = A + \lambda I$ is the new adjacency matrix obtained by adding an identity matrix to the original adjacency matrix so that the node itself is included (when λ = 1 the self node is fully included), $\tilde{D}$ is the degree matrix obtained from the new adjacency matrix, and σ denotes the activation function;
S14, adding an attention mechanism module; during aggregation, different nodes are assigned different weights according to the following formula:

$e_{ij}^{(l)} = \mathrm{att}\left(W^{l} h_i^{(l)},\ W^{l} h_j^{(l)}\right) = \vec{a}^{T} \cdot \left[\, W^{l} h_i^{(l)} \,\Vert\, W^{l} h_j^{(l)} \,\right]$    (4)

where $h_i^{(l)}$ denotes the feature vector of node i in the l-th layer, $h_j^{(l)}$ denotes that of a neighboring node of i, $W^{l}$ denotes a feature-vector dimension transformation matrix, and att() denotes the attention-coefficient calculation function, i.e., it computes a correlation coefficient; the transformed features of the two neighboring nodes, $W^{l} h_i^{(l)}$ and $W^{l} h_j^{(l)}$, are spliced horizontally and dot-multiplied with the trainable parameter $\vec{a}$, which forms a single-layer perceptron whose hidden layer has only one neuron: the input is the spliced node features and the output is the similarity between the two nodes; the symbol · denotes the dot product and the symbol ‖ denotes horizontal matrix splicing;
S15, normalizing the attention coefficients with the softmax function:

$\alpha_{ij}^{(l)} = \mathrm{softmax}\left(e_{ij}^{(l)}\right) = \frac{\exp\left(e_{ij}^{(l)}\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(e_{ik}^{(l)}\right)}$    (5)

where $k \in \mathcal{N}_i$ runs over the serial numbers of the neighbor nodes of node i;
S16, assigning the attention weights: formula (4) is applied multiple times, i.e., the attention coefficients are calculated several times, accumulated and averaged to obtain the final attention coefficients, which are added into the graph convolution network, and the feature-vector update formula is modified to:

$h_i^{(l+1)} = \sigma\left(\frac{1}{\mathrm{num\_att}} \sum_{m=1}^{\mathrm{num\_att}} \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{m,(l)}\, W^{(l)} h_j^{(l)}\right)$    (6)

where num_att denotes the number of attention-coefficient calculations.
4. The federated learning method for credit card anti-fraud according to claim 3, wherein in step S14 the attention mechanism operates only within first-order neighbor nodes, i.e., only node pairs connected by a direct edge are considered.
5. The federated learning method for credit card anti-fraud according to claim 1, wherein in step S2 the federated learning training with the local graph convolutional neural network models comprises the following steps:
S21, initializing the model parameters of the global graph convolutional neural network model;
S22, randomly selecting the federated learning participants according to the following formula:

$\mathrm{num\_fed} = \max\left(\lfloor C \cdot K \rfloor,\ 1\right)$    (7)

where num_fed is the number of selected federated learning participants, K is the total number of federated learning participants, C is the fraction of participants taking part in each round of computation, the symbol ⌊·⌋ denotes rounding down, and max() takes the maximum;
S23, the local graph convolutional neural network models download the to-be-learned parameters initialized by the global graph convolutional neural network model;
S24, with the downloaded to-be-learned parameters, the local graph convolutional neural network models begin training:
S241, setting a loss function and updating the model parameters of the local graph convolutional neural network model by batch stochastic gradient descent according to the following formula:

$W \leftarrow W - \eta\, \frac{\partial \mathcal{L}(y, \hat{y})}{\partial W}$    (8)

where W denotes the to-be-learned parameters in each local graph convolutional neural network, η denotes the learning rate, $\mathcal{L}$ denotes the loss function used to calculate the difference between the predicted value $\hat{y}$ output by the graph neural network and the true label y, ∂ denotes the partial derivative, and $\hat{y}$ is the output of the last layer of the graph convolutional neural network;
S242, carrying out the federated attention mechanism calculation for all local graph convolutional neural network models; for the kth local graph convolutional neural network model, the attention mechanism calculation is performed and the calculated attention weight coefficient is uploaded to the global graph convolutional neural network model to be aggregated with those of the other local models:

$\alpha_k^{l} = \mathrm{att}\left(w_k^{l},\ w^{l}\right)$    (9)

where $\alpha_k^{l}$ denotes the attention weight coefficient of the kth local graph convolutional neural network model at the l-th layer, att() denotes the attention mechanism calculation function, which calculates the correlation between the local and the global graph convolutional neural network models, $w_k^{l}$ denotes the trainable parameters of the kth local model at the l-th layer, and $w^{l}$ denotes the trainable parameters of the global graph convolutional neural network model at the l-th layer;
S25, updating the model parameters of the global graph convolutional neural network model: the calculated attention weight coefficient of each local graph convolutional neural network model, together with the calculated parameters of that local model, is uploaded to the global graph convolutional neural network model for aggregation:

$w^{l,t+1} = \sum_{k=1}^{\mathrm{num\_fed}} \alpha_k^{t}\, w_k^{l,t}$    (10)

where $\alpha_k^{t}$ denotes the attention weight coefficient assigned to the kth participant model at time t, and $w^{l,t+1}$ denotes the l-th-layer parameters of the global graph convolutional neural network model after aggregation at time t + 1.
6. A federated learning apparatus for credit card anti-fraud, the apparatus comprising:
a local model building module for building local graph convolutional neural network models corresponding to K federated learning participants with different fraud categories, wherein the local undirected-graph structure data owned by each participant is G_i(V, E, A), i ∈ K, the nodes of the graph structure are v_i ∈ V, the feature on node v_i is x_i ∈ X, each node carries several kinds of key feature information including user information, loan amount, deposit amount and credit investigation data, the edge set between nodes is e_{i,j} = (v_i, v_j) ∈ E, A denotes the adjacency matrix and defines the interconnection relationships among nodes, and the fraud categories comprise three classes: stolen-card fraud, false-application fraud and non-fraud;
a federated learning training module for performing federated learning training with the local graph convolutional neural network models, wherein an attention mechanism is adopted to improve the aggregation of the federated learning parameters, so that each local graph convolutional neural network model is aggregated with a weight adapted to it;
and a global graph convolutional neural network model for processing imported user data and identifying the corresponding fraud category.
CN202110635863.6A 2021-06-08 2021-06-08 Federal learning method and device for credit card anti-fraud Active CN113362160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110635863.6A CN113362160B (en) 2021-06-08 2021-06-08 Federal learning method and device for credit card anti-fraud

Publications (2)

Publication Number Publication Date
CN113362160A 2021-09-07
CN113362160B CN113362160B (en) 2023-08-22

Family

ID=77533029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110635863.6A Active CN113362160B (en) 2021-06-08 2021-06-08 Federal learning method and device for credit card anti-fraud

Country Status (1)

Country Link
CN (1) CN113362160B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020029585A1 (en) * 2018-08-10 2020-02-13 深圳前海微众银行股份有限公司 Neural network federation modeling method and device employing transfer learning, and storage medium
CN111325619A (en) * 2018-12-15 2020-06-23 深圳先进技术研究院 Credit card fraud detection model updating method and device based on joint learning
CN112199702A (en) * 2020-10-16 2021-01-08 鹏城实验室 Privacy protection method, storage medium and system based on federal learning
CN112418520A (en) * 2020-11-22 2021-02-26 同济大学 Credit card transaction risk prediction method based on federal learning

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743677A (en) * 2021-09-16 2021-12-03 成都数融科技有限公司 Personal credit evaluation model training method and evaluation method based on federal learning
CN113743677B (en) * 2021-09-16 2023-06-30 成都数融科技有限公司 Personal credit evaluation model training method and evaluation method based on federal learning
CN114117926A (en) * 2021-12-01 2022-03-01 南京富尔登科技发展有限公司 Robot cooperative control algorithm based on federal learning
CN114117926B (en) * 2021-12-01 2024-05-14 南京富尔登科技发展有限公司 Robot cooperative control algorithm based on federal learning
CN114676849A (en) * 2022-03-24 2022-06-28 支付宝(杭州)信息技术有限公司 Method and system for updating model parameters based on federal learning
CN114693317A (en) * 2022-04-08 2022-07-01 重庆邮电大学 Telecommunication fraud security federation detection method fusing homogeneous graph and bipartite graph
CN114781553A (en) * 2022-06-20 2022-07-22 浙江大学滨江研究院 Unsupervised patent clustering method based on parallel multi-graph convolution neural network
CN116229560B (en) * 2022-09-08 2024-03-19 广东省泰维思信息科技有限公司 Abnormal behavior recognition method and system based on human body posture
CN116229560A (en) * 2022-09-08 2023-06-06 广东省泰维思信息科技有限公司 Abnormal behavior recognition method and system based on human body posture
CN115577858A (en) * 2022-11-21 2023-01-06 山东能源数智云科技有限公司 Block chain-based carbon emission prediction method and device and electronic equipment
CN116049769A (en) * 2023-04-03 2023-05-02 湖南大学 Discrete object data relevance prediction method and system and storage medium
CN116502709A (en) * 2023-06-26 2023-07-28 浙江大学滨江研究院 Heterogeneous federal learning method and device
CN116501978A (en) * 2023-06-28 2023-07-28 杭州金智塔科技有限公司 Recommendation model generation method and device based on privacy protection machine forgetting algorithm
CN116527824A (en) * 2023-07-03 2023-08-01 北京数牍科技有限公司 Method, device and equipment for training graph convolution neural network
CN116527824B (en) * 2023-07-03 2023-08-25 北京数牍科技有限公司 Method, device and equipment for training graph convolution neural network

Also Published As

Publication number Publication date
CN113362160B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN113362160B (en) Federal learning method and device for credit card anti-fraud
CN109255506B (en) Internet financial user loan overdue prediction method based on big data
CN112418520B (en) Credit card transaction risk prediction method based on federal learning
CN108985929B (en) Training method, business data classification processing method and device, and electronic equipment
CN110188198A (en) A kind of anti-fraud method and device of knowledge based map
KR102009309B1 (en) Management automation system for financial products and management automation method using the same
CN108734338A (en) Credit risk forecast method and device based on LSTM models
CN106897918A (en) A kind of hybrid machine learning credit scoring model construction method
CN106960358A (en) A kind of financial fraud behavior based on rural area electronic commerce big data deep learning quantifies detecting system
CN110138595A (en) Time link prediction technique, device, equipment and the medium of dynamic weighting network
CN106056444A (en) Data processing method and device
CN111260462A (en) Transaction fraud detection method based on heterogeneous relation network attention mechanism
Sayjadah et al. Credit card default prediction using machine learning techniques
CN109739844B (en) Data classification method based on attenuation weight
CN109767225B (en) Network payment fraud detection method based on self-learning sliding time window
CN108446291A (en) The real-time methods of marking and points-scoring system of user credit
CN113011895B (en) Associated account sample screening method, device and equipment and computer storage medium
CN112581265A (en) Internet financial client application fraud detection method based on AdaBoost
CN110889759A (en) Credit data determination method, device and storage medium
Zhang et al. Surface and high-altitude combined rainfall forecasting using convolutional neural network
CN114255121A (en) Credit risk prediction model training method and credit risk prediction method
CN113409157B (en) Cross-social network user alignment method and device
CN115510948A (en) Block chain fishing detection method based on robust graph classification
Adedoyin et al. Evaluating Case-Based Reasoning Knowledge Discovery in Fraud Detection.
CN110942391A (en) Method for determining main activity label of multi-activity label user in block chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant