CN113362160B - Federal learning method and device for credit card anti-fraud - Google Patents


Info

Publication number
CN113362160B
CN113362160B (application CN202110635863.6A)
Authority
CN
China
Prior art keywords
neural network
convolutional neural
network model
federal learning
fraud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110635863.6A
Other languages
Chinese (zh)
Other versions
CN113362160A (en)
Inventor
胡凯
吴佳胜
陆美霞
李姚根
徐露娟
夏旻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN202110635863.6A
Publication of CN113362160A
Application granted
Publication of CN113362160B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The invention discloses a federated learning method and device for credit card anti-fraud. The method comprises the following steps: constructing local graph convolutional neural network models corresponding to K federated learning participants with different fraud categories; performing federated learning training with the local graph convolutional neural network models, where the aggregation step of the federated learning parameters is improved by an attention mechanism so that each local graph convolutional neural network model is aggregated with a weight suited to it; and outputting a global graph convolutional neural network model, which processes imported user data and identifies the corresponding fraud category. Addressing the shortcomings of existing credit card fraud assessment methods and of the classical federated learning algorithm, the invention provides a federated learning algorithm suited to non-Euclidean data and to the personalized characteristics of the participants, in order to process financial data and make credit card anti-fraud judgments.

Description

Federated learning method and device for credit card anti-fraud
Technical Field
The invention relates to the technical field of credit card anti-fraud, and in particular to a federated learning method and device for credit card anti-fraud.
Background
With the rapid development of the internet, financial technology based on artificial intelligence has a profound effect on people's consumption behavior. However, financial data often involve privacy, and the data of different banks, loan institutions and other financial institutions cannot be shared directly, so data islands form. Yet artificial intelligence algorithms need large amounts of data to reach high accuracy, and a model trained locally and independently by a single data owner cannot accurately assess whether credit card fraud is occurring.
Federated learning is a distributed machine learning/deep learning framework that protects data privacy, and it offers a good solution to data islands, severe data fragmentation, data heterogeneity, unbalanced data distribution and similar problems. Machine learning and deep learning have already succeeded in many fields, laying the foundation for federated learning models to achieve good performance. However, the existing federated learning algorithm only averages the parameters of the local models. First, it does not account for the personalization of each local model (for example, the user data characteristics of different financial institutions are inconsistent, and total loan amounts differ with regional economic levels), so it cannot cope with the different sample centers of gravity caused by each client's different environment. Second, it does not account for the fact that most data in real environments lie in non-Euclidean space, such as correlations among users and financial knowledge graphs; because the evaluation standards of non-Euclidean data are inconsistent and the data structures irregular, it is difficult to obtain a high-performance model by jointly training on such data. These problems make the learned model inaccurate.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a federated learning method for credit card anti-fraud. Addressing the shortcomings of existing credit card fraud assessment methods and of the classical federated learning algorithm, the invention provides a federated learning algorithm suited to non-Euclidean data and to the personalized characteristics of the participants, in order to process financial data and make credit card anti-fraud judgments.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, an embodiment of the present invention provides a federated learning method for credit card anti-fraud, the method comprising the following steps:
s1, building local graph convolutional neural network models corresponding to K federal learning participants with different fraud categories; each participant has a local undirected graph structure data G i (V, E, A) (i.epsilon.K), where the set of nodes in the graph structure is V i ∈V,v i The feature on the node is x i E X, each node contains multiple key characteristic information including user information, loan amount, deposit amount and credit data, and the edge set between nodes is e i,j =(v i ,v j ) E is E; a represents an adjacency matrix, and defines the interconnection relation between nodes; the fraud category comprises three types of fraudulent use of stolen cards, virtual application fraud and no fraud;
s2, performing federal learning training by using a local graph convolution neural network model; the method comprises the steps of improving the aggregation process of federal learning parameters by adopting an attention mechanism, so that each partial graph convolutional neural network model has weight matched with the partial graph convolutional neural network model for aggregation;
and S3, outputting a global graph convolutional neural network model, wherein the global graph convolutional neural network model is used for processing the imported user data and identifying the corresponding fraud category.
Optionally, the adjacency matrix comprises two types: a region adjacency matrix expressing geographic distance or kinship, and an attribute adjacency matrix expressing whether the users' related information is similar.
Optionally, in step S1, the process of building the local graph convolutional neural network model corresponding to the K federal learning participants with different fraud categories includes the following steps:
s11, carrying out normalization pretreatment on node data by adopting the following formula to obtain data to be processed:
wherein x is i For the characteristic raw data of each node,is the characteristic mean, mu is the variance
S12, building an embedding layer and embedding each node of the graph by a graph-embedding method according to the following formula:

h_i^(0) = ω^(0) x_i,  i = 1, …, N   (2)

where N is the number of nodes in the graph, the superscript 0 denotes the 0th (input) layer, h_i denotes the feature vector of node i, and ω denotes the trainable weight matrix of the corresponding layer;
s13, constructing a graph convolution layer, aggregating adjacent node characteristics according to the following formula, and updating characteristic vectors of nodes:
wherein the superscript i indicates the number of layers,adding an identity matrix to the original adjacent matrix to obtain a new adjacent matrix, thereby comprising self nodes, and when λ=1, comprising complete self nodes, +.>A degree matrix is obtained according to the new adjacent matrix, and sigma represents an activation function;
s14, adding an attention mechanism module; during aggregation, different weights are allocated to different nodes according to the following formula:
wherein the method comprises the steps ofFeature vector representing the ith node in the layer 1 network, +.>Representation->W is adjacent to the adjacent node of (a) l Representing a feature vector dimension transformation matrix, att () representing an attention coefficient calculation function, i.e., calculating a correlation coefficient; characteristics of two neighboring nodes after transformation +.>Matrix transverse splicing and trainable parameters +.>And performing dot product operation to form a single-layer perceptron with a hidden layer and only one neuron, wherein the input is the characteristics of the spliced nodes, and the output is the similarity between the two nodes. Symbol represents dot product operation, and symbol represents transverse splicing of the matrix;
s15, normalizing the attention factor using a softmax function:
wherein the method comprises the steps ofThe sequence number of the neighbor node of the i node;
s16, attention weights are distributed, multiple times of attention coefficient calculation and averaging are carried out by using the formula (4), the coefficients are accumulated and averaged to obtain a final attention coefficient, the final attention coefficient is added into a graph convolution network, and a feature vector updating formula is modified to obtain the final attention coefficient:
where num_att represents the number of attention coefficient calculations.
Optionally, in step S14, the attention mechanism is applied only within first-order neighbor nodes, i.e. only node pairs with a directly connecting edge are considered.
Optionally, in step S2, the process of performing federated learning training with the local graph convolutional neural network models comprises the following steps:
s21, initializing model parameters of a global graph convolutional neural network model;
s22, randomly selecting a federal learning participant according to the following formula:
wherein num_fed is the number of federal learning participants, the federal learning has K participants, the proportion of each participant participating in calculation is C, and the symbol is givenMeaning of (c) is rounding down, max () is maximum among them;
s23, downloading parameters to be learned after initializing a global graph convolutional neural network model by the local graph convolutional neural network model;
s24, training the local graph convolution neural network model according to the downloaded parameters to be learned:
s241, setting a loss function, and updating model parameters of the local graph convolutional neural network model by a batch random gradient descent method according to the following formula:
wherein W represents the parameter to be learned in each local graph convolutional neural network, eta represents the learning rate,representing a loss function for calculating a gap between a predicted value of the neural network output and the real label,/>Representing the partial derivative,/->The neural network is the most rolled up for the graphOutputting a result of the latter layer;
s242, performing federal attention mechanism calculation on all local graph convolutional neural network models; the attention mechanism calculation is carried out on the kth local graph convolutional neural network model, and the attention weight coefficient obtained through calculation is uploaded to the global graph convolutional neural network model and is aggregated with other local graph convolutional neural network models:
wherein the method comprises the steps ofAttention weight coefficient representing the first layer kth partial graph convolutional neural network model, andatt () represents an attention mechanism calculation function, calculates the correlation between the local graph convolutional neural network model and the global graph convolutional neural network model,/o>Trainable parameters representing a layer i kth local graph convolutional neural network model, w l Trainable parameters representing a layer l global graph convolutional neural network model;
s25, updating model parameters of the global graph convolutional neural network model, and uploading the calculated attention weight coefficient of each local graph convolutional neural network model and the calculated model parameters of the local graph convolutional neural network model to the global graph convolutional neural network model for aggregation:
wherein the method comprises the steps ofIndicating time tAttention weighting factors assigned to the kth participant model, +.>And (3) representing the first layer parameters of the global graph convolutional neural network model after aggregation at the time t+1.
In a second aspect, an embodiment of the present invention proposes a federated learning apparatus for credit card anti-fraud, the apparatus comprising:
the local model construction module is used for constructing local graph convolutional neural network models corresponding to K federal learning participants with different fraud categories; each participant has a local undirected graph structure data G i (V, E, A) (i.epsilon.K), where the set of nodes in the graph structure is V i ∈V,v i The feature on the node is x i E X, each node contains multiple key characteristic information including user information, loan amount, deposit amount and credit data, and the edge set between nodes is e i,j =(v i ,v j ) E is E; a represents an adjacency matrix, and defines the interconnection relation between nodes; the fraud category comprises three types of fraudulent use of stolen cards, virtual application fraud and no fraud;
the federal learning training module is used for performing federal learning training by using the local graph convolution neural network model; the method comprises the steps of improving the aggregation process of federal learning parameters by adopting an attention mechanism, so that each partial graph convolutional neural network model has weight matched with the partial graph convolutional neural network model for aggregation;
the global graph convolutional neural network model is used for processing the imported user data and identifying the corresponding fraud category.
The beneficial effects of the invention are as follows:
the invention considers individuation of each participant, introduces an attention mechanism, and in deep learning, the attention mechanism can emphasize and highlight the characteristics of the participant sample, thereby considering the individuation problem of the participant, so the invention can improve the anti-fraud evaluation precision of the local model of each federal learning participant according to the characteristics of the respective user data. The invention can consider the data of non-European space, fully utilize the relevance between the user data, and can well improve the training precision of the federal learning model, thereby improving the evaluation accuracy of financial fraud.
Drawings
Fig. 1 is a schematic diagram of the federated learning framework based on a graph convolutional neural network in an embodiment of the present invention.
Fig. 2 is a schematic diagram of the graph convolutional neural network according to an embodiment of the invention.
Fig. 3 is a schematic diagram of the attention mechanism of the neural network according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the aggregation of the neural network with the attention mechanism according to an embodiment of the present invention.
FIG. 5 is a flow chart of the federated learning method for credit card anti-fraud in accordance with an embodiment of the present invention.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings.
It should be noted that terms like "upper", "lower", "left", "right", "front" and "rear" are used for descriptive purposes only and are not intended to limit the scope in which the invention may be practiced; the relative relationships they denote may be altered or adjusted without materially changing the technical content of the invention.
Example 1
FIG. 5 is a flow chart of the federated learning method for credit card anti-fraud in accordance with an embodiment of the present invention. The embodiment is applicable to identifying the category of credit card fraud on a device such as a server; the method may be performed by the federated learning apparatus for credit card anti-fraud, which may be implemented in software and/or hardware and integrated into an electronic device, such as an integrated server device.
The federated learning method is mainly used to identify the two most common credit card fraud types. Type 1 is stolen-card fraud: transacting with a fraudulently used or stolen lost credit card. Type 2 is virtual application fraud: the applicant uses false information to apply for a credit card, evading the card issuer's review. The federated learning method proposed in this embodiment therefore aims to improve the accuracy of the three-class credit card fraud task (type 1, type 2, non-fraud) under the premise of limited samples and of protecting each party's private data.
Assume federated learning has K participants, and each participant i (i ∈ {1, …, K}) holds local undirected graph data G_i(V, E, A), where v_i ∈ V is a node of the graph, x_i ∈ X is the feature vector on node v_i, and each node contains 20 key characteristics such as user information, loan amount, deposit amount and credit data. The edge set between nodes is e_{i,j} = (v_i, v_j) ∈ E; for example, if the user of node i is Zhang San, the nodes connected by an edge to node i have some correlation with it, such as related companies or the same contact information. A denotes the adjacency matrix, which defines the interconnection relations between the nodes; according to the different connection relations, it is divided into the following two types of adjacency matrix.
Region adjacency matrix A^GD, where GD abbreviates Geographical Distance. Whether user nodes i and j are adjacent is judged by whether the addresses they provide are close: if the straight-line distance between their places of residence is less than M kilometers they are judged adjacent, since different users who live near each other are closely related and need special consideration. If prior knowledge shows a kinship relation between the users, they are likewise judged adjacent.
Attribute adjacency matrix A^FD, where FD abbreviates Feature Distance. Whether user nodes i and j are adjacent is judged by whether the related information they provide is similar, for example different users sharing the same contact phone number or the same employer.
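As a concrete illustration, the two adjacency matrices above can be sketched as follows; the coordinates, phone numbers and threshold value are hypothetical placeholders, not data from the patent.

```python
import numpy as np

# Hypothetical data for four user nodes; coordinates in km.
coords = np.array([[0.0, 0.0], [3.0, 4.0], [100.0, 0.0], [0.5, 0.5]])
phones = ["138-0000", "138-0000", "139-1111", "137-2222"]
M = 10.0  # distance threshold in kilometres

n = len(coords)
# Region adjacency matrix A_GD: 1 if the straight-line distance < M km.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
A_GD = ((dist < M) & ~np.eye(n, dtype=bool)).astype(int)

# Attribute adjacency matrix A_FD: 1 if related information matches
# (here: the same contact phone number).
A_FD = np.array([[int(i != j and phones[i] == phones[j]) for j in range(n)]
                 for i in range(n)])
```

Both matrices come out symmetric by construction, matching the undirected graph G_i.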
In the training process, the proportion of participants taking part in each round's computation is C, the number of complete passes each participant makes over its local data per round is Epoch, and the mini-batch size used to update a participant's model is B; unless otherwise specified, the notation follows graph-theoretic convention.
Referring to fig. 5, step 1: build the federated learning participants' local graph convolutional neural network (GCN) models. The innovation of step 1 is to use a graph convolutional neural network as the local model of federated learning: because a graph convolutional network handles irregular data well, it resolves the problem that conventional deep learning networks cannot make good use of the many kinds of non-Euclidean data found in real life. The specific steps are steps 1.2 to 1.4.
Step 1.1: and carrying out normalization pretreatment on the node data to obtain data to be processed. Each node v i All corresponding to the respective characteristic x i However, the range of the original values may be quite different, and the objective function may not work properly, and also to accelerate the convergence speed of the gradient descent. The invention normalizes the original data by mean variance normalization.
Wherein x is i For the characteristic raw data of each node,is the feature mean, μ is the variance.
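A minimal sketch of the mean-variance normalization of step 1.1. Note the patent divides by the "variance" μ, while the common convention divides by the standard deviation; the sketch uses the standard deviation, which is an assumption.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    # Mean-variance normalization of raw node features (step 1.1);
    # a small epsilon guards against constant features.
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

X_raw = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
X = normalize(X_raw)
```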
Step 1.2: setting up an embedding layer, and carrying out embedding representation on each node in the graph by a graph embedding method, wherein the embedding representation is shown as a formula (2):
wherein N is the number of nodes of a graph, the superscript 0 represents the 0 th layer, namely the input layer, h represents the feature vector of each node, and ω represents the trainable weight matrix of the corresponding layer.
Step 1.3: and constructing a graph convolution layer, aggregating adjacent node characteristics as shown in a formula (3), and updating characteristic vectors of nodes to achieve the purpose of extracting the characteristics.
Wherein the superscript i indicates the number of layers,adding an identity matrix to the original adjacent matrix to obtain a new adjacent matrix, thereby comprising self nodes, and when λ=1, comprising complete self nodes, +.>Is based on a new adjacency matrix availability matrix, σ representing the ReLU activation function. Fig. 2 is a schematic diagram of a graph roll-up neural network according to an embodiment of the invention.
Step 1.4: an attention mechanism module is added, and the attention mechanism module focuses on the adjacent nodes with large influence. As shown in equation (4), different weights are assigned to different nodes during aggregation.
Wherein the method comprises the steps ofFeature vector representing the ith node in the layer 1 network, +.>Representation->W is adjacent to the adjacent node of (a) l Representing a feature vector dimension transformation matrix, att () representing an attention coefficient calculation function, i.e. calculating a correlation coefficient, first transforming two neighboring nodes to the feature +.>Matrix transverse splicing and trainable parameters +.>And performing dot product operation to form a single-layer perceptron with a hidden layer and only one neuron, wherein the input is the characteristics of the spliced nodes, and the output is the similarity between the two nodes. The symbol represents dot product operation, and the symbol represents matrix transverse stitching. To reduce the computational effort, the attention mechanism is only applied in first order neighbors, i.e. only node pairs with directly connected edges are considered.
Step 1.4.1: the attention coefficient is normalized, and the attention coefficient is conveniently applied to each node as shown in a formula (5). Normalization was performed using the softmax function.
Wherein the method comprises the steps ofIs the sequence number of the neighbor node of the i node. The softmax function is well known in the art and will not be described in detail herein. Fig. 3 is a schematic diagram of the attention mechanism of the neural network according to an embodiment of the present invention.
Step 1.4.2: and (3) distributing attention weights, repeatedly using the formula (4), namely carrying out multiple attention coefficient calculation and averaging, then accumulating the coefficients and averaging, adding the coefficients into a graph convolution network, and modifying a feature vector updating formula in the formula (3), as shown in the formula (6).
The best effect is achieved according to the experiment num_att of 12, and the difference between the formula (6) and the formula (3) is that the adjacent matrix is used when the neighbor node characteristics are aggregated in the formula (3), the same weight coefficient is allocated and cannot be changed, the formula (6) allocates different weight coefficients for different neighbor nodes, and the correlation among the nodes is better considered. Fig. 4 is a schematic diagram of aggregation of the neural network with attention mechanism according to an embodiment of the present invention.
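Steps 1.3 to 1.4.2 can be sketched together as one NumPy layer. This is an illustrative reading of formulas (3) to (6) with random placeholder weights and λ = 1, not the patent's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def attention_gcn_layer(H, A, W, a_vecs):
    """One graph-convolution layer whose neighbour weights are averaged
    attention coefficients, per formulas (4)-(6)."""
    n = H.shape[0]
    HW = H @ W                     # transformed features W^l h_j
    A_self = A + np.eye(n)         # include the self node (lambda = 1)
    H_new = np.zeros_like(HW)
    for i in range(n):
        nbrs = np.nonzero(A_self[i])[0]
        coefs = np.zeros(len(nbrs))
        for a in a_vecs:           # num_att heads, later averaged
            e = np.array([a @ np.concatenate([HW[i], HW[j]]) for j in nbrs])
            coefs += softmax(e)    # formula (5)
        coefs /= len(a_vecs)       # averaged coefficients of formula (6)
        H_new[i] = np.maximum(0.0, coefs @ HW[nbrs])   # ReLU aggregation
    return H_new

n, d_in, d_out, num_att = 4, 3, 2, 12
H = rng.normal(size=(n, d_in))
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]], float)
W = rng.normal(size=(d_in, d_out))
a_vecs = [rng.normal(size=2 * d_out) for _ in range(num_att)]
H1 = attention_gcn_layer(H, A, W, a_vecs)
```

Each node's averaged coefficients sum to one over its first-order neighborhood, so the layer is a convex, attention-weighted aggregation followed by ReLU.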
Step 2: federal learning training was performed using a graph roll-up neural network (GCN). Fig. 1 is a schematic diagram of a federal learning framework based on a graph convolutional neural network in an embodiment of the present invention. The innovation point of the step 2 is that: unlike the traditional federal learning average aggregation algorithm, namely that local models of federal learning participants are aggregated in an average manner in a global model with the same weight, the method is unfavorable for model individualization, namely that each local model cannot be adapted to the respective field. The present patent improves the aggregation process of federal learning parameters by a mechanism of attention such that each local model has a weight that is relatively suitable for itself to aggregate. Thereby reducing the influence of data noise and increasing the individuation degree. The specific steps are step 2.4
Step 2.1: initializing global graph convolutional neural network GCN_G model parameters
Step 2.2: the federal learning participants are randomly selected as shown in equation (7).
Wherein num_fed is the number of federal learning participants, the federal learning has K participants, the proportion of each participant participating in calculation is C, and the symbol is givenMeaning a rounding down if c·k=2.99, then->max () is max (2, 1) =2, where max is taken to be maximum.
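The per-round participant sampling of formula (7) can be sketched as follows (the function name is hypothetical):

```python
import math
import random

def select_participants(K, C, seed=0):
    # num_fed = max(floor(C * K), 1): at least one participant per round.
    num_fed = max(math.floor(C * K), 1)
    return random.Random(seed).sample(range(K), num_fed)

round_clients = select_participants(K=10, C=0.25)  # floor(2.5) = 2 clients
```

The max(…, 1) lower bound guarantees that a round never runs with zero participants, even when C·K < 1.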
Step 2.3: and (3) downloading parameters to be learned after initializing a global model by the local model (the global model is consistent with the structure of the graph convolution neural network constructed in the step (1)).
Step 2.4: local graph convolutional neural network GCN_L model training
Step 2.4.1: the loss function is set and the parameters are updated by a batch random gradient descent method as shown in the following formula (8).
Wherein W represents the parameter to be learned in each local graph convolutional neural network, eta represents the learning rate,representing the difference between the predicted value y' and the true label y, which is used by the loss function to calculate the output of the neural network, for +.>Representing the partial derivative. At this time +.>Output result for last layer of graph convolution neural network
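The batch stochastic-gradient update of formula (8) can be sketched with a linear least-squares model standing in for the GCN and its loss (the patent's loss acts on the network's last-layer output, so this stand-in is an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

def sgd_step(W, X_batch, y_batch, eta):
    # W <- W - eta * dL/dW for the squared-error loss L = ||XW - y||^2 / B.
    grad = 2 * X_batch.T @ (X_batch @ W - y_batch) / len(y_batch)
    return W - eta * grad

X = rng.normal(size=(32, 3))
true_W = np.array([[1.0], [-2.0], [0.5]])
y = X @ true_W                     # noise-free labels for the sketch
W = np.zeros((3, 1))
for _ in range(200):
    idx = rng.choice(32, size=8, replace=False)   # mini-batch of B = 8
    W = sgd_step(W, X[idx], y[idx], eta=0.1)
```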
Step 2.4.2: federal attentiveness mechanisms calculate. GCN_L is calculated for the kth local graph convolution neural network model k And (3) performing attention mechanism operation, and calculating attention weight coefficients so as to conveniently formulate a personalized model for uploading to a global model and aggregating with other local models. As shown in equation (9).
Wherein the method comprises the steps ofAttention weighting factor representing the kth local graph roll-up network of layer l, and +.>att () represents an attention mechanism calculation function, calculating the correlation between local gcn_l and global gcn_g, +_>Represents the kth local GCN_L of the first layer k Trainable parameters, w 1 Trainable parameters representing the layer i global gcn_g. Like equation (4), the symbol represents a dot product operation, and the symbol represents a matrix transversal concatenation.
Step 2.5: updating global GCN_G model parameters, and calculating each local GCN_L obtained in the step 2.4.2 k Attention weighting coefficients together with the local GCN_L calculated in step 2.4.1 k The model parameters are uploaded to the gcn_g global model for aggregation. As shown in equation (10).
Wherein the method comprises the steps ofAttention weighting coefficients assigned to the kth participant model at time t are indicated, +.>And the first layer parameters of the global model after aggregation at the time t+1 are represented.
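The attention-weighted aggregation of formulas (9) and (10) for one layer can be sketched as below. The similarity used inside att(), a negative parameter distance passed through softmax, is an assumed stand-in, since the exact correlation function is not pinned down here:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def aggregate_layer(local_ws, global_w):
    # beta_k = att(w_k, w): here, softmax of negative parameter distance,
    # so local models closer to the global model get larger weights.
    sims = np.array([-np.linalg.norm(w - global_w) for w in local_ws])
    beta = softmax(sims)                     # attention weights, sum to 1
    new_w = sum(b * w for b, w in zip(beta, local_ws))   # formula (10)
    return new_w, beta

global_w = np.zeros((3, 2))
local_ws = [global_w + 0.1 * rng.normal(size=(3, 2)) for _ in range(4)]
new_global, beta = aggregate_layer(local_ws, global_w)
```

Because the weights are softmax-normalized, the aggregated layer stays inside the convex hull of the uploaded local parameters, unlike plain averaging only when all weights are equal.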
The invention still adopts the framework of the classical horizontal federated learning model; its innovations fall into two modules. The first module improves the model's ability to process non-Euclidean data: a graph convolutional neural network is used to mine the correlations among data, the participants adopt the graph convolutional neural network as the local model for data modeling, and an attention-mechanism module inside the graph convolutional network reduces the influence of data noise. The second module proposes an improved attention-mechanism algorithm to alleviate the lack of personalization caused by average aggregation and the noise carried by irregular data: it provides a suitable attention weight for each participant model, applied to the parameters of every layer, so as to increase the personalization of the model parameters, making each participant model better fit its own field and further reducing, to a certain extent, the noise caused by irregular data structures.
Example two
The embodiment of the invention provides a federal learning device for credit card anti-fraud, which comprises a local model building module, a federal learning training module and a global graph convolution neural network model.
The local model construction module is used for constructing the local graph convolutional neural network models corresponding to K federal learning participants with different fraud categories. Each participant holds local undirected graph-structured data $G_i(V, E, A)$ ($i \in K$), where the nodes of the graph are $v_i \in V$, the feature on node $v_i$ is $x_i \in X$ (each node carries multiple key characteristics, including user information, loan amount, deposit amount and credit data), and the edges between nodes are $e_{i,j} = (v_i, v_j) \in E$; $A$ represents the adjacency matrix, which defines the interconnection relations between nodes. The fraud categories include fraudulent use of stolen cards, virtual application fraud, and no fraud.
The federal learning training module is used for performing federal learning training with the local graph convolutional neural network models; the aggregation of the federal learning parameters is improved with an attention mechanism, so that each local graph convolutional neural network model is aggregated with a weight adapted to it.
The global graph convolutional neural network model is used for processing the imported user data and identifying the corresponding fraud category.
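The three modules of the device could be sketched, purely for illustration, as the following skeleton. All class and method names are hypothetical, and the model and training logic are stubbed out; a real device would hold GCN models and run the attention-weighted aggregation described above.

```python
# Hypothetical skeleton of the three device modules; logic is stubbed.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LocalModelBuilder:
    """Builds one local graph convolutional model per participant (stubbed)."""
    num_participants: int
    def build(self) -> List[Dict]:
        # Each "model" is a stub: a dict of per-layer parameters.
        return [{"layer0": [0.0]} for _ in range(self.num_participants)]

@dataclass
class FederatedTrainer:
    """Runs federal learning rounds; here reduced to a plain average stub."""
    def train_round(self, local_models: List[Dict], global_model: Dict) -> Dict:
        # A real trainer would compute per-layer attention weights between
        # each local model and the global model before aggregating.
        k = len(local_models)
        return {"layer0": [sum(m["layer0"][0] for m in local_models) / k]}

@dataclass
class GlobalModel:
    """Classifies imported user data into a fraud category (stubbed)."""
    params: Dict = field(default_factory=dict)
    def predict(self, user_data) -> str:
        # Stub: a real model would run the aggregated global GCN.
        return "no fraud"
```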
With the federal learning device of the second embodiment of the invention, the transmission targets are determined by establishing the data inclusion relations of the whole application, so as to identify the fraud category of a credit card. The federal learning device provided by this embodiment can execute the federal learning method for credit card anti-fraud provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to that method.
The above is only a preferred embodiment of the present invention; the protection scope of the invention is not limited to the above examples, and all technical solutions falling under the concept of the invention belong to its protection scope. It should be noted that modifications and adaptations that do not depart from the principles of the invention are also intended to fall within its protection scope.

Claims (5)

1. A federal learning method for credit card anti-fraud, the federal learning method comprising the steps of:
s1, building local graph convolutional neural network models corresponding to K federal learning participants with different fraud categories; each participant has a local undirected graph structure data G i (V, E, A) (i.epsilon.K), where the set of nodes in the graph structure is V i ∈V,v i The feature on the node is x i E X, each node contains multiple key characteristic information including user information, loan amount, deposit amount and credit data, and the edge set between nodes is e i,j =(v i ,v j ) E is E; a represents an adjacency matrix, and defines the interconnection relation between nodes; the fraud category comprises three types of fraudulent use of stolen cards, virtual application fraud and no fraud;
s2, performing federal learning training by using a local graph convolution neural network model; the method comprises the steps of improving the aggregation process of federal learning parameters by adopting an attention mechanism, so that each partial graph convolutional neural network model has weight matched with the partial graph convolutional neural network model for aggregation;
s3, outputting a global graph convolutional neural network model, wherein the global graph convolutional neural network model is used for processing the imported user data and identifying the corresponding fraud category;
in step S2, the process of performing federal learning training using the local graph convolution neural network model includes the following steps:
s21, initializing model parameters of a global graph convolutional neural network model;
s22, randomly selecting a federal learning participant according to the following formula:
wherein num_fed is the number of federal learning participants, the federal learning has K participants, the proportion of each participant participating in calculation is C, and the symbol is givenMeaning of (c) is rounding down, max () is maximum among them;
s23, downloading parameters to be learned after initializing a global graph convolutional neural network model by the local graph convolutional neural network model;
s24, training the local graph convolution neural network model according to the downloaded parameters to be learned:
s241, setting a loss function, and updating model parameters of the local graph convolutional neural network model by a batch random gradient descent method according to the following formula:
wherein W represents the parameter to be learned in each local graph convolutional neural network, eta represents the learning rate, l ()' represents the difference between the predicted value output by the graph neural network and the real label calculated by the loss function,representing the partial derivative of the signal to be calculated,/>the output result of the last layer of the neural network is rolled up for the graph; y' represents a predicted value, y represents a real label;
s242, performing federal attention mechanism calculation on all local graph convolutional neural network models; the attention mechanism calculation is carried out on the kth local graph convolutional neural network model, and the attention weight coefficient obtained through calculation is uploaded to the global graph convolutional neural network model and is aggregated with other local graph convolutional neural network models:
wherein the method comprises the steps ofAttention weighting coefficient representing the kth local graph convolutional neural network model of the first layer, and +.>att () represents an attention mechanism calculation function, calculates the correlation between the local graph convolutional neural network model and the global graph convolutional neural network model,/o>Trainable parameters representing a layer i kth local graph convolutional neural network model, w l Trainable parameters representing a layer l global graph convolutional neural network model;
s25, updating model parameters of the global graph convolutional neural network model, and uploading the calculated attention weight coefficient of each local graph convolutional neural network model and the calculated model parameters of the local graph convolutional neural network model to the global graph convolutional neural network model for aggregation:
wherein the method comprises the steps ofAttention weighting coefficients assigned to the kth participant model at time t are indicated, +.>And (3) representing the first layer parameters of the global graph convolutional neural network model after aggregation at the time t+1.
2. The federal learning method for credit card anti-fraud according to claim 1, wherein the adjacency matrix includes two kinds: a regional adjacency matrix for expressing regional distance or proximity, and an attribute adjacency matrix for expressing whether user-related information is similar.
3. The federal learning method for credit card anti-fraud according to claim 1, wherein in step S1, the process of constructing a local graph convolutional neural network model corresponding to K federal learning participants having different fraud categories comprises the steps of:
s11, carrying out normalization pretreatment on node data by adopting the following formula to obtain data to be processed:
wherein x is i For the characteristic raw data of each node,is the characteristic mean, μ is the variance;
s12, building an embedding layer, and carrying out embedding representation on each node in the graph by a graph embedding method according to the following formula:
wherein N is the number of nodes of a graph, the superscript 0 represents the 0 th layer, namely the input layer, h represents the feature vector of each node, and ω represents the trainable weight matrix of the corresponding layer;
s13, constructing a graph convolution layer, aggregating adjacent node characteristics according to the following formula, and updating characteristic vectors of nodes:
wherein the superscript i indicates the number of layers,adding an identity matrix to the original adjacent matrix to obtain a new adjacent matrix, thereby comprising self nodes, and when λ=1, comprising complete self nodes, +.>A degree matrix is obtained according to the new adjacent matrix, and sigma represents an activation function;
s14, adding an attention mechanism module; during aggregation, different weights are allocated to different nodes according to the following formula:
wherein the method comprises the steps ofFeature vector representing the ith node in the layer 1 network, +.>Representation->W is adjacent to the adjacent node of (a) l Representing a feature vector dimension transformation matrix, att () representing an attention coefficient calculation function, i.e., calculating a correlation coefficient; characteristics of two neighboring nodes after transformation +.>Matrix transverse splicing and trainable parameters +.>Performing dot product operation to form a single-layer perceptron with a hidden layer and only one neuron, wherein the input is the characteristics of the spliced nodes, and the output is the similarity between the two nodes; symbol represents dot product operation, and symbol represents transverse splicing of the matrix;
s15, normalizing the attention factor using a softmax function:
wherein the method comprises the steps ofThe sequence number of the neighbor node of the i node;
s16, attention weight is distributed, attention coefficient calculation and averaging are carried out for a plurality of times, the coefficients are accumulated and averaged to obtain a final attention coefficient, the final attention coefficient is added into a graph convolution network, and a feature vector update formula is modified to obtain the final attention coefficient:
where num_att represents the number of attention coefficient calculations.
4. A federal learning method for credit card anti-fraud according to claim 3, wherein in step S14 the attention mechanism is applied only within first-order neighbors, i.e., only node pairs connected by a direct edge are considered.
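A minimal sketch of one forward pass of the local model of claims 3 and 4 (normalization, embedding, graph convolution over $\tilde{A} = A + I$, attention restricted to first-order neighbors). The dot-product form of att(), a single attention round (num_att = 1), and dividing by the standard deviation in the z-score step are simplifying assumptions; all names are hypothetical.

```python
# Sketch of steps S11-S16 under claim 4's first-order-neighbor restriction.
import numpy as np

def gcn_attention_forward(X, A, W_embed, W_conv):
    # S11: z-score normalization of each feature column (std used here,
    # though the claim's wording says variance).
    Xn = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    # S12: embedding layer, h^0 = omega^0 * x.
    H = Xn @ W_embed
    # S13: add self-loops (lambda = 1) and symmetrically normalize.
    A_hat = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    H = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W_conv, 0.0)  # ReLU
    # S14-S15: attention scores (toy dot-product att()), masked to
    # first-order neighbors only, then softmax-normalized per row.
    scores = H @ H.T
    scores = np.where(A_hat > 0, scores, -np.inf)
    alphas = np.exp(scores - scores.max(axis=1, keepdims=True))
    alphas = alphas / alphas.sum(axis=1, keepdims=True)
    # S16: attention-weighted feature update.
    return alphas @ H
```

Non-neighbor pairs get a score of $-\infty$, so their softmax weight is exactly zero, which is precisely the first-order restriction of claim 4.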
5. A federal learning apparatus for credit card anti-fraud based on the method of any of claims 1-4, the federal learning apparatus comprising:
the local model construction module is used for constructing local graph convolutional neural network models corresponding to K federal learning participants with different fraud categories; each participant has a local undirected graph structure data G i (V, E, A) (i.epsilon.K), where the set of nodes in the graph structure is V i ∈V,v i The feature on the node is x i E X, each node contains multiple key characteristic information including user information, loan amount, deposit amount and credit data, and the edge set between nodes is e i,j =(v i ,v j ) E is E; a represents an adjacency matrix, and defines the interconnection relation between nodes; the fraud category comprises three types of fraudulent use of stolen cards, virtual application fraud and no fraud;
the federal learning training module is used for performing federal learning training by using the local graph convolution neural network model; the method comprises the steps of improving the aggregation process of federal learning parameters by adopting an attention mechanism, so that each partial graph convolutional neural network model has weight matched with the partial graph convolutional neural network model for aggregation;
the global graph convolutional neural network model is used for processing the imported user data and identifying the corresponding fraud category.
CN202110635863.6A 2021-06-08 2021-06-08 Federal learning method and device for credit card anti-fraud Active CN113362160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110635863.6A CN113362160B (en) 2021-06-08 2021-06-08 Federal learning method and device for credit card anti-fraud


Publications (2)

Publication Number Publication Date
CN113362160A CN113362160A (en) 2021-09-07
CN113362160B true CN113362160B (en) 2023-08-22

Family

ID=77533029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110635863.6A Active CN113362160B (en) 2021-06-08 2021-06-08 Federal learning method and device for credit card anti-fraud

Country Status (1)

Country Link
CN (1) CN113362160B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743677B (en) * 2021-09-16 2023-06-30 成都数融科技有限公司 Personal credit evaluation model training method and evaluation method based on federal learning
CN114117926B (en) * 2021-12-01 2024-05-14 南京富尔登科技发展有限公司 Robot cooperative control algorithm based on federal learning
CN114676849A (en) * 2022-03-24 2022-06-28 支付宝(杭州)信息技术有限公司 Method and system for updating model parameters based on federal learning
CN114693317A (en) * 2022-04-08 2022-07-01 重庆邮电大学 Telecommunication fraud security federation detection method fusing homogeneous graph and bipartite graph
CN114781553B (en) * 2022-06-20 2023-04-07 浙江大学滨江研究院 Unsupervised patent clustering method based on parallel multi-graph convolution neural network
CN116229560B (en) * 2022-09-08 2024-03-19 广东省泰维思信息科技有限公司 Abnormal behavior recognition method and system based on human body posture
CN115577858B (en) * 2022-11-21 2023-04-07 山东能源数智云科技有限公司 Block chain-based carbon emission prediction method and device and electronic equipment
CN116049769B (en) * 2023-04-03 2023-06-20 湖南大学 Discrete object data relevance prediction method and system and storage medium
CN116502709A (en) * 2023-06-26 2023-07-28 浙江大学滨江研究院 Heterogeneous federal learning method and device
CN116501978A (en) * 2023-06-28 2023-07-28 杭州金智塔科技有限公司 Recommendation model generation method and device based on privacy protection machine forgetting algorithm
CN116527824B (en) * 2023-07-03 2023-08-25 北京数牍科技有限公司 Method, device and equipment for training graph convolution neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020029585A1 (en) * 2018-08-10 2020-02-13 深圳前海微众银行股份有限公司 Neural network federation modeling method and device employing transfer learning, and storage medium
CN111325619A (en) * 2018-12-15 2020-06-23 深圳先进技术研究院 Credit card fraud detection model updating method and device based on joint learning
CN112199702A (en) * 2020-10-16 2021-01-08 鹏城实验室 Privacy protection method, storage medium and system based on federal learning
CN112418520A (en) * 2020-11-22 2021-02-26 同济大学 Credit card transaction risk prediction method based on federal learning


Also Published As

Publication number Publication date
CN113362160A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN113362160B (en) Federal learning method and device for credit card anti-fraud
CN110009174B (en) Risk recognition model training method and device and server
Abdou et al. Neural nets versus conventional techniques in credit scoring in Egyptian banking
CN112037012A (en) Internet financial credit evaluation method based on PSO-BP neural network
CN110188198A (en) A kind of anti-fraud method and device of knowledge based map
CN112418520B (en) Credit card transaction risk prediction method based on federal learning
CN106529729A (en) Method and system for forecasting default of credit card user based on BP_Adaboost model
Sayjadah et al. Credit card default prediction using machine learning techniques
CN113538125A (en) Risk rating method for optimizing Hopfield neural network based on firefly algorithm
CN113011888B (en) Abnormal transaction behavior detection method, device, equipment and medium for digital currency
CN104915842A (en) Electronic commerce transaction monitoring method based on internet transaction data
CN114358912A (en) Risk weight fusion anomaly detection method based on federal learning
CN112581265A (en) Internet financial client application fraud detection method based on AdaBoost
CN114255121A (en) Credit risk prediction model training method and credit risk prediction method
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
Gultom et al. Application of The Levenberg Marquardt Method In Predict The Amount of Criminality in Pematangsiantar City
Hossain et al. A differentiate analysis for credit card fraud detection
CN114240659A (en) Block chain abnormal node identification method based on dynamic graph convolutional neural network
CN111260372B (en) Resource transfer user group determination method, device, computer equipment and storage medium
Adedoyin et al. Evaluating Case-Based Reasoning Knowledge Discovery in Fraud Detection.
Zhao et al. Network-based feature extraction method for fraud detection via label propagation
CN115695025A (en) Training method and device of network security situation prediction model
CN115510948A (en) Block chain fishing detection method based on robust graph classification
CN112926989B (en) Bank loan risk assessment method and equipment based on multi-view integrated learning
CN115423514A (en) MLP-based vehicle-enterprise user comprehensive clue rating method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant