CN111105241B - Identification method for anti-fraud of credit card transaction

Info

Publication number: CN111105241B
Application number: CN201911323155.8A
Authority: CN
Inventors: 董雪梅; 崔奔雷
Original assignee: Zhejiang Gongshang University
Current assignee: Zhejiang Gongshang University
Priority date: 2019-12-20
Filing date: 2019-12-20
Publication date: 2023-04-07
Anticipated expiration: 2039-12-20
Also published as: CN111105241A

Abstract

The invention discloses an identification method for credit card transaction anti-fraud, comprising the following steps: 101) aggregating transaction features based on identity features; 102) aggregating transaction time features by region; 103) a prediction model processing step; and 104) a prediction result step. The invention provides an anti-fraud identification method for credit card transactions based on the fusion of multiple gradient boosting tree models.

Description

Identification method for anti-fraud of credit card transaction

Technical Field

The invention relates to the field of credit cards, in particular to an identification method applied to anti-fraud of credit card transactions.

Background

As the internet finance industry develops, conducting financial transactions through internet channels is becoming increasingly common. For both parties to an internet transaction, it is particularly important in risk-control work to evaluate transaction risk correctly and to prevent financial fraud and similar incidents.

For credit review and anti-fraud checks of internet finance users, the users' various credit investigation and evaluation materials must be examined so that transaction risk can be evaluated and the interests of the financial platform protected. At present, this risk review still requires manual intervention to varying degrees, which limits the efficiency and stability of business development.

Disclosure of Invention

The invention overcomes the defects of the prior art and provides an anti-fraud identification method for credit card transactions based on the fusion of multiple gradient boosting tree models.

In order to solve the technical problems, the technical scheme of the invention is as follows:

An identification method for credit card transaction anti-fraud comprises the following steps:

101) aggregating transaction features based on identity features;

102) aggregating transaction time features by region;

103) establishing three models, XGBoost, CatBoost and LightGBM, to predict credit card transactions and obtain a fraud probability judgment value;

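the XGBoost model uses the following formulas:

$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{1}$$

where $l(y_i, \hat{y}_i)$ is the residual term and $\Omega(f_k)$ is the regularization term, in which $K$ is the number of decision trees, $T$ is the number of leaf nodes, $w_j$ is the weight value of each leaf node, and $\gamma$ and $\lambda$ are constants;

in formula (1), $\hat{y}_i$ is rewritten as $\hat{y}_i^{(t)}$ in the loss function, and $\sum_{k=1}^{K} \Omega(f_k)$ is rewritten as $\sum_{k=1}^{t} \Omega(f_k)$ as the regularization term, so that the formula after conversion is:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{t} \Omega(f_k) \tag{2}$$

where $\hat{y}_i^{(t)}$ is the prediction after the t-th tree is added, and the change contributed by the new tree is denoted $f_t(x_i)$; the first t-1 trees are fitted to $\hat{y}_i^{(t-1)}$, so that $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$;

separating the contribution of the first t-1 trees from that of the newly fitted t-th tree converts formula (2) into:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C \tag{3}$$

where $C = \sum_{k=1}^{t-1} \Omega(f_k)$ is a constant, so that each newly found decision tree $f_t(x_i)$ reduces the residual by the greatest amount;

taking $\hat{y}_i^{(t-1)}$ in formula (3) as $x$ and $f_t(x_i)$ as $\Delta x$, so that $Obj^{(t)} = F(x + \Delta x)$, a second-order Taylor expansion is applied; the first derivative of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$ is denoted $g_i$ and the second derivative is denoted $h_i$; ignoring the constant components ($C$ and the terms $l(y_i, \hat{y}_i^{(t-1)})$, which do not depend on $f_t$) gives:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t) \tag{4}$$

where $f_t(x_i)$ is a function of the leaf node weights of the t-th tree, $f_t(x_i) = w_{q(x_i)}$, with $q(x_i)$ the leaf node to which sample $i$ is assigned; formula (4) is transformed into:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5}$$

where, with $I_j = \{ i \mid q(x_i) = j \}$ denoting the samples assigned to leaf node $j$, the sequential traversal from sample 1 to sample $n$ is changed into a traversal from the samples on leaf node 1 to the samples on leaf node $T$, which gives:

$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T \tag{6}$$

denoting $\sum_{i \in I_j} g_i$ as $G_j$ and $\sum_{i \in I_j} h_i$ as $H_j$, formula (6) becomes a multivariate extremum problem in $w_j$, whose solution is:

$$w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{7}$$

substituting this into formula (6) gives the new objective function:

$$Obj^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \tag{8}$$

when a leaf node is split, the samples are divided into a left part L and a right part R, and the gain of the split is:

$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \tag{9}$$

which yields the maximum fraud probability judgment value Gain of XGBoost;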

104) based on the outputs of the three models established in step 103), a Pearson correlation coefficient matrix is computed, and the results with low correlation are fused by equal-weight averaging to obtain the final prediction result.

Further, the unique identity is determined from the client's explicit and/or implicit identity features, and the transaction features computed for each unique identity include the average transaction amount, the transaction frequency and the types of device used.

Furthermore, the time feature is region-based: the time band with the highest transaction frequency is computed for each region, and the difference between each transaction time and the local high-frequency transaction time is calculated as an important feature for identifying abnormal transactions.

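Further, the CatBoost model randomly orders the training set and, for the p-th sample in that order, replaces its categorical value with a statistic computed from the preceding p-1 samples, according to the following formula:

$$\hat{x}_{\sigma_p,k} = \frac{\sum_{j=1}^{p-1} \left[ x_{\sigma_j,k} = x_{\sigma_p,k} \right] Y_{\sigma_j} + a \cdot P}{\sum_{j=1}^{p-1} \left[ x_{\sigma_j,k} = x_{\sigma_p,k} \right] + a}$$

where $\sigma$ is the random ordering, $x_{\sigma_j,k}$ is the value of categorical feature $k$ for the j-th sample in that ordering, $Y_{\sigma_j}$ is its label, and $[\cdot]$ equals 1 when the condition holds and 0 otherwise. $P$ (the prior value) and $a$ (the weight of the prior) are hyper-parameters that reduce the noise arising from low-frequency categories, and the ordered boosting scheme improves the robustness and generalization capability of the model.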

Further, step 104) comprises the following specific steps:

401) obtaining the Pearson correlation coefficients of the prediction results output by the three models;

402) selecting the prediction results from step 401) whose Pearson correlation coefficient is below 0.99 and whose individual prediction accuracies are close and good;

403) fusing the prediction results of the three models with equal weights and outputting the result as the final prediction result.

Compared with the prior art, the invention has the advantages that:

The feature types used by the invention complement one another, and fusing features at different levels of abstraction better reveals the true nature of the software. Furthermore, because learning algorithms rest on different assumptions, no single learning algorithm is optimal for every type of problem, and selecting a suitable classification algorithm for different features is not easy. Different classification algorithms have different inductive biases; fusing them lets each learning algorithm play to its strengths and offsets their weaknesses, which improves classification accuracy, lowers the false alarm rate and improves generalization performance.

Detailed Description

The following specific embodiments are given to further illustrate the present invention.

An identification method for credit card transaction anti-fraud comprises the following steps:

101) Aggregating transaction features based on identity features. The unique identity is determined from the client's explicit and/or implicit identity features, and the transaction features computed for each unique identity include the average transaction amount, the transaction frequency and the types of device used.
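As an illustration of step 101), the following minimal sketch groups transactions by an identity key and computes the three statistics named above. The field names (`card_id`, `addr`, `amount`, `device`) and the choice of identity key are hypothetical; the patent does not specify how the explicit or implicit identity features are combined.

```python
from collections import defaultdict

def aggregate_by_identity(transactions):
    """Group transactions by an identity key and compute per-identity statistics."""
    groups = defaultdict(list)
    for tx in transactions:
        # Hypothetical identity key: card number plus billing address.
        key = (tx["card_id"], tx["addr"])
        groups[key].append(tx)

    features = {}
    for key, txs in groups.items():
        amounts = [tx["amount"] for tx in txs]
        devices = {tx["device"] for tx in txs}
        features[key] = {
            "mean_amount": sum(amounts) / len(amounts),  # average transaction amount
            "tx_count": len(txs),                        # transaction frequency
            "n_device_types": len(devices),              # distinct device types used
        }
    return features

# Illustrative data only.
transactions = [
    {"card_id": "c1", "addr": "a1", "amount": 20.0, "device": "mobile"},
    {"card_id": "c1", "addr": "a1", "amount": 40.0, "device": "desktop"},
    {"card_id": "c2", "addr": "a9", "amount": 15.0, "device": "mobile"},
]
identity_features = aggregate_by_identity(transactions)
```

Each transaction row can then be joined with the statistics of its identity before it is passed to the models.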

102) Aggregating transaction time features by region. The time feature is region-based: the time band with the highest transaction frequency is computed for each region, and the difference between each transaction time and the local high-frequency transaction time is calculated as an important feature for identifying abnormal transactions.
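Step 102) can be sketched in the same way. The example below assumes that the "time band" is the hour of day and that the difference is measured as a circular distance on a 24-hour clock; both are assumptions made for illustration, as are the field names `region` and `hour`.

```python
from collections import Counter, defaultdict

def peak_hour_by_region(transactions):
    """Return, for each region, the hour of day with the most transactions."""
    counts = defaultdict(Counter)
    for tx in transactions:
        counts[tx["region"]][tx["hour"]] += 1
    return {region: c.most_common(1)[0][0] for region, c in counts.items()}

def hour_gap(hour, peak):
    """Circular distance between two hours on a 24-hour clock (assumed metric)."""
    d = abs(hour - peak) % 24
    return min(d, 24 - d)

def add_time_gap(transactions):
    """Attach the gap between each transaction hour and its region's peak hour."""
    peaks = peak_hour_by_region(transactions)
    return [dict(tx, peak_gap=hour_gap(tx["hour"], peaks[tx["region"]]))
            for tx in transactions]

# Illustrative data only.
sample = [
    {"region": "east", "hour": 14},
    {"region": "east", "hour": 14},
    {"region": "east", "hour": 3},
    {"region": "west", "hour": 20},
    {"region": "west", "hour": 20},
    {"region": "west", "hour": 23},
]
with_gap = add_time_gap(sample)
```

A transaction far from its region's busiest hour receives a large `peak_gap`, which is the signal the step describes.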

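103) Three models, XGBoost, CatBoost and LightGBM, are established to predict credit card transactions and obtain a fraud probability judgment value.

The XGBoost-based model uses the following formulas:

$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{1}$$

where $l(y_i, \hat{y}_i)$ is the residual term and $\Omega(f_k)$ is the regularization term, in which $K$ is the number of decision trees, $T$ is the number of leaf nodes, $w_j$ is the weight value of each leaf node, and $\gamma$ and $\lambda$ are constants.

Formula (1) is converted into formula (2): $\hat{y}_i$ is rewritten as $\hat{y}_i^{(t)}$ in the loss function, and $\sum_{k=1}^{K} \Omega(f_k)$ is rewritten as $\sum_{k=1}^{t} \Omega(f_k)$ as the regularization term:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{t} \Omega(f_k) \tag{2}$$

where $\hat{y}_i^{(t)}$ is the prediction after the t-th tree is added, and the change contributed by the new tree is denoted $f_t(x_i)$; the first t-1 trees are fitted to $\hat{y}_i^{(t-1)}$, so that $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$.

Separating the contribution of the first t-1 trees from that of the newly fitted t-th tree converts formula (2) into:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C \tag{3}$$

where $C = \sum_{k=1}^{t-1} \Omega(f_k)$ is a constant, so that each newly found decision tree $f_t(x_i)$ reduces the residual by the greatest amount.

Taking $\hat{y}_i^{(t-1)}$ in formula (3) as $x$ and $f_t(x_i)$ as $\Delta x$, so that $Obj^{(t)} = F(x + \Delta x)$, a second-order Taylor expansion is applied. The first derivative of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$ is denoted $g_i$ and the second derivative is denoted $h_i$. Ignoring the constant components ($C$ and the terms $l(y_i, \hat{y}_i^{(t-1)})$, which do not depend on $f_t$) gives:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t) \tag{4}$$

where $f_t(x_i)$ is a function of the leaf node weights of the t-th tree: the value of $f_t(x_i)$ is determined by the weight $w_q$, that is, $f_t(x_i) = w_{q(x_i)}$ with $q(x_i)$ the leaf node to which sample $i$ is assigned. Formula (4) is therefore transformed into:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5}$$

With $I_j = \{ i \mid q(x_i) = j \}$ denoting the samples assigned to leaf node $j$, the sequential traversal from sample 1 to sample $n$ is changed into a traversal from the samples on leaf node 1 to the samples on leaf node $T$, which gives:

$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T \tag{6}$$

Denoting $\sum_{i \in I_j} g_i$ as $G_j$ and $\sum_{i \in I_j} h_i$ as $H_j$, formula (6) becomes a multivariate extremum problem in $w_j$, whose solution is:

$$w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{7}$$

Substituting this into formula (6) gives the new objective function:

$$Obj^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \tag{8}$$

When a leaf node is split, the samples are divided into a left part L and a right part R, and the gain of the split is expressed as:

$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \tag{9}$$

All possible splits are traversed at each division, so that the leaf nodes in every layer of each newly built tree have the optimal weight coefficients, which yields the maximum fraud probability judgment value Gain based on XGBoost.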
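The optimal leaf weight and the split gain can be checked numerically. The sketch below assumes the logistic loss, for which the first and second derivatives with respect to the raw score are p - y and p(1 - p); the patent does not state which loss is used, and the function names are chosen here for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_hess(y, raw_score):
    """g_i and h_i of the logistic loss (assumed loss) at the current raw score."""
    p = sigmoid(raw_score)
    return p - y, p * (1.0 - p)

def leaf_weight(G, H, lam):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)

def leaf_score(G, H, lam):
    """The term G^2 / (H + lambda) that appears in the objective and the gain."""
    return G * G / (H + lam)

def split_gain(g, h, left_idx, right_idx, lam, gamma):
    """Gain of splitting one node into a left part L and a right part R."""
    GL = sum(g[i] for i in left_idx)
    HL = sum(h[i] for i in left_idx)
    GR = sum(g[i] for i in right_idx)
    HR = sum(h[i] for i in right_idx)
    return 0.5 * (leaf_score(GL, HL, lam) + leaf_score(GR, HR, lam)
                  - leaf_score(GL + GR, HL + HR, lam)) - gamma

# Four samples, all starting from raw score 0 (predicted probability 0.5).
labels = [1, 1, 0, 0]
pairs = [grad_hess(y, 0.0) for y in labels]
g = [p[0] for p in pairs]
h = [p[1] for p in pairs]

# Candidate split: fraudulent samples to the left, legitimate ones to the right.
gain = split_gain(g, h, left_idx=[0, 1], right_idx=[2, 3], lam=1.0, gamma=0.0)
w_left = leaf_weight(sum(g[:2]), sum(h[:2]), lam=1.0)
w_right = leaf_weight(sum(g[2:]), sum(h[2:]), lam=1.0)
```

Here the left leaf receives a positive weight and the right leaf a negative one, so the new tree pushes the raw score up for the fraudulent samples and down for the others. A tree builder would evaluate `split_gain` for every candidate split and keep the largest.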

The CatBoost-based model randomly orders the training set and, for the p-th sample in that order, replaces its categorical value with a statistic computed from the preceding p-1 samples, according to the following formula:

$$\hat{x}_{\sigma_p,k} = \frac{\sum_{j=1}^{p-1} \left[ x_{\sigma_j,k} = x_{\sigma_p,k} \right] Y_{\sigma_j} + a \cdot P}{\sum_{j=1}^{p-1} \left[ x_{\sigma_j,k} = x_{\sigma_p,k} \right] + a}$$

where $\sigma$ is the random ordering, $x_{\sigma_j,k}$ is the value of categorical feature $k$ for the j-th sample in that ordering, $Y_{\sigma_j}$ is its label, and $[\cdot]$ equals 1 when the condition holds and 0 otherwise. $P$ (the prior value) and $a$ (the weight of the prior) are hyper-parameters that reduce the noise arising from low-frequency categories, and the ordered boosting scheme improves the robustness and generalization capability of the model.

CatBoost has a considerable advantage in handling categorical data. Categorical data is commonly handled by encoding (for example, one-hot encoding), but this scheme adopts a more effective strategy in the CatBoost model: the training set is randomly ordered, which mitigates the prediction shift problem in GBDT, and the gradient calculation of GBDT (which computes gradients on the same data set every time) is replaced by ordered boosting. This reduces the bias of the gradient estimate and improves the robustness and generalization capability of the model.
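The ordered statistic described for CatBoost can be sketched as follows: each sample's category is encoded using only the labels of samples that precede it in a random order, smoothed by a prior. This is a simplified illustration rather than CatBoost's internal implementation; the parameter names `prior` and `a`, and the optional `order` argument, are introduced here for the example.

```python
import random

def ordered_target_statistic(categories, targets, prior, a, order=None, seed=0):
    """Encode each category value from the labels of earlier samples only."""
    n = len(categories)
    if order is None:
        # Random ordering of the training set.
        order = list(range(n))
        random.Random(seed).shuffle(order)

    label_sum = {}  # running sum of labels seen so far, per category
    count = {}      # running number of samples seen so far, per category
    encoded = [0.0] * n
    for idx in order:
        c = categories[idx]
        s = label_sum.get(c, 0.0)
        k = count.get(c, 0)
        # Statistic of the preceding samples, smoothed by the prior.
        encoded[idx] = (s + a * prior) / (k + a)
        # The current sample becomes visible only to later samples.
        label_sum[c] = s + targets[idx]
        count[c] = k + 1
    return encoded

# Illustrative data, processed in the fixed order 0, 1, 2, 3.
cats = ["A", "A", "B", "A"]
ys = [1, 0, 1, 1]
enc = ordered_target_statistic(cats, ys, prior=0.5, a=1.0, order=[0, 1, 2, 3])
```

Because a sample's own label never enters its encoding, a rare category falls back toward the prior instead of memorizing its label, which is the noise reduction the paragraph above refers to.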

The LightGBM-based model directly adopts the improved GBDT algorithm proposed by Microsoft in 2015. Its main innovations are Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), which reduce the sample size and the computational cost while preserving considerable accuracy.
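To make the sample-reduction idea concrete, the sketch below follows the published description of GOSS: keep the samples with the largest absolute gradients, draw a random subset of the rest, and up-weight that subset by (1 - top_rate) / other_rate so the gradient sums stay approximately unbiased. It is a simplified illustration, not LightGBM's code, and the function and parameter names are chosen for this example.

```python
import random

def goss_sample(gradients, top_rate, other_rate, seed=0):
    """Return {sample index: weight} for a GOSS-style subsample."""
    n = len(gradients)
    # Sort indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    top = order[:n_top]    # large-gradient samples are always kept
    rest = order[n_top:]   # small-gradient samples are subsampled
    sampled = random.Random(seed).sample(rest, min(n_other, len(rest)))

    # Up-weight the sampled small-gradient instances.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights

# Illustrative gradients for ten samples.
grads = [0.9, -0.8, 0.1, 0.05, -0.02, 0.3, 0.01, -0.6, 0.04, 0.2]
selected = goss_sample(grads, top_rate=0.2, other_rate=0.2)
```

With these rates only four of the ten samples are used to grow the next tree, and the two sampled small-gradient instances each count four times.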

104) Based on the outputs of the three models established in step 103), a Pearson correlation coefficient matrix is computed, and the results with low correlation are fused by equal-weight averaging to obtain the final prediction result. The specific process is as follows:

401) The Pearson correlation coefficients of the prediction results output by the three models are obtained.

402) The prediction results from step 401) whose Pearson correlation coefficient is below 0.99 and whose individual prediction accuracies are close and good are selected. For example, in this scheme the fraud probability judgment value of the XGBoost model yields a Pearson correlation coefficient performance of 0.95, and the fraud probability judgment values of the LightGBM and CatBoost models yield 0.945. The three models perform similarly, but the correlation between their outputs is comparatively low, so fusion is called for.

403) The fraud probability judgment values of the three models from step 402) are fused with equal weights, and the result is output as the final prediction result. For example, if the fraud probability judgment values output by the three models are y1, y2 and y3, the final output is (1/3)*y1 + (1/3)*y2 + (1/3)*y3.
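Steps 401) to 403) can be sketched as follows. The prediction values are made up for illustration, and the function names are hypothetical; the 0.99 threshold is the one stated in step 402).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def low_correlation_pairs(preds, threshold=0.99):
    """Steps 401 and 402: model pairs whose outputs correlate below the threshold."""
    names = list(preds)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(preds[first], preds[second])
            if r < threshold:
                pairs.append((first, second, r))
    return pairs

def fuse_equal_weight(preds):
    """Step 403: average the model outputs with equal weights."""
    columns = list(preds.values())
    return [sum(values) / len(columns) for values in zip(*columns)]

# Made-up fraud probabilities for four transactions.
preds = {
    "xgboost":  [0.9, 0.1, 0.8, 0.2],
    "catboost": [0.8, 0.2, 0.6, 0.3],
    "lightgbm": [0.7, 0.3, 0.9, 0.1],
}
diverse = low_correlation_pairs(preds)
fused = fuse_equal_weight(preds)
```

With three models the equal-weight average is exactly (1/3)*y1 + (1/3)*y2 + (1/3)*y3 for each transaction.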

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and refinements without departing from the spirit of the present invention, and these modifications and refinements should also be regarded as falling within the scope of the present invention.

Claims

1. An identification method applied to credit card transaction anti-fraud is characterized by comprising the following steps:

101) aggregating transaction features based on identity features;

102) aggregating transaction time features by region;

103) establishing three models based on XGBoost, CatBoost and LightGBM to predict credit card transactions and obtain a fraud probability judgment value;

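the XGBoost model uses the following formulas:

$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{1}$$

where $l(y_i, \hat{y}_i)$ is the residual term and $\Omega(f_k)$ is the regularization term, in which $K$ is the number of decision trees, $T$ is the number of leaf nodes, $w_j$ is the weight value of each leaf node, and $\gamma$ and $\lambda$ are constants;

in formula (1), $\hat{y}_i$ is rewritten as $\hat{y}_i^{(t)}$ in the loss function, and $\sum_{k=1}^{K} \Omega(f_k)$ is rewritten as $\sum_{k=1}^{t} \Omega(f_k)$ as the regularization term, so that the formula after conversion is:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{t} \Omega(f_k) \tag{2}$$

where $\hat{y}_i^{(t)}$ is the prediction after the t-th tree is added, and the change contributed by the new tree is denoted $f_t(x_i)$; the first t-1 trees are fitted to $\hat{y}_i^{(t-1)}$, so that $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$ and formula (2) is converted into:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C \tag{3}$$

where $C = \sum_{k=1}^{t-1} \Omega(f_k)$ is a constant;

taking $\hat{y}_i^{(t-1)}$ in formula (3) as $x$ and $f_t(x_i)$ as $\Delta x$, so that $Obj^{(t)} = F(x + \Delta x)$, a second-order Taylor expansion is applied; the first derivative of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$ is denoted $g_i$ and the second derivative is denoted $h_i$; ignoring the constant components gives:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t) \tag{4}$$

where $f_t(x_i) = w_{q(x_i)}$ is a function of the leaf node weights of the t-th tree, with $q(x_i)$ the leaf node to which sample $i$ is assigned, so that formula (4) is transformed into:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5}$$

where, with $I_j = \{ i \mid q(x_i) = j \}$ denoting the samples assigned to leaf node $j$, the traversal over samples is changed into a traversal over leaf nodes, which gives:

$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T \tag{6}$$

denoting $\sum_{i \in I_j} g_i$ as $G_j$ and $\sum_{i \in I_j} h_i$ as $H_j$, formula (6) becomes a multivariate extremum problem in $w_j$, whose solution is:

$$w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{7}$$

substituting this into formula (6) gives the new objective function:

$$Obj^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \tag{8}$$

when a leaf node is split, the samples are divided into a left part L and a right part R, and the gain of the split is:

$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \tag{9}$$

which yields the maximum fraud probability judgment value Gain of XGBoost;

104) based on the outputs of the three models established in step 103), computing a Pearson correlation coefficient matrix and fusing the results with low correlation by equal-weight averaging to obtain the final prediction result.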

2. The identification method for credit card transaction anti-fraud according to claim 1, characterized in that: the unique identity is determined from the client's explicit and/or implicit identity features, and the transaction features computed for each unique identity include the average transaction amount, the transaction frequency and the types of device used.

3. The identification method for credit card transaction anti-fraud according to claim 1, characterized in that: the time feature is region-based; the time band with the highest transaction frequency is computed for each region, and the difference between each transaction time and the local high-frequency transaction time is calculated as an important feature for identifying abnormal transactions.

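4. The identification method for credit card transaction anti-fraud according to claim 1, characterized in that: the CatBoost model randomly orders the training set and, for the p-th sample in that order, replaces its categorical value with a statistic computed from the preceding p-1 samples, according to the following formula:

$$\hat{x}_{\sigma_p,k} = \frac{\sum_{j=1}^{p-1} \left[ x_{\sigma_j,k} = x_{\sigma_p,k} \right] Y_{\sigma_j} + a \cdot P}{\sum_{j=1}^{p-1} \left[ x_{\sigma_j,k} = x_{\sigma_p,k} \right] + a}$$

where $\sigma$ is the random ordering, $x_{\sigma_j,k}$ is the value of categorical feature $k$ for the j-th sample in that ordering, $Y_{\sigma_j}$ is its label, $[\cdot]$ equals 1 when the condition holds and 0 otherwise, and $P$ and $a$ are hyper-parameters.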

5. The identification method for credit card transaction anti-fraud according to claim 1, characterized in that step 104) comprises the following specific steps:

401) obtaining the Pearson correlation coefficients of the prediction results output by the three models;

402) selecting the prediction results from step 401) whose Pearson correlation coefficient is below 0.99 and whose individual prediction accuracies are close and good;

403) fusing the prediction results of the three models with equal weights and outputting the result as the final prediction result.