CN111105241A - Identification method for anti-fraud of credit card transaction - Google Patents
- Publication number
- CN111105241A
- Authority
- CN
- China
- Prior art keywords
- formula
- transaction
- fraud
- credit card
- identification method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/40—Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
- G06Q20/401—Transaction verification
- G06Q20/4016—Transaction verification involving fraud or risk level assessment in transaction processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Abstract
The invention discloses an identification method applied to credit card transaction anti-fraud, which comprises the following steps: 101) aggregating transaction characteristics according to identity characteristics; 102) aggregating transaction time characteristics according to region; 103) a prediction model processing step; and 104) a prediction result step. The invention provides an anti-fraud identification method for credit card transactions based on the fusion of multiple gradient boosting tree models.
Description
Technical Field
The present invention relates to the field of credit cards, and more particularly to an identification method for credit card transaction anti-fraud.
Background
As the internet finance industry develops, conducting financial service transactions through internet channels is becoming increasingly common. For both parties to an internet transaction, correctly evaluating transaction risk and preventing financial fraud is a particularly important part of risk control work.
Credit review and anti-fraud checks of internet finance users require examination of the users' various credit and evaluation materials, so that transaction risk can be assessed and the interests of the financial platform protected. At present, this risk review still requires manual intervention to varying degrees, which limits the efficiency and stability of business development.
Disclosure of Invention
The invention overcomes the defects of the prior art and provides an anti-fraud identification method for credit card transactions based on the fusion of multiple gradient boosting tree models.
In order to solve the technical problems, the technical scheme of the invention is as follows:
an identification method applied to credit card transaction anti-fraud specifically comprises the following steps:
101) aggregating transaction characteristics according to the identity characteristics;
102) aggregating transaction time characteristics according to regions;
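103) establishing three models based on XGBoost, CatBoost and LightGBM to predict credit card transactions and obtain a fraud probability judgment value;
the XGBoost model is processed according to the following formulas:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{1}$$
where $\sum_{i=1}^{n} l(y_i, \hat{y}_i)$ is the residual (loss) term and $\Omega(f)$ is the regularization term, in which $\gamma$ is the coefficient applied to the number of leaf nodes, $T$ is the number of leaf nodes, $w_j$ is the weight of leaf node $j$, and $\lambda$ is a constant;
in formula (1), $\hat{y}_i$ is replaced by $\hat{y}_i^{(t-1)} + f_t(x_i)$ in the loss function, and $\sum_{k} \Omega(f_k)$ is replaced by $\Omega(f_t) + C$ as the regularization term, so that the converted formula is:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C \tag{2}$$
where $f_t$ is the newly added t-th tree, whose output for sample $x_i$ is denoted $f_t(x_i)$, and $\hat{y}_i^{(t-1)}$ is the prediction fitted by the first t-1 trees;
separating the squared residual of the first t-1 trees from the newly fitted t-th tree converts formula (2) into:
$$Obj^{(t)} = \sum_{i=1}^{n} \left[ \left(y_i - \hat{y}_i^{(t-1)}\right)^2 - 2\left(y_i - \hat{y}_i^{(t-1)}\right) f_t(x_i) + f_t(x_i)^2 \right] + \Omega(f_t) + C \tag{3}$$
so that each newly found decision tree $f_t(x_i)$ reduces the residual as much as possible;
treating $\hat{y}_i^{(t-1)}$ in formula (3) as $x$ and $f_t(x_i)$ as $\Delta x$, so that $Obj^{(t)} = F(x + \Delta x)$, a Taylor expansion is performed; denoting the first derivative of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$ as $g_i$ and the second derivative as $h_i$, and ignoring the constant components including $C$, the following formula is obtained:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t) \tag{4}$$
where $f_t(x_i)$ is a function of the leaf node weights of the t-th tree, $f_t(x_i) = w_{q(x_i)}$, with $q(x_i)$ denoting the leaf node to which sample $x_i$ is assigned, so that formula (4) is transformed as follows:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$
where $I_j = \{\, i \mid q(x_i) = j \,\}$ is the set of samples assigned to leaf node $j$; changing the sequential traversal of samples 1 to n into a traversal of the samples on leaf nodes 1 to T gives the following formula:
$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^2 \right] + \gamma T$$
denoting $\sum_{i \in I_j} g_i$ as $G_j$ and $\sum_{i \in I_j} h_i$ as $H_j$, the objective is converted into a multivariate extremum problem in $w_j$:
$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2}\left(H_j + \lambda\right) w_j^2 \right] + \gamma T, \qquad w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{6}$$
the new objective function obtained by substituting formula (6) is:
$$Obj^{(t)} \approx -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$
when a leaf node is split, the split samples are divided into a left part L and a right part R, and the gain of the split is:
$$Gain = \frac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$$
obtaining the maximum Gain for the XGBoost-based fraud probability judgment value;
104) obtaining a Pearson correlation coefficient matrix from the output results of the three models established in step 103), and performing equal-weight averaging fusion on the results with low correlation to obtain a final prediction result.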
Further, the unique identity is determined from the explicit and/or implicit identity characteristics of the client, and the transaction characteristics counted under each unique identity include the average transaction amount, the transaction frequency and the type of device used.
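By way of illustration only, the identity-based aggregation described above may be sketched in Python as follows. The record fields `identity`, `amount` and `device`, and the function name `aggregate_by_identity`, are hypothetical and do not appear in the original disclosure; the sketch assumes that the unique identity has already been resolved to a single key.

```python
from collections import Counter, defaultdict


def aggregate_by_identity(transactions):
    """Group transactions by identity key and compute per-identity statistics."""
    groups = defaultdict(list)
    for tx in transactions:
        groups[tx["identity"]].append(tx)

    features = {}
    for identity, txs in groups.items():
        amounts = [tx["amount"] for tx in txs]
        devices = Counter(tx["device"] for tx in txs)
        features[identity] = {
            # average transaction amount under this identity
            "avg_amount": sum(amounts) / len(amounts),
            # transaction frequency, here simply the number of transactions
            "tx_count": len(txs),
            # most frequently used device type
            "top_device": devices.most_common(1)[0][0],
        }
    return features


# Hypothetical sample records, for illustration only.
transactions = [
    {"identity": "u1", "amount": 20.0, "device": "mobile"},
    {"identity": "u1", "amount": 40.0, "device": "mobile"},
    {"identity": "u1", "amount": 60.0, "device": "desktop"},
    {"identity": "u2", "amount": 500.0, "device": "desktop"},
]
features = aggregate_by_identity(transactions)
```

The resulting per-identity values could then be joined back onto each transaction as model inputs; how the frequency is normalised (per day, per month) is left open here because the source does not specify it.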
Furthermore, the time characteristic is region-based: transactions are grouped by region, the time band with the highest transaction frequency in each region is determined, and the difference between each transaction time and the local high-frequency transaction time is calculated and used as an important feature for identifying abnormal transactions.
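A minimal sketch of this region-based time feature is given below, under stated assumptions: the time band is taken to be the hour of day, and the difference is measured as a circular distance on a 24-hour clock. Neither choice is stated in the source, and the field names `region` and `hour` are hypothetical.

```python
from collections import Counter, defaultdict


def peak_hour_by_region(transactions):
    """Return, for each region, the hour of day with the most transactions."""
    hours = defaultdict(Counter)
    for tx in transactions:
        hours[tx["region"]][tx["hour"]] += 1
    return {region: counts.most_common(1)[0][0] for region, counts in hours.items()}


def hour_gap(hour, peak):
    """Circular distance in hours between a transaction hour and the peak hour."""
    diff = abs(hour - peak) % 24
    return min(diff, 24 - diff)


# Hypothetical sample records, for illustration only.
transactions = [
    {"region": "A", "hour": 14},
    {"region": "A", "hour": 14},
    {"region": "A", "hour": 15},
    {"region": "A", "hour": 3},
    {"region": "B", "hour": 20},
    {"region": "B", "hour": 20},
    {"region": "B", "hour": 9},
]
peaks = peak_hour_by_region(transactions)
gaps = [hour_gap(tx["hour"], peaks[tx["region"]]) for tx in transactions]
```

A transaction made far from the local peak hour receives a large gap value, which a downstream model may treat as a signal of abnormal behaviour.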
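Further, the CatBoost model randomly orders the training set and, for the p-th sample, replaces the categorical value of that sample with a statistic computed from the preceding p-1 samples, according to the following formula:
$$\hat{x}_{\sigma_p,k} = \frac{\sum_{j=1}^{p-1} \left[x_{\sigma_j,k} = x_{\sigma_p,k}\right] Y_{\sigma_j} + a \cdot P}{\sum_{j=1}^{p-1} \left[x_{\sigma_j,k} = x_{\sigma_p,k}\right] + a}$$
where $\sigma$ is the random permutation of the training set, $k$ indexes the categorical feature, $Y$ is the label, and $[\cdot]$ equals 1 when the condition holds and 0 otherwise; the prior value $P$ and its weight $a$ are hyper-parameters that reduce the noise arising from low-frequency categories, and the ordered boosting scheme improves the robustness and generalization ability of the model.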
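The ordered statistic described above can be illustrated with the following sketch. It is not the CatBoost library implementation; the function name `ordered_target_statistic` and its arguments are hypothetical, and the permutation is passed in explicitly so that the result is reproducible.

```python
import random


def ordered_target_statistic(categories, targets, order, prior, a):
    """Encode each categorical value using only the samples that precede it
    in the given permutation, smoothed by a prior with weight a."""
    sums = {}    # running sum of targets seen so far, per category
    counts = {}  # running count of samples seen so far, per category
    encoded = [0.0] * len(categories)
    for idx in order:
        cat = categories[idx]
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        # statistic from earlier samples only; the current label is not used
        encoded[idx] = (s + a * prior) / (c + a)
        sums[cat] = s + targets[idx]
        counts[cat] = c + 1
    return encoded


# Hypothetical data, for illustration only.
categories = ["x", "y", "x", "x"]
targets = [1, 0, 1, 0]

# Fixed permutation, so the values can be checked by hand.
encoded = ordered_target_statistic(categories, targets, [0, 1, 2, 3], prior=0.25, a=1.0)

# In practice the permutation would be random, as the text describes.
shuffled = list(range(len(categories)))
random.Random(0).shuffle(shuffled)
encoded_random = ordered_target_statistic(categories, targets, shuffled, prior=0.25, a=1.0)
```

Because a sample's own label never enters its encoding, the first occurrence of any category falls back to the prior, which is the smoothing effect on low-frequency categories that the text refers to.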
Further, step 104) specifically comprises the following steps:
401) obtaining the Pearson correlation coefficients between the predictions output by the three models;
402) selecting the prediction results whose Pearson correlation coefficient from step 401) is below 0.99 and whose prediction accuracies are close and good;
403) fusing the prediction results of the three models with equal weights and outputting the result as the final prediction result.
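Steps 401) to 403) may be sketched as follows. The sketch rests on one interpretation of step 402), namely that fusion is considered worthwhile when every pairwise Pearson coefficient is below the 0.99 threshold; the function names and the sample predictions are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def fuse_equal_weight(predictions, threshold=0.99):
    """Compute pairwise correlations, check diversity, and average the models."""
    names = list(predictions)
    # 401) pairwise Pearson coefficients between model outputs
    corr = {(a, b): pearson(predictions[a], predictions[b])
            for a, b in combinations(names, 2)}
    # 402) assumed reading: all pairs must fall below the threshold
    diverse = all(r < threshold for r in corr.values())
    # 403) equal-weight average of the model outputs
    n = len(predictions[names[0]])
    fused = [sum(predictions[m][i] for m in names) / len(names) for i in range(n)]
    return corr, diverse, fused


# Hypothetical fraud probabilities from three models, for illustration only.
predictions = {
    "xgboost": [0.9, 0.1, 0.4, 0.8],
    "lightgbm": [0.8, 0.2, 0.5, 0.6],
    "catboost": [0.7, 0.3, 0.3, 0.9],
}
corr, diverse, fused = fuse_equal_weight(predictions)
```

Equal weights are used here because the source specifies them; a weighted average would require additional validation data that the source does not describe.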
Compared with the prior art, the invention has the advantages that:
the feature types used by the invention complement one another, and fusing features from different levels of abstraction allows the true properties of the software to be discovered more effectively. Furthermore, because learning algorithms rest on different assumptions, no single learning algorithm is optimal for every type of problem, and selecting a suitable classification algorithm for different features is not an easy task. Each classification algorithm has its own inductive bias; fusing several learning algorithms lets each contribute its strengths and compensates for their individual weaknesses, which improves classification accuracy, lowers the false alarm rate and improves generalization performance.
Detailed Description
The following specific embodiments are given to further illustrate the present invention.
An identification method applied to credit card transaction anti-fraud specifically comprises the following steps:
101) Aggregating transaction characteristics according to identity characteristics; the unique identity is determined from the explicit and/or implicit identity characteristics of the client, and the transaction characteristics counted under each unique identity include the average transaction amount, the transaction frequency and the type of device used.
102) Aggregating transaction time characteristics according to region; the time characteristic is region-based: transactions are grouped by region, the time band with the highest transaction frequency in each region is determined, and the difference between each transaction time and the local high-frequency transaction time is calculated and used as an important feature for identifying abnormal transactions.
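103) Establishing three models based on XGBoost, CatBoost and LightGBM to predict credit card transactions and obtain a fraud probability judgment value;
the XGBoost-based model is processed according to the following formulas:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{1}$$
where $\sum_{i=1}^{n} l(y_i, \hat{y}_i)$ is the residual (loss) term and $\Omega(f)$ is the regularization term, in which $\gamma$ is the coefficient applied to the number of leaf nodes, $T$ is the number of leaf nodes, $w_j$ is the weight of leaf node $j$, and $\lambda$ is a constant;
formula (1) is converted into formula (2) below; specifically, $\hat{y}_i$ is rewritten as $\hat{y}_i^{(t-1)} + f_t(x_i)$ in the loss function, and the regularization term $\sum_{k} \Omega(f_k)$ is rewritten as $\Omega(f_t) + C$:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C \tag{2}$$
where $f_t$ is the newly added t-th tree, whose output for sample $x_i$ is denoted $f_t(x_i)$, and $\hat{y}_i^{(t-1)}$ is the prediction fitted by the first t-1 trees;
separating the squared residual of the first t-1 trees from the newly fitted t-th tree converts formula (2) into:
$$Obj^{(t)} = \sum_{i=1}^{n} \left[ \left(y_i - \hat{y}_i^{(t-1)}\right)^2 - 2\left(y_i - \hat{y}_i^{(t-1)}\right) f_t(x_i) + f_t(x_i)^2 \right] + \Omega(f_t) + C \tag{3}$$
so that each newly found decision tree $f_t(x_i)$ reduces the residual as much as possible;
treating $\hat{y}_i^{(t-1)}$ in formula (3) as $x$ and $f_t(x_i)$ as $\Delta x$, so that $Obj^{(t)} = F(x + \Delta x)$, a Taylor expansion is performed; denoting the first derivative of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$ as $g_i$ and the second derivative as $h_i$, and ignoring the constant components including $C$, the following formula is obtained:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t) \tag{4}$$
where $f_t(x_i)$ is a function of the leaf node weights of the t-th tree; $f_t(x_i)$ is determined by the weight $w_q$, that is, $f_t(x_i) = w_{q(x_i)}$ with $q(x_i)$ denoting the leaf node to which sample $x_i$ is assigned, so that formula (4) is expressed as:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$
where $I_j = \{\, i \mid q(x_i) = j \,\}$ is the set of samples assigned to leaf node $j$; changing the sequential traversal of samples 1 to n into a traversal of the samples on leaf nodes 1 to T gives the following formula:
$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^2 \right] + \gamma T$$
denoting $\sum_{i \in I_j} g_i$ as $G_j$ and $\sum_{i \in I_j} h_i$ as $H_j$, the objective is converted into a multivariate extremum problem in $w_j$:
$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2}\left(H_j + \lambda\right) w_j^2 \right] + \gamma T, \qquad w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{6}$$
the new objective function obtained by substituting formula (6) is:
$$Obj^{(t)} \approx -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$
when a leaf node is split, the split samples are divided into a left part L and a right part R, and the gain of the split is expressed as:
$$Gain = \frac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$$
All possible cases are traversed for each split, so that the leaf nodes in every layer of each newly built tree have the optimal weight coefficients, and the maximum Gain for the XGBoost-based fraud probability judgment value is obtained.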
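The leaf weight and split gain in this derivation can be illustrated with a short sketch. It is not the XGBoost library code: the function names are hypothetical, and the exhaustive scan over sorted values of a single feature is an assumed, simplified form of the traversal the text describes.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)


def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Gain of splitting a leaf into left (L) and right (R) parts."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma


def best_split(values, grads, hess, lam=1.0, gamma=0.0):
    """Scan every threshold of one feature and return (best gain, threshold)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    G_total, H_total = sum(grads), sum(hess)
    G_L = H_L = 0.0
    best = (float("-inf"), None)
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        G_L += grads[i]
        H_L += hess[i]
        if values[i] == values[nxt]:
            continue  # no threshold can separate equal feature values
        gain = split_gain(G_L, H_L, G_total - G_L, H_total - H_L, lam, gamma)
        if gain > best[0]:
            best = (gain, (values[i] + values[nxt]) / 2)
    return best


# Hypothetical gradients and Hessians for four samples, for illustration only.
values = [1.0, 2.0, 3.0, 4.0]
grads = [-1.0, -1.0, 1.0, 1.0]
hess = [1.0, 1.0, 1.0, 1.0]
best = best_split(values, grads, hess, lam=0.0, gamma=0.0)
```

With these inputs the scan selects the threshold between 2.0 and 3.0, which separates the negative gradients from the positive ones.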
The CatBoost-based model randomly orders the training set and, for the p-th sample, replaces the categorical value of that sample with a statistic computed from the preceding p-1 samples, according to the following formula:
$$\hat{x}_{\sigma_p,k} = \frac{\sum_{j=1}^{p-1} \left[x_{\sigma_j,k} = x_{\sigma_p,k}\right] Y_{\sigma_j} + a \cdot P}{\sum_{j=1}^{p-1} \left[x_{\sigma_j,k} = x_{\sigma_p,k}\right] + a}$$
where $\sigma$ is the random permutation of the training set, $k$ indexes the categorical feature, $Y$ is the label, and $[\cdot]$ equals 1 when the condition holds and 0 otherwise. The prior value $P$ and its weight $a$ are hyper-parameters used to reduce the noise arising from low-frequency categories, and the ordered boosting scheme improves the robustness and generalization ability of the model. CatBoost has a marked advantage in handling categorical data. Categorical data is commonly handled by encoding (for example, one-hot encoding), but this scheme adopts a more effective strategy in the CatBoost model: the training set is randomly ordered, which mitigates the prediction shift problem of GBDT, and ordered boosting replaces the gradient computation of GBDT (which computes the gradient on the same data set at every step), thereby reducing the bias of the gradient estimate and improving the robustness and generalization ability of the model.
LightGBM is used directly; it is an improved version of the GBDT algorithm proposed by Microsoft in 2015. Its main innovations are the Gradient-based One-Side Sampling (GOSS) technique and the Exclusive Feature Bundling (EFB) technique, which reduce the sample size and the computational overhead while maintaining considerable accuracy.
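As an illustration of the GOSS idea only, the following sketch keeps the samples with the largest absolute gradients, randomly samples from the remainder, and up-weights the sampled remainder by (1 - top_rate) / other_rate so that the gradient sum stays approximately unbiased. It is not the LightGBM implementation; the function name and parameter names are hypothetical.

```python
import random


def goss_sample(grads, top_rate, other_rate, seed=0):
    """Return {sample index: weight} for the samples kept by one-side sampling."""
    n = len(grads)
    # sort indices by absolute gradient, largest first
    order = sorted(range(n), key=lambda i: abs(grads[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top = order[:n_top]
    rest = order[n_top:]
    # random subset of the small-gradient samples
    sampled = random.Random(seed).sample(rest, min(n_other, len(rest)))
    # compensating weight for the under-sampled small-gradient samples
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights


# Hypothetical gradients for ten samples, for illustration only.
grads = [0.9, -0.8, 0.1, 0.05, -0.02, 0.03, 0.01, -0.04, 0.02, 0.06]
weights = goss_sample(grads, top_rate=0.2, other_rate=0.2)
```

Here four of the ten samples are retained: the two with the largest absolute gradients at weight 1, and two randomly chosen others at the amplified weight.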
104) Obtaining a Pearson correlation coefficient matrix from the output results of the three models established in step 103), and performing equal-weight averaging fusion on the results with low correlation to obtain a final prediction result. The specific process is as follows:
401) obtaining the Pearson correlation coefficients between the predictions output by the three models;
402) selecting the prediction results whose Pearson correlation coefficient from step 401) is below 0.99 and whose prediction accuracies are close and good. For example, in this scheme the Pearson correlation coefficient obtained for the fraud probability judgment value of the XGBoost model reaches 0.95, and that obtained for the fraud probability judgment values of the LightGBM and CatBoost models reaches 0.945; the fraud probability judgment values of the three models are close to one another, but the correlation coefficients of their outputs are comparatively low, so fusion is warranted.
403) Fusing the fraud probability judgment values of the three models from step 402) with equal weights and outputting the result as the final prediction result. For example, if the fraud probability judgment values output by the three models are $y_1$, $y_2$ and $y_3$ respectively, the final output is $\frac{1}{3} y_1 + \frac{1}{3} y_2 + \frac{1}{3} y_3$.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make several modifications and refinements without departing from the spirit of the present invention, and these modifications and refinements should also be regarded as falling within the scope of the present invention.
Claims (5)
1. An identification method applied to credit card transaction anti-fraud is characterized by comprising the following steps:
101) aggregating transaction characteristics according to the identity characteristics;
102) aggregating transaction time characteristics according to regions;
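103) establishing three models based on XGBoost, CatBoost and LightGBM to predict credit card transactions and obtain a fraud probability judgment value;
the XGBoost model is processed according to the following formulas:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{1}$$
where $\sum_{i=1}^{n} l(y_i, \hat{y}_i)$ is the residual (loss) term and $\Omega(f)$ is the regularization term, in which $\gamma$ is the coefficient applied to the number of leaf nodes, $T$ is the number of leaf nodes, $w_j$ is the weight of leaf node $j$, and $\lambda$ is a constant;
in formula (1), $\hat{y}_i$ is replaced by $\hat{y}_i^{(t-1)} + f_t(x_i)$ in the loss function, and $\sum_{k} \Omega(f_k)$ is replaced by $\Omega(f_t) + C$ as the regularization term, so that the converted formula is:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C \tag{2}$$
where $f_t$ is the newly added t-th tree, whose output for sample $x_i$ is denoted $f_t(x_i)$, and $\hat{y}_i^{(t-1)}$ is the prediction fitted by the first t-1 trees;
separating the squared residual of the first t-1 trees from the newly fitted t-th tree converts formula (2) into:
$$Obj^{(t)} = \sum_{i=1}^{n} \left[ \left(y_i - \hat{y}_i^{(t-1)}\right)^2 - 2\left(y_i - \hat{y}_i^{(t-1)}\right) f_t(x_i) + f_t(x_i)^2 \right] + \Omega(f_t) + C \tag{3}$$
so that each newly found decision tree $f_t(x_i)$ reduces the residual as much as possible;
treating $\hat{y}_i^{(t-1)}$ in formula (3) as $x$ and $f_t(x_i)$ as $\Delta x$, so that $Obj^{(t)} = F(x + \Delta x)$, a Taylor expansion is performed; denoting the first derivative of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$ as $g_i$ and the second derivative as $h_i$, and ignoring the constant components including $C$, the following formula is obtained:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t) \tag{4}$$
where $f_t(x_i)$ is a function of the leaf node weights of the t-th tree, $f_t(x_i) = w_{q(x_i)}$, with $q(x_i)$ denoting the leaf node to which sample $x_i$ is assigned, so that formula (4) is transformed as follows:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$
where $I_j = \{\, i \mid q(x_i) = j \,\}$ is the set of samples assigned to leaf node $j$; changing the sequential traversal of samples 1 to n into a traversal of the samples on leaf nodes 1 to T gives the following formula:
$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^2 \right] + \gamma T$$
denoting $\sum_{i \in I_j} g_i$ as $G_j$ and $\sum_{i \in I_j} h_i$ as $H_j$, the objective is converted into a multivariate extremum problem in $w_j$:
$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2}\left(H_j + \lambda\right) w_j^2 \right] + \gamma T, \qquad w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{6}$$
the new objective function obtained by substituting formula (6) is:
$$Obj^{(t)} \approx -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$
when a leaf node is split, the split samples are divided into a left part L and a right part R, and the gain of the split is:
$$Gain = \frac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$$
obtaining the maximum Gain for the XGBoost-based fraud probability judgment value;
104) obtaining a Pearson correlation coefficient matrix from the output results of the three models established in step 103), and performing equal-weight averaging fusion on the results with low correlation to obtain a final prediction result.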
2. An identification method applied to credit card transaction anti-fraud according to claim 1, characterized in that: the unique identity is determined from the explicit and/or implicit identity characteristics of the client, and the transaction characteristics counted under each unique identity include the average transaction amount, the transaction frequency and the type of device used.
3. An identification method applied to credit card transaction anti-fraud according to claim 1, characterized in that: the time characteristic is region-based: transactions are grouped by region, the time band with the highest transaction frequency in each region is determined, and the difference between each transaction time and the local high-frequency transaction time is calculated and used as an important feature for identifying abnormal transactions.
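4. An identification method applied to credit card transaction anti-fraud according to claim 1, characterized in that: the CatBoost model randomly orders the training set and, for the p-th sample, replaces the categorical value of that sample with a statistic computed from the preceding p-1 samples, according to the following formula:
$$\hat{x}_{\sigma_p,k} = \frac{\sum_{j=1}^{p-1} \left[x_{\sigma_j,k} = x_{\sigma_p,k}\right] Y_{\sigma_j} + a \cdot P}{\sum_{j=1}^{p-1} \left[x_{\sigma_j,k} = x_{\sigma_p,k}\right] + a}$$
where $\sigma$ is the random permutation of the training set, $k$ indexes the categorical feature, $Y$ is the label, and $[\cdot]$ equals 1 when the condition holds and 0 otherwise; the prior value $P$ and its weight $a$ are hyper-parameters that reduce the noise arising from low-frequency categories, and the ordered boosting scheme improves the robustness and generalization ability of the model.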
5. An identification method applied to credit card transaction anti-fraud according to claim 1, characterized in that: step 104) comprises the following specific steps:
401) obtaining the Pearson correlation coefficients between the predictions output by the three models;
402) selecting the prediction results whose Pearson correlation coefficient from step 401) is below 0.99 and whose prediction accuracies are close and good;
403) fusing the prediction results of the three models with equal weights and outputting the result as the final prediction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911323155.8A CN111105241B (en) | 2019-12-20 | 2019-12-20 | Identification method for anti-fraud of credit card transaction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911323155.8A CN111105241B (en) | 2019-12-20 | 2019-12-20 | Identification method for anti-fraud of credit card transaction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111105241A (en) | 2020-05-05 |
CN111105241B (en) | 2023-04-07 |
Family
ID=70423762
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911323155.8A (Active, CN111105241B (en)) | 2019-12-20 | 2019-12-20 | Identification method for anti-fraud of credit card transaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111105241B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112101951A (en) * | 2020-09-27 | 2020-12-18 | 中国银行股份有限公司 | Payment transaction detection method and device, storage medium and electronic equipment |
CN112950397A (en) * | 2021-05-17 | 2021-06-11 | 太平金融科技服务(上海)有限公司深圳分公司 | Claims risk estimation method and device, computer equipment and storage medium |
CN116167872A (en) * | 2023-04-20 | 2023-05-26 | 湖南工商大学 | Abnormal medical data detection method, device and equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107993139A (en) * | 2017-11-15 | 2018-05-04 | 华融融通(北京)科技有限公司 | A kind of anti-fake system of consumer finance based on dynamic regulation database and method |
CN109034194A (en) * | 2018-06-20 | 2018-12-18 | 东华大学 | Transaction swindling behavior depth detection method based on feature differentiation |
US20190060766A1 (en) * | 2017-08-25 | 2019-02-28 | SixtyFive02, Inc. | Systems and methods of persistent, user-adapted personas |
CN110020868A (en) * | 2019-03-11 | 2019-07-16 | 同济大学 | Anti- fraud module Decision fusion method based on online trading feature |
-
2019
- 2019-12-20 CN CN201911323155.8A patent/CN111105241B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190060766A1 (en) * | 2017-08-25 | 2019-02-28 | SixtyFive02, Inc. | Systems and methods of persistent, user-adapted personas |
CN107993139A (en) * | 2017-11-15 | 2018-05-04 | 华融融通(北京)科技有限公司 | A kind of anti-fake system of consumer finance based on dynamic regulation database and method |
CN109034194A (en) * | 2018-06-20 | 2018-12-18 | 东华大学 | Transaction swindling behavior depth detection method based on feature differentiation |
CN110020868A (en) * | 2019-03-11 | 2019-07-16 | 同济大学 | Anti- fraud module Decision fusion method based on online trading feature |
Non-Patent Citations (1)
Title |
---|
陈安 (Chen An): "基于机器学习的信用卡风险评估研究" [Research on Credit Card Risk Assessment Based on Machine Learning] * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112101951A (en) * | 2020-09-27 | 2020-12-18 | 中国银行股份有限公司 | Payment transaction detection method and device, storage medium and electronic equipment |
CN112101951B (en) * | 2020-09-27 | 2023-09-26 | 中国银行股份有限公司 | Payment transaction detection method and device, storage medium and electronic equipment |
CN112950397A (en) * | 2021-05-17 | 2021-06-11 | 太平金融科技服务(上海)有限公司深圳分公司 | Claims risk estimation method and device, computer equipment and storage medium |
CN116167872A (en) * | 2023-04-20 | 2023-05-26 | 湖南工商大学 | Abnormal medical data detection method, device and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN111105241B (en) | 2023-04-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||