CN111105241B - Identification method for anti-fraud of credit card transaction - Google Patents
- Publication number
- CN111105241B (application number CN201911323155.8A)
- Authority
- CN
- China
- Prior art keywords
- formula
- transaction
- fraud
- credit card
- identification method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/40—Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
- G06Q20/401—Transaction verification
- G06Q20/4016—Transaction verification involving fraud or risk level assessment in transaction processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Accounting & Taxation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Security & Cryptography (AREA)
- General Business, Economics & Management (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an identification method for credit card transaction anti-fraud, which comprises the following steps: 101) aggregating transaction characteristics based on identity characteristics; 102) aggregating transaction time characteristics by locality; 103) a prediction model processing step; and 104) a prediction result step. The invention provides an anti-fraud identification method for credit card transactions based on the fusion of multiple gradient boosting tree models.
Description
Technical Field
The invention relates to the field of credit cards, in particular to an identification method applied to anti-fraud of credit card transactions.
Background
As the internet finance industry develops, carrying out financial service transactions through internet channels is becoming more and more common. For both parties to an internet transaction, correctly evaluating transaction risk and preventing financial fraud is a particularly important part of risk control work.
Credit investigation and anti-fraud checks on internet finance users require examining a variety of credit and evaluation materials for each user, so that transaction risk can be evaluated and the interests of the financial platform protected. At present, this risk review still requires manual intervention to varying degrees, which limits the efficiency and stability of business development.
Disclosure of Invention
The invention overcomes the defects of the prior art and provides an anti-fraud identification method for credit card transactions based on the fusion of multiple gradient boosting tree models.
In order to solve the technical problems, the technical scheme of the invention is as follows:
an identification method applied to credit card transaction anti-fraud specifically comprises the following steps:
101) aggregating transaction characteristics based on identity characteristics;
102) aggregating transaction time characteristics by locality;
103) establishing three models, XGBoost, CatBoost and LightGBM, to predict credit card transactions and obtain a fraud probability judgment value;
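the XGBoost model has the following specific processing formula:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{1}$$
in the formula, $l(y_i, \hat{y}_i)$ is a residual term and $\Omega(f_k)$ is a regularization term, where K is the number of decision trees, T is the number of leaf nodes, $w_j$ is the weight value of each leaf node, and γ and λ are constant regularization coefficients;
in formula (1), $\hat{y}_i$ in the loss function is changed to $\hat{y}_i^{(t-1)} + f_t(x_i)$ and the regularization term is changed to $\Omega(f_t)$, the terms that do not depend on $f_t$ being collected in a constant C; the formula after conversion is as follows:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C \tag{2}$$
in the formula, $f_t$ is the newly added t-th tree, whose output for sample i is recorded as $f_t(x_i)$, and $\hat{y}_i^{(t-1)}$ is the prediction fitted by the previous t-1 trees;
further separating the squared residual of the previous t-1 trees from the newly fitted t-th tree, formula (2) is converted to the following formula:
$$Obj^{(t)} = \sum_{i=1}^{n} \left[ 2\left(\hat{y}_i^{(t-1)} - y_i\right) f_t(x_i) + f_t(x_i)^2 \right] + \Omega(f_t) + C \tag{3}$$
so that each newly found decision tree $f_t(x_i)$ reduces the residual as much as possible;
taking $\hat{y}_i^{(t-1)}$ in formula (3) as x and $f_t(x_i)$ as Δx, so that $Obj^{(t)} = F(x + \Delta x)$, a second-order Taylor expansion is applied; the first derivative of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$ is denoted $g_i$ and the second derivative is denoted $h_i$; ignoring the constant component C, the following formula is obtained:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t) \tag{4}$$
wherein $f_t(x_i)$ is a function of the leaf node weights of the t-th tree, $f_t(x_i) = w_{q(x_i)}$, with $q(x_i)$ the leaf node to which sample i is assigned; formula (4) is transformed as follows:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5}$$
wherein $I_j = \{ i \mid q(x_i) = j \}$ is the set of samples assigned to leaf node j; the sequential traversal from sample 1 to sample n is changed into a traversal over the samples on leaf node 1 through leaf node T, and the following formula is obtained:
$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \Big( \sum_{i \in I_j} g_i \Big) w_j + \frac{1}{2} \Big( \sum_{i \in I_j} h_i + \lambda \Big) w_j^2 \right] + \gamma T \tag{6}$$
denoting $\sum_{i \in I_j} g_i$ as $G_j$ and $\sum_{i \in I_j} h_i$ as $H_j$, the extremum with respect to each $w_j$ is:
$$w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{7}$$
the new objective function obtained by substituting into formula (6) is:
$$Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \tag{8}$$
when a leaf node is split, its samples are divided into a left part L and a right part R, and the gain of the split is:
$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \tag{9}$$
obtaining the maximum fraud probability judgment value Gain of the XGBoost model;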
104) from the output results of the three models established in step 103), computing a Pearson correlation coefficient matrix and fusing the results with low correlation by equally weighted averaging to obtain the final prediction result.
Further, a unique identity is determined from the client's explicit and/or implicit identity characteristics, and the transaction characteristics computed for each unique identity include the average transaction amount, the transaction frequency and the type of device used.
Furthermore, the time characteristic is region-based: the time band with the highest transaction frequency is counted for each region, and the difference between each transaction time and the local high-frequency transaction time is calculated as an important feature for judging abnormal transactions.
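Further, the CatBoost model randomly orders the training set, and for the p-th sample, the statistical value computed from the previous p-1 samples is used to replace the categorical value of the p-th sample; the specific formula is as follows:
$$\hat{x}_{\sigma_p,k} = \frac{\sum_{j=1}^{p-1} [x_{\sigma_j,k} = x_{\sigma_p,k}] \, Y_{\sigma_j} + a \cdot P}{\sum_{j=1}^{p-1} [x_{\sigma_j,k} = x_{\sigma_p,k}] + a}$$
where σ is the random ordering, k indexes the categorical feature, Y is the label and [·] equals 1 when the condition holds and 0 otherwise; P (a prior value) and a (its weight) are hyper-parameters used to reduce the noise arising from low-frequency categories, and the robustness and generalization capability of the model are improved by ordered boosting.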
Further, the specific steps of step 104) are as follows:
401) obtaining the Pearson correlation coefficients of the prediction results output by the three models;
402) selecting the prediction results from step 401) whose Pearson correlation coefficient is lower than 0.99 and whose individual prediction accuracies are close and good;
403) fusing the prediction results of the three models with equal weights and outputting the result as the final prediction result.
Compared with the prior art, the invention has the advantages that:
The feature types used by the invention complement one another, and fusing features of different abstraction levels better reveals the true nature of the software. Furthermore, because learning algorithms rest on different assumptions, no single learning algorithm is optimal for every type of problem, and selecting a suitable classification algorithm for different features is not easy. Different classification algorithms have different inductive biases; fusing them lets each learning algorithm exert its own advantages and compensates for the weaknesses of the others, which improves classification accuracy, reduces the false alarm rate and improves generalization performance.
Detailed Description
The following specific embodiments are given to further illustrate the present invention.
A recognition method applied to credit card transaction anti-fraud specifically comprises the following steps:
101) aggregating transaction characteristics based on identity characteristics; a unique identity is determined from the client's explicit and/or implicit identity characteristics, and the transaction characteristics computed for each unique identity include the average transaction amount, the transaction frequency and the type of device used.
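The identity-based aggregation of step 101) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the record fields `identity`, `amount` and `device` and the function name `aggregate_by_identity` are hypothetical, and the identity key is assumed to have been resolved already from the explicit and/or implicit identity characteristics.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_identity(transactions):
    """Group transactions by identity key and compute per-identity statistics."""
    groups = defaultdict(list)
    for tx in transactions:
        groups[tx["identity"]].append(tx)

    features = {}
    for identity, txs in groups.items():
        features[identity] = {
            # average transaction amount for this identity
            "avg_amount": mean(tx["amount"] for tx in txs),
            # transaction frequency, here simply the number of transactions
            "tx_count": len(txs),
            # distinct device types used by this identity
            "device_types": sorted({tx["device"] for tx in txs}),
        }
    return features


# Hypothetical toy data; field names are assumptions for illustration only.
transactions = [
    {"identity": "card_A", "amount": 20.0, "device": "mobile"},
    {"identity": "card_A", "amount": 40.0, "device": "desktop"},
    {"identity": "card_B", "amount": 100.0, "device": "mobile"},
]
identity_features = aggregate_by_identity(transactions)
```

Each transaction row would then be joined with the statistics of its own identity before being passed to the models.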
102) aggregating transaction time characteristics by locality; the time characteristic is region-based: the time band with the highest transaction frequency is counted for each region, and the difference between each transaction time and the local high-frequency transaction time is calculated as an important feature for judging abnormal transactions.
103) establishing three models, XGBoost, CatBoost and LightGBM, to predict credit card transactions and obtain a fraud probability judgment value;
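A rough sketch of the locality-based time feature of step 102) is given below. It assumes that transaction time is reduced to an hour of day and that the "difference" is measured as a circular distance on the 24-hour clock; both choices, and the field names `region` and `hour`, are assumptions made for illustration rather than details stated in the text.

```python
from collections import Counter, defaultdict


def peak_hour_by_region(transactions):
    """Return the most frequent transaction hour for each region."""
    hours = defaultdict(Counter)
    for tx in transactions:
        hours[tx["region"]][tx["hour"]] += 1
    return {region: counts.most_common(1)[0][0] for region, counts in hours.items()}


def hour_gap(hour, peak):
    """Circular distance in hours (assumption: 23:00 and 01:00 are 2 hours apart)."""
    diff = abs(hour - peak) % 24
    return min(diff, 24 - diff)


# Hypothetical toy data.
transactions = [
    {"region": "R1", "hour": 14},
    {"region": "R1", "hour": 14},
    {"region": "R1", "hour": 3},
    {"region": "R2", "hour": 23},
    {"region": "R2", "hour": 23},
    {"region": "R2", "hour": 1},
]
peaks = peak_hour_by_region(transactions)
gaps = [hour_gap(tx["hour"], peaks[tx["region"]]) for tx in transactions]
```

A large gap marks a transaction made far from the region's usual busy period, which is the signal the step describes.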
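the XGBoost-based model has the following specific processing formula:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{1}$$
in the formula, $l(y_i, \hat{y}_i)$ is a residual term and $\Omega(f_k)$ is a regularization term, where K is the number of decision trees, T is the number of leaf nodes, $w_j$ is the weight value of each leaf node, and γ and λ are constant regularization coefficients;
formula (1) is converted into the following formula (2); concretely, $\hat{y}_i$ in the loss function is rewritten as $\hat{y}_i^{(t-1)} + f_t(x_i)$ and the regularization term is rewritten as $\Omega(f_t)$, the terms that do not depend on $f_t$ being collected in a constant C:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C \tag{2}$$
in the formula, $f_t$ is the newly added t-th tree, whose output for sample i is recorded as $f_t(x_i)$, and $\hat{y}_i^{(t-1)}$ is the prediction fitted by the previous t-1 trees;
further separating the squared residual of the previous t-1 trees from the newly fitted t-th tree, formula (2) is converted to the following formula:
$$Obj^{(t)} = \sum_{i=1}^{n} \left[ 2\left(\hat{y}_i^{(t-1)} - y_i\right) f_t(x_i) + f_t(x_i)^2 \right] + \Omega(f_t) + C \tag{3}$$
so that each newly found decision tree $f_t(x_i)$ reduces the residual as much as possible;
taking $\hat{y}_i^{(t-1)}$ in formula (3) as x and $f_t(x_i)$ as Δx, so that $Obj^{(t)} = F(x + \Delta x)$, a second-order Taylor expansion is applied; the first derivative of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$ is denoted $g_i$ and the second derivative is denoted $h_i$; ignoring the constant component C, the following formula is obtained:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t) \tag{4}$$
wherein $f_t(x_i)$ is a function of the leaf node weights of the t-th tree and is determined by the weight $w_q$, that is $f_t(x_i) = w_{q(x_i)}$ with $q(x_i)$ the leaf node to which sample i is assigned; formula (4) is converted as follows:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5}$$
wherein $I_j = \{ i \mid q(x_i) = j \}$ is the set of samples assigned to leaf node j; the sequential traversal from sample 1 to sample n is changed into a traversal over the samples on leaf node 1 through leaf node T, so that the following formula is obtained:
$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \Big( \sum_{i \in I_j} g_i \Big) w_j + \frac{1}{2} \Big( \sum_{i \in I_j} h_i + \lambda \Big) w_j^2 \right] + \gamma T \tag{6}$$
denoting $\sum_{i \in I_j} g_i$ as $G_j$ and $\sum_{i \in I_j} h_i$ as $H_j$, the problem becomes a multivariate extremum in $w_j$, whose solution is:
$$w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{7}$$
the new objective function is obtained by substituting into formula (6):
$$Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \tag{8}$$
when a leaf node is split, its samples are divided into a left part L and a right part R, and the gain of the split is expressed as follows:
$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \tag{9}$$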
All possible split conditions are traversed for each division, so that the leaf nodes of every layer of each newly built tree have the optimal weight coefficients, and the maximum XGBoost-based fraud probability judgment value Gain is obtained.
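The split search described above can be sketched in a few lines. The sketch assumes the standard XGBoost quantities: per-sample first and second derivatives of the loss (`grad`, `hess`), their sums G and H on each side of a candidate split, the optimal leaf weight -G / (H + λ), and a gain equal to half the increase in G² / (H + λ) minus γ. The function names, the single-feature scan and the logistic-loss toy data are illustrative assumptions, not the library's API.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)


def leaf_score(G, H, lam):
    """Structure score G^2 / (H + lambda) of one leaf."""
    return G * G / (H + lam)


def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Gain of splitting one leaf into a left part L and a right part R."""
    parent = leaf_score(G_L + G_R, H_L + H_R, lam)
    children = leaf_score(G_L, H_L, lam) + leaf_score(G_R, H_R, lam)
    return 0.5 * (children - parent) - gamma


def best_split(feature, grad, hess, lam=1.0, gamma=0.0):
    """Scan every threshold on one feature and return (best gain, threshold)."""
    order = sorted(range(len(feature)), key=lambda i: feature[i])
    G_total, H_total = sum(grad), sum(hess)
    G_L = H_L = 0.0
    best = (float("-inf"), None)
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        G_L += grad[i]
        H_L += hess[i]
        if feature[i] == feature[nxt]:
            continue  # no threshold can separate identical feature values
        gain = split_gain(G_L, H_L, G_total - G_L, H_total - H_L, lam, gamma)
        if gain > best[0]:
            best = (gain, (feature[i] + feature[nxt]) / 2)
    return best


# Toy example with logistic loss and an initial prediction of 0.5 for every
# sample, so that g = p - y and h = p * (1 - p) = 0.25 (assumed setup).
amounts = [10.0, 20.0, 300.0, 400.0]
labels = [0, 0, 1, 1]
grad = [0.5 - y for y in labels]
hess = [0.25] * len(labels)
gain, threshold = best_split(amounts, grad, hess, lam=1.0, gamma=0.0)
```

On this toy data the scan picks the threshold between 20 and 300, which separates the two labels; a real implementation repeats the scan over every feature and every node.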
Based on the CatBoost model, the training set is randomly ordered, and for the p-th sample, the statistical value computed from the previous p-1 samples is used to replace the categorical value of the p-th sample; the specific formula is as follows:
$$\hat{x}_{\sigma_p,k} = \frac{\sum_{j=1}^{p-1} [x_{\sigma_j,k} = x_{\sigma_p,k}] \, Y_{\sigma_j} + a \cdot P}{\sum_{j=1}^{p-1} [x_{\sigma_j,k} = x_{\sigma_p,k}] + a}$$
where σ is the random ordering, k indexes the categorical feature, Y is the label and [·] equals 1 when the condition holds and 0 otherwise. P (a prior value) and a (its weight) are hyper-parameters used to reduce the noise arising from low-frequency categories, and ordered boosting improves the robustness and generalization capability of the model. CatBoost has a clear advantage in processing categorical data. Categorical data is commonly handled by encoding (for example one-hot encoding), but this scheme adopts the more effective strategy of the CatBoost model: the training set is randomly ordered, which mitigates the prediction shift problem in GBDT, and the gradient calculation of GBDT (which computes gradients on the same data set every time) is replaced by ordered boosting, thereby reducing the bias of the gradient estimate and improving the robustness and generalization capability of the model.
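The ordered statistic described above can be illustrated with a short sketch: each sample's category is replaced by a smoothed label average computed only from samples that precede it in a random ordering. The function name `ordered_target_stats`, the argument names `prior` and `a`, and the toy data are assumptions for illustration; this is not CatBoost's internal code.

```python
import random


def ordered_target_stats(categories, targets, prior, a, seed=0):
    """Encode each category using only samples earlier in a random ordering."""
    n = len(categories)
    perm = list(range(n))
    random.Random(seed).shuffle(perm)  # random ordering of the training set

    sums, counts = {}, {}  # running label sum and count per category
    encoded = [0.0] * n
    for idx in perm:
        c = categories[idx]
        s, k = sums.get(c, 0.0), counts.get(c, 0)
        # smoothed statistic from the preceding samples of the same category
        encoded[idx] = (s + a * prior) / (k + a)
        # only now does the current sample become visible to later ones
        sums[c] = s + targets[idx]
        counts[c] = k + 1
    return encoded, perm


# Hypothetical toy data: merchant category and fraud label.
cats = ["m1", "m1", "m2", "m1", "m2"]
ys = [1, 0, 0, 1, 1]
enc, perm = ordered_target_stats(cats, ys, prior=0.4, a=1.0, seed=0)
```

Because a sample never sees its own label, the encoding does not leak the target, and the prior term keeps rarely seen categories from producing extreme values.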
LightGBM is an improved implementation of the GBDT algorithm proposed by Microsoft in 2015. Its main innovations are Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), which reduce the sample size and the computational cost while still ensuring considerable accuracy.
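As a rough illustration of how GOSS reduces the sample size, the sketch below keeps the samples with the largest absolute gradients, randomly samples a fraction of the rest, and up-weights the sampled ones so that gradient sums stay approximately unbiased. The function name `goss_sample` and the parameter names `top_rate` and `other_rate` are chosen for this sketch, and EFB is not covered.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {sample index: weight} for the samples kept by one-side sampling."""
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    top = order[:n_top]    # large-gradient samples are always kept
    rest = order[n_top:]   # small-gradient samples are subsampled
    sampled = random.Random(seed).sample(rest, n_other)

    # compensate for the subsampling of the small-gradient part
    weight = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: weight for i in sampled})
    return weights


# Toy gradients with distinct magnitudes (assumed data).
gradients = [(-1) ** i * i / 20 for i in range(20)]
sample_weights = goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0)
```

With these rates, 6 of the 20 samples are used to build the next tree: the 4 with the largest gradients and 2 sampled from the remainder.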
104) from the output results of the three models established in step 103), a Pearson correlation coefficient matrix is obtained, and the results with low correlation are fused by equally weighted averaging to obtain the final prediction result. The specific process is as follows:
401) obtaining the Pearson correlation coefficients of the prediction results output by the three models;
402) selecting the prediction results from step 401) whose Pearson correlation coefficient is lower than 0.99 and whose prediction accuracies are close and good. For example: in this scheme, the Pearson correlation coefficient performance obtained from the fraud probability judgment value of the XGBoost model reaches 0.95, and that obtained from the fraud probability judgment values of the LightGBM and CatBoost models reaches 0.945; the fraud probability judgment values of the three models perform similarly, but the correlation coefficients between their outputs are relatively low, so fusion is needed.
403) fusing the fraud probability judgment values of the three models from step 402) with equal weights and outputting the result as the final prediction result. For example: if the fraud probability judgment values output by the three models are y1, y2 and y3, the final output is 1/3 × y1 + 1/3 × y2 + 1/3 × y3.
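Steps 401) to 403) can be sketched as follows: compute the pairwise Pearson correlation of the three models' outputs, check that every pair falls below the 0.99 threshold, and average the predictions with equal weights. The function names and the toy prediction vectors are hypothetical, and the text does not say what to do when a pair exceeds the threshold, so the sketch only reports that condition.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(preds):
    """Pairwise Pearson correlations between model outputs (step 401)."""
    return [[pearson(p, q) for q in preds] for p in preds]


def all_pairs_below(corr, threshold=0.99):
    """True if every off-diagonal correlation is below the threshold (step 402)."""
    m = len(corr)
    return all(corr[i][j] < threshold for i in range(m) for j in range(m) if i != j)


def fuse_equal_weight(preds):
    """Equal-weight average of the model outputs, sample by sample (step 403)."""
    return [sum(values) / len(values) for values in zip(*preds)]


# Hypothetical fraud probabilities from the three models for four transactions.
y1 = [0.9, 0.1, 0.8, 0.2]
y2 = [0.8, 0.2, 0.6, 0.4]
y3 = [0.7, 0.3, 0.9, 0.1]
corr = correlation_matrix([y1, y2, y3])
low_correlation = all_pairs_below(corr, threshold=0.99)
fused = fuse_equal_weight([y1, y2, y3])
```

Averaging helps most when the models are individually accurate but make different errors, which is what the low-correlation check is meant to detect.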
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and refinements without departing from the spirit of the present invention, and these modifications and refinements should also be regarded as falling within the scope of the present invention.
Claims (5)
1. An identification method applied to credit card transaction anti-fraud is characterized by comprising the following steps:
101) aggregating transaction characteristics based on identity characteristics;
102) aggregating transaction time characteristics by locality;
103) establishing three models based on XGBoost, CatBoost and LightGBM to predict credit card transactions and obtain a fraud probability judgment value;
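the XGBoost model has the following specific processing formula:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{1}$$
in the formula, $l(y_i, \hat{y}_i)$ is a residual term and $\Omega(f_k)$ is a regularization term, where K is the number of decision trees, T is the number of leaf nodes, $w_j$ is the weight value of each leaf node, and γ and λ are constant regularization coefficients;
in formula (1), $\hat{y}_i$ in the loss function is changed to $\hat{y}_i^{(t-1)} + f_t(x_i)$ and the regularization term is changed to $\Omega(f_t)$, the terms that do not depend on $f_t$ being collected in a constant C; the formula after conversion is as follows:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C \tag{2}$$
in the formula, $f_t$ is the newly added t-th tree, whose output for sample i is recorded as $f_t(x_i)$, and $\hat{y}_i^{(t-1)}$ is the prediction fitted by the previous t-1 trees;
further separating the squared residual of the previous t-1 trees from the newly fitted t-th tree, formula (2) is converted to the following formula:
$$Obj^{(t)} = \sum_{i=1}^{n} \left[ 2\left(\hat{y}_i^{(t-1)} - y_i\right) f_t(x_i) + f_t(x_i)^2 \right] + \Omega(f_t) + C \tag{3}$$
so that each newly found decision tree $f_t(x_i)$ reduces the residual as much as possible;
taking $\hat{y}_i^{(t-1)}$ in formula (3) as x and $f_t(x_i)$ as Δx, so that $Obj^{(t)} = F(x + \Delta x)$, a second-order Taylor expansion is applied; the first derivative of $l(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$ is denoted $g_i$ and the second derivative is denoted $h_i$; ignoring the constant component C, the following formula is obtained:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t) \tag{4}$$
wherein $f_t(x_i)$ is a function of the leaf node weights of the t-th tree, $f_t(x_i) = w_{q(x_i)}$, with $q(x_i)$ the leaf node to which sample i is assigned; formula (4) is transformed as follows:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5}$$
wherein $I_j = \{ i \mid q(x_i) = j \}$ is the set of samples assigned to leaf node j; the sequential traversal from sample 1 to sample n is changed into a traversal over the samples on leaf node 1 through leaf node T, so that the following formula is obtained:
$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \Big( \sum_{i \in I_j} g_i \Big) w_j + \frac{1}{2} \Big( \sum_{i \in I_j} h_i + \lambda \Big) w_j^2 \right] + \gamma T \tag{6}$$
denoting $\sum_{i \in I_j} g_i$ as $G_j$ and $\sum_{i \in I_j} h_i$ as $H_j$, the problem becomes a multivariate extremum in $w_j$, whose solution is:
$$w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{7}$$
the new objective function obtained by substituting into formula (6) is:
$$Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \tag{8}$$
when a leaf node is split, its samples are divided into a left part L and a right part R, and the gain of the split is:
$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \tag{9}$$
obtaining the maximum fraud probability judgment value Gain of the XGBoost model;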
104) from the output results of the three models established in step 103), computing a Pearson correlation coefficient matrix and fusing the results with low correlation by equally weighted averaging to obtain the final prediction result.
2. An identification method applied to credit card transaction anti-fraud according to claim 1, characterized in that: a unique identity is determined from the client's explicit and/or implicit identity characteristics, and the transaction characteristics computed for each unique identity comprise the average transaction amount, the transaction frequency and the type of device used.
3. An identification method applied to credit card transaction anti-fraud according to claim 1, characterized in that: the time characteristic is region-based: the time band with the highest transaction frequency is counted for each region, and the difference between each transaction time and the local high-frequency transaction time is calculated as an important feature for judging abnormal transactions.
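4. An identification method applied to credit card transaction anti-fraud according to claim 1, characterized in that: the CatBoost model randomly orders the training set, and for the p-th sample, the statistical value computed from the previous p-1 samples is used to replace the categorical value of the p-th sample; the specific formula is as follows:
$$\hat{x}_{\sigma_p,k} = \frac{\sum_{j=1}^{p-1} [x_{\sigma_j,k} = x_{\sigma_p,k}] \, Y_{\sigma_j} + a \cdot P}{\sum_{j=1}^{p-1} [x_{\sigma_j,k} = x_{\sigma_p,k}] + a}$$
where σ is the random ordering, k indexes the categorical feature, Y is the label and [·] equals 1 when the condition holds and 0 otherwise; P (a prior value) and a (its weight) are hyper-parameters used to reduce the noise arising from low-frequency categories, and the robustness and generalization capability of the model are improved by ordered boosting.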
5. An identification method applied to credit card transaction anti-fraud according to claim 1, characterized in that: step 104) comprises the following specific steps:
401) obtaining the Pearson correlation coefficients of the prediction results output by the three models;
402) selecting the prediction results from step 401) whose Pearson correlation coefficient is lower than 0.99 and whose prediction accuracies are close and good;
403) fusing the prediction results of the three models with equal weights and outputting the result as the final prediction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911323155.8A CN111105241B (en) | 2019-12-20 | 2019-12-20 | Identification method for anti-fraud of credit card transaction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911323155.8A CN111105241B (en) | 2019-12-20 | 2019-12-20 | Identification method for anti-fraud of credit card transaction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111105241A (en) | 2020-05-05 |
CN111105241B (en) | 2023-04-07 |
Family
ID=70423762
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911323155.8A Active CN111105241B (en) | 2019-12-20 | 2019-12-20 | Identification method for anti-fraud of credit card transaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111105241B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112101951B (en) * | 2020-09-27 | 2023-09-26 | 中国银行股份有限公司 | Payment transaction detection method and device, storage medium and electronic equipment |
CN112950397B (en) * | 2021-05-17 | 2021-07-27 | 太平金融科技服务(上海)有限公司深圳分公司 | Claims risk estimation method and device, computer equipment and storage medium |
CN113781056A (en) * | 2021-09-17 | 2021-12-10 | 中国银行股份有限公司 | Method and device for predicting user fraud behavior |
CN116167872A (en) * | 2023-04-20 | 2023-05-26 | 湖南工商大学 | Abnormal medical data detection method, device and equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107993139A (en) * | 2017-11-15 | 2018-05-04 | 华融融通(北京)科技有限公司 | A kind of anti-fake system of consumer finance based on dynamic regulation database and method |
CN109034194A (en) * | 2018-06-20 | 2018-12-18 | 东华大学 | Transaction swindling behavior depth detection method based on feature differentiation |
CN110020868A (en) * | 2019-03-11 | 2019-07-16 | 同济大学 | Anti- fraud module Decision fusion method based on online trading feature |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190060766A1 (en) * | 2017-08-25 | 2019-02-28 | SixtyFive02, Inc. | Systems and methods of persistent, user-adapted personas |
Non-Patent Citations (1)
Title |
---|
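Chen An. Research on credit card risk assessment based on machine learning. Jiangxi University of Finance and Economics, 2018, full text. * |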
Also Published As
Publication number | Publication date |
---|---|
CN111105241A (en) | 2020-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111105241B (en) | Identification method for anti-fraud of credit card transaction | |
CN110188198B (en) | Anti-fraud method and device based on knowledge graph | |
CN110659744B (en) | Training event prediction model, and method and device for evaluating operation event | |
Moody et al. | Architecture selection strategies for neural networks: Application to corporate bond rating prediction | |
US7283982B2 (en) | Method and structure for transform regression | |
Joanes | Reject inference applied to logistic regression for credit scoring | |
CN111291816A (en) | Method and device for carrying out feature processing aiming at user classification model | |
CN109376766B (en) | Portrait prediction classification method, device and equipment | |
CN109739844B (en) | Data classification method based on attenuation weight | |
CN106570631B (en) | P2P platform-oriented operation risk assessment method and system | |
CN109063983B (en) | Natural disaster damage real-time evaluation method based on social media data | |
CN107392217B (en) | Computer-implemented information processing method and device | |
CN112330153A (en) | Non-linear orthogonal regression-based industry scale prediction model modeling method and device | |
CN115293235A (en) | Method for establishing risk identification model and corresponding device | |
CN1252588C (en) | High spectrum remote sensing image combined weighting random sorting method | |
CN116993490B (en) | Automatic bank scene processing method and system based on artificial intelligence | |
CN112508684A (en) | Joint convolutional neural network-based collection risk rating method and system | |
CN117372144A (en) | Wind control strategy intelligent method and system applied to small sample scene | |
CN117114705A (en) | Continuous learning-based e-commerce fraud identification method and system | |
CN116542763A (en) | Internet financial credit default prediction method based on big data | |
CN116128339A (en) | Client credit evaluation method and device, storage medium and electronic equipment | |
CN115760454A (en) | Financial fraud identification method based on cycle width learning | |
Ramani et al. | Gradient boosting techniques for credit card fraud detection | |
EP3739517A1 (en) | Image processing | |
CN116843368B (en) | Marketing data processing method based on ARMA model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||