CN112967053A - Method and device for detecting fraudulent transactions
- Publication number
- CN112967053A (application number CN202110236023.2A)
- Authority
- CN
- China
- Prior art keywords
- transaction
- model
- detected
- detection model
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/382—Payment protocols; Details thereof insuring higher security of transaction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/40—Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
- G06Q20/401—Transaction verification
Abstract
The invention discloses a method and a device for detecting fraudulent transactions. The method comprises: acquiring a transaction to be detected; determining target time sequence characteristic information of the transaction to be detected; inputting the transaction to be detected and the target time sequence characteristic information into at least one transaction detection model to obtain at least one model scoring result for the transaction to be detected, where each transaction detection model is obtained by training its corresponding initial model, using its corresponding machine learning algorithm, on its corresponding data set, and each piece of training data in that data set comprises a transaction, timing characteristic information of the transaction, and a transaction label; and determining, according to the at least one model scoring result, whether the transaction to be detected is a fraudulent transaction.
Description
Technical Field
The invention relates to the technical field of transaction security, in particular to a method and a device for detecting fraudulent transactions.
Background
A fraudulent transaction is a transaction in which a lawbreaker impersonates the cardholder. The risk of fraudulent transactions is increasingly prominent and fraud techniques are varied; fraudulent transactions harm society in many ways and have drawn close attention from regulators and the banking industry. Moreover, fraudulent transactions are becoming more technically sophisticated, and fraudulent activity is becoming better concealed.
It is therefore important for the security of bank transactions to detect promptly whether a transaction is fraudulent. Current fraud detection methods identify fraud with combinations of simple logic rules built from risk control experience. However, such rule combinations struggle to capture the deeper characteristics of fraudulent behavior, so current detection of fraudulent transactions is not accurate enough.
Disclosure of Invention
The invention provides a method and a device for detecting fraudulent transactions, which address the problem that fraudulent transaction detection in the prior art is not accurate enough.
In a first aspect, the present invention provides a method of detecting fraudulent transactions, comprising:
acquiring a transaction to be detected;
determining target time sequence characteristic information of the transaction to be detected;
inputting the transaction to be detected and the target time sequence characteristic information into at least one transaction detection model to obtain at least one model scoring result of the transaction to be detected; for any transaction detection model in the at least one transaction detection model, the transaction detection model is obtained by training an initial model corresponding to the transaction detection model, according to a machine learning algorithm corresponding to the transaction detection model, based on a data set corresponding to the transaction detection model; any piece of training data in the data set corresponding to the transaction detection model comprises a transaction, timing characteristic information of the transaction and a transaction label;
and determining whether the transaction to be detected is a fraudulent transaction or not according to the at least one model scoring result.
In the above manner, each of the at least one transaction detection model is trained not only on the transactions themselves but also on their timing characteristic information. Because transactions occur in batches and are correlated over time, the model can learn how transactions behave over time. When a transaction to be detected is examined, its target time sequence characteristic information is obtained and input into the at least one transaction detection model together with the transaction, so that the transaction is evaluated more fully and fraudulent transactions are detected more accurately.
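The four steps of the first aspect can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Model` type, the function `detect_fraud`, the 0.5 threshold, the plain mean used to combine scores, and the two toy models are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model type: any callable mapping a feature dict to a fraud score in [0, 1].
Model = Callable[[Dict[str, float]], float]


def detect_fraud(
    transaction: Dict[str, float],
    timing_features: Dict[str, float],
    models: Sequence[Model],
    threshold: float = 0.5,  # assumed cutoff; the text does not specify one
) -> Tuple[bool, List[float]]:
    """Score one transaction with every model and decide from the combined score."""
    # Each model sees the transaction fields together with the timing features.
    features = {**transaction, **timing_features}
    scores = [model(features) for model in models]
    # A plain mean stands in for the score-combination step.
    combined = sum(scores) / len(scores)
    return combined >= threshold, scores


# Toy stand-ins for trained transaction detection models.
def amount_model(f: Dict[str, float]) -> float:
    return 0.9 if f["amount"] > 1000 else 0.1


def velocity_model(f: Dict[str, float]) -> float:
    return min(1.0, f["card_txn_count_1h"] / 10)


is_fraud, scores = detect_fraud(
    {"amount": 5000.0}, {"card_txn_count_1h": 8.0}, [amount_model, velocity_model]
)
print(is_fraud, scores)
```

In a real deployment the toy callables would be replaced by trained models, and the combination step by one of the fusion options described later in the text.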
Optionally, any piece of training data in the data set corresponding to the transaction detection model is obtained as follows:
acquiring original data of the training data; the original data comprises a transaction, timing characteristic information of the transaction and a transaction label; the transaction label is one of: normal transaction, suspected fraudulent transaction, or fraudulent transaction;
if the transaction label is suspected fraudulent transaction, inputting the original data into the initial model of the transaction detection model to obtain a model scoring result of the original data;
and modifying the transaction label to normal transaction or fraudulent transaction according to the model scoring result of the original data.
In the above manner, suspected fraudulent transactions are scored and their labels are promptly changed to normal transaction or fraudulent transaction, which improves the accuracy of the data set and the completeness of the training data.
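The label-resolution step above could look like the following sketch. The record layout, the label strings, the helper name `relabel_suspected`, and the 0.5 cutoff are assumptions made for illustration; the text only states that the label is changed according to the model scoring result.

```python
from typing import Callable, Dict, List

NORMAL, SUSPECTED, FRAUD = "normal", "suspected_fraud", "fraud"


def relabel_suspected(
    records: List[dict],
    initial_model: Callable[[Dict[str, float]], float],
    threshold: float = 0.5,  # assumed cutoff for resolving a suspected label
) -> List[dict]:
    """Return copies of the records with every suspected label resolved."""
    resolved = []
    for record in records:
        label = record["label"]
        if label == SUSPECTED:
            # Only suspected records are scored; confirmed labels are kept as-is.
            score = initial_model(record["features"])
            label = FRAUD if score >= threshold else NORMAL
        resolved.append({**record, "label": label})
    return resolved


# Toy initial model: scores by amount only.
def toy_model(features: Dict[str, float]) -> float:
    return 0.8 if features["amount"] > 1000 else 0.2


raw = [
    {"features": {"amount": 50.0}, "label": NORMAL},
    {"features": {"amount": 4000.0}, "label": SUSPECTED},
    {"features": {"amount": 30.0}, "label": SUSPECTED},
    {"features": {"amount": 9000.0}, "label": FRAUD},
]
clean = relabel_suspected(raw, toy_model)
print([r["label"] for r in clean])
```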
Optionally, the determining the target timing characteristic information of the transaction to be detected includes:
acquiring a plurality of transaction time sequence information related to the transaction to be detected in a time sequence of a plurality of dimensions;
and carrying out statistical analysis on the transaction time sequence information to determine the target time sequence characteristic information of the transaction to be detected.
In the above manner, the target time sequence characteristic information of the transaction to be detected is derived from multiple pieces of transaction time sequence information related to the transaction across time sequences of several dimensions, which makes the target time sequence characteristic information more accurate.
Optionally, the plurality of transaction timing information includes at least one of: the transaction timing sequence information of the transaction to be detected in the geographic position dimension in the first preset time period, the transaction timing sequence information of the transaction to be detected in the merchant dimension in the second preset time period, and the transaction timing sequence information of the transaction to be detected in the card dimension in the third preset time period.
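As an illustration of the statistical analysis over per-dimension time windows, the sketch below counts and averages prior transactions that share a key (card, merchant, or city) with the current transaction inside a window. The field names, the choice of count and mean amount as statistics, and the window lengths are assumptions; the text does not fix them.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List


def window_stats(history: List[dict], current: dict, key: str, window: timedelta) -> Dict[str, float]:
    """Count and average amount of earlier transactions sharing `key` within `window`."""
    start = current["time"] - window
    amounts = [
        t["amount"]
        for t in history
        if t[key] == current[key] and start <= t["time"] < current["time"]
    ]
    return {
        f"{key}_count": len(amounts),
        f"{key}_avg_amount": mean(amounts) if amounts else 0.0,
    }


def timing_features(history: List[dict], current: dict, windows: Dict[str, timedelta]) -> Dict[str, float]:
    """Merge the per-dimension statistics into one feature dict."""
    features: Dict[str, float] = {}
    for key, window in windows.items():
        features.update(window_stats(history, current, key, window))
    return features


now = datetime(2021, 3, 1, 12, 0)
history = [
    {"time": now - timedelta(minutes=10), "card": "A", "merchant": "M1", "city": "SH", "amount": 100.0},
    {"time": now - timedelta(minutes=50), "card": "A", "merchant": "M2", "city": "SH", "amount": 300.0},
    {"time": now - timedelta(hours=5), "card": "A", "merchant": "M1", "city": "BJ", "amount": 50.0},
    {"time": now - timedelta(minutes=5), "card": "B", "merchant": "M1", "city": "SH", "amount": 70.0},
]
current = {"time": now, "card": "A", "merchant": "M1", "city": "SH", "amount": 900.0}
# One window per dimension, mirroring the first, second and third preset time periods.
windows = {"city": timedelta(minutes=30), "merchant": timedelta(hours=24), "card": timedelta(hours=1)}
feats = timing_features(history, current, windows)
print(feats)
```

A production system would typically maintain these aggregates incrementally rather than rescanning history for every transaction; the rescan is used here only to keep the example short.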
Optionally, the transaction detection model is obtained specifically according to the following method:
dividing a data set corresponding to the transaction detection model into a plurality of sub data sets;
for any sub data set of the multiple sub data sets, dividing the sub data set into a training set, a verification set and an extrapolation test set according to the time sequence characteristics of the data in the sub data set;
training an intermediate model of the transaction detection model according to the machine learning algorithm corresponding to the transaction detection model, based on the training set and the verification set of the sub data set; the intermediate model is the initial model or a model obtained by training from the initial model on a sub data set;
if the trained intermediate model does not meet the preset convergence condition of the sub data set, updating the intermediate model; otherwise, taking the intermediate model at that point as the transaction detection model.
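One plausible reading of the time-based division is a chronological split in which the extrapolation test set holds the most recent data, so that it probes behavior on a period the model has not seen. The sketch below assumes that reading; the 60/20/20 proportions and the `time` field are illustrative and not taken from the text.

```python
import random
from typing import List, Tuple


def split_by_time(
    records: List[dict],
    train_frac: float = 0.6,  # assumed proportions
    valid_frac: float = 0.2,
) -> Tuple[List[dict], List[dict], List[dict]]:
    """Split records chronologically into training, verification and extrapolation test sets."""
    ordered = sorted(records, key=lambda r: r["time"])
    n = len(ordered)
    train_end = int(n * train_frac)
    valid_end = train_end + int(n * valid_frac)
    # The tail of the timeline becomes the extrapolation test set.
    return ordered[:train_end], ordered[train_end:valid_end], ordered[valid_end:]


records = [{"time": t, "label": t % 2} for t in range(10)]
random.Random(0).shuffle(records)  # input order should not matter
train, valid, extrapolation = split_by_time(records)
print(len(train), len(valid), len(extrapolation))
```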
Optionally, it is determined whether the intermediate model meets a preset convergence condition of the sub data set according to the following manner:
verifying the trained intermediate model according to a K-S verification method, based on the verification set and the extrapolation test set of the sub data set, to obtain a K-S verification result of the intermediate model; and/or verifying the trained intermediate model according to an AUC verification method, based on the verification set and the extrapolation test set of the sub data set, to obtain an AUC verification result of the intermediate model;
and determining whether the intermediate model meets the preset convergence condition according to the K-S verification result and/or the AUC verification result of the intermediate model.
In the above manner, the stability and generalization ability of the model can be judged from the K-S verification result and/or the AUC verification result of the intermediate model, so that model stability is taken into account when deciding whether training has converged.
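The two metrics can be computed directly from labels and scores, as in the sketch below. The K-S statistic is the largest gap between the cumulative score distributions of fraudulent and normal transactions, and AUC is the probability that a fraudulent transaction outscores a normal one. The convergence rule in `meets_convergence` (minimum values on both sets plus a maximum gap between them) and its thresholds are assumptions; the text only says the decision is based on the K-S and/or AUC results.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a positive outscores a negative (ties count as half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Maximum gap between the score CDFs of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    best = 0.0
    for t in sorted(set(scores)):
        pos_cdf = sum(s <= t for s in pos) / len(pos)
        neg_cdf = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(pos_cdf - neg_cdf))
    return best


Scored = Tuple[List[int], List[float]]


def meets_convergence(valid: Scored, extrapolation: Scored,
                      min_auc: float = 0.8, min_ks: float = 0.4,
                      max_gap: float = 0.05) -> bool:
    """Hypothetical rule: both sets score well and the two sets agree closely."""
    auc_v, auc_e = auc(*valid), auc(*extrapolation)
    ks_v, ks_e = ks_statistic(*valid), ks_statistic(*extrapolation)
    good = min(auc_v, auc_e) >= min_auc and min(ks_v, ks_e) >= min_ks
    stable = abs(auc_v - auc_e) <= max_gap and abs(ks_v - ks_e) <= max_gap
    return good and stable


separable = ([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
overlapping = ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc(*overlapping), ks_statistic(*overlapping))
print(meets_convergence(separable, separable), meets_convergence(separable, overlapping))
```

Comparing the verification set with the later extrapolation test set is what exposes instability: a model that scores well on one but not the other fails the gap check.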
Optionally, the determining, according to the at least one model scoring result, whether the transaction to be detected is a fraudulent transaction includes:
obtaining a comprehensive model scoring result of the transaction to be detected according to the at least one model scoring result and a weighted average method; or,
inputting the at least one model scoring result into a high-level nested model to obtain a comprehensive model scoring result of the transaction to be detected; the high-level nested model is obtained by training according to a machine learning algorithm based on a model scoring result obtained when a data set corresponding to the at least one transaction detection model is trained;
and determining whether the transaction to be detected is a fraud transaction according to the comprehensive model scoring result.
In the above manner, fusing the scoring results of multiple models takes the combined behavior of those models into account, effectively yielding a comprehensive transaction detection model.
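Both fusion options can be illustrated briefly. The sketch below shows a weighted average and, as a stand-in for the high-level nested model, a small logistic regression trained by gradient descent on the base models' scores. The choice of logistic regression, the learning rate, the epoch count, and all names are assumptions; the text does not say which algorithm the nested model uses.

```python
import math
from typing import Callable, List, Sequence


def weighted_average(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Fuse model scores with fixed, non-negative weights."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_stacker(base_scores: List[List[float]], labels: List[int],
                  lr: float = 0.5, epochs: int = 2000) -> Callable[[Sequence[float]], float]:
    """Fit a logistic-regression meta-model on base-model scores (assumed nested model)."""
    n_models, n = len(base_scores[0]), len(labels)
    w, b = [0.0] * n_models, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_models, 0.0
        for x, y in zip(base_scores, labels):
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_models):
                grad_w[j] += err * x[j]
            grad_b += err
        # Batch gradient-descent step on the mean log-loss gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n

    def predict(scores: Sequence[float]) -> float:
        return _sigmoid(sum(wi * si for wi, si in zip(w, scores)) + b)

    return predict


# Scores that two base models produced on labeled training transactions (toy data).
train_scores = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
train_labels = [1, 1, 0, 0]
stacker = train_stacker(train_scores, train_labels)
print(weighted_average([0.9, 0.5], [3, 1]), stacker([0.85, 0.85]), stacker([0.15, 0.15]))
```

The weighted average needs no extra training but requires hand-chosen weights, whereas the nested model learns how much to trust each base model from labeled data.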
In a second aspect, the present invention provides a fraudulent transaction detection apparatus comprising:
the acquisition module is used for acquiring the transaction to be detected;
the processing module is used for determining the target time sequence characteristic information of the transaction to be detected;
the determining module is used for inputting the transaction to be detected and the target time sequence characteristic information into at least one transaction detection model to obtain at least one model scoring result of the transaction to be detected; for any transaction detection model in the at least one transaction detection model, the transaction detection model is obtained by training an initial model corresponding to the transaction detection model, according to a machine learning algorithm corresponding to the transaction detection model, based on a data set corresponding to the transaction detection model; any piece of training data in the data set corresponding to the transaction detection model comprises a transaction, timing characteristic information of the transaction and a transaction label; the determining module is further used for determining whether the transaction to be detected is a fraudulent transaction according to the at least one model scoring result.
Optionally, the apparatus further comprises an establishing module, and the establishing module is configured to obtain any piece of training data in the data set corresponding to the transaction detection model as follows:
acquiring original data of the training data; the original data comprises a transaction, timing characteristic information of the transaction and a transaction label; the transaction label is one of: normal transaction, suspected fraudulent transaction, or fraudulent transaction; if the transaction label is suspected fraudulent transaction, inputting the original data into the initial model of the transaction detection model to obtain a model scoring result of the original data; and modifying the transaction label to normal transaction or fraudulent transaction according to the model scoring result of the original data.
Optionally, the processing module is specifically configured to:
acquiring a plurality of transaction time sequence information related to the transaction to be detected in a time sequence of a plurality of dimensions;
and carrying out statistical analysis on the transaction time sequence information to determine the target time sequence characteristic information of the transaction to be detected.
Optionally, the plurality of transaction timing information includes at least one of: the transaction timing sequence information of the transaction to be detected in the geographic position dimension in the first preset time period, the transaction timing sequence information of the transaction to be detected in the merchant dimension in the second preset time period, and the transaction timing sequence information of the transaction to be detected in the card dimension in the third preset time period.
Optionally, the apparatus further includes an establishing module, where the establishing module is specifically configured to:
the transaction detection model is obtained as follows:
dividing a data set corresponding to the transaction detection model into a plurality of sub data sets;
for any sub data set of the multiple sub data sets, dividing the sub data set into a training set, a verification set and an extrapolation test set according to the time sequence characteristics of the data in the sub data set;
training an intermediate model of the transaction detection model according to the machine learning algorithm corresponding to the transaction detection model, based on the training set and the verification set of the sub data set; the intermediate model is the initial model or a model obtained by training from the initial model on a sub data set;
if the trained intermediate model does not meet the preset convergence condition of the sub data set, updating the intermediate model; otherwise, taking the intermediate model at that point as the transaction detection model.
Optionally, the establishing module is specifically configured to:
determining whether the intermediate model satisfies the preset convergence condition in the following manner:
verifying the trained intermediate model according to a K-S verification method, based on the verification set and the extrapolation test set of the sub data set, to obtain a K-S verification result of the intermediate model; and/or verifying the trained intermediate model according to an AUC verification method, based on the verification set and the extrapolation test set of the sub data set, to obtain an AUC verification result of the intermediate model;
and determining whether the intermediate model meets the preset convergence condition of the sub data set according to the K-S verification result and/or the AUC verification result of the intermediate model.
Optionally, the determining module is specifically configured to:
obtaining a comprehensive model scoring result of the transaction to be detected according to the at least one model scoring result and a weighted average method; or inputting the at least one model scoring result into a high-level nested model to obtain a comprehensive model scoring result of the transaction to be detected; the high-level nested model is obtained by training according to a machine learning algorithm based on a model scoring result obtained when a data set corresponding to the at least one transaction detection model is trained;
and determining whether the transaction to be detected is a fraud transaction according to the comprehensive model scoring result.
For the advantageous effects of the second aspect and its optional apparatuses, refer to the advantageous effects of the first aspect and its optional methods; they are not repeated here.
In a third aspect, the present invention provides a computer device comprising a program or instructions which, when executed, perform the method of the first aspect and the optional methods of the first aspect.
In a fourth aspect, the present invention provides a storage medium comprising a program or instructions which, when executed, perform the method of the first aspect and the optional methods of the first aspect.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of a first system architecture to which a method for detecting a fraudulent transaction according to an embodiment of the present invention is applicable;
Fig. 2 is a schematic diagram of a second system architecture to which a method for detecting a fraudulent transaction according to an embodiment of the present invention is applicable;
Fig. 3 is a flowchart illustrating a method for detecting fraudulent transactions according to an embodiment of the present invention;
Fig. 4 is a schematic deployment diagram of at least one transaction detection model in a method for detecting fraudulent transactions according to an embodiment of the present invention;
Fig. 5 is a schematic flow chart illustrating a process of acquiring timing characteristic information in a method for detecting a fraudulent transaction according to an embodiment of the present invention;
Fig. 6 is a schematic diagram illustrating a specific process of acquiring timing characteristic information in a method for detecting a fraudulent transaction according to an embodiment of the present invention;
Fig. 7 is a schematic flow chart illustrating the data set acquisition for establishing at least one transaction detection model in the method for detecting fraudulent transactions according to the embodiment of the present invention;
Fig. 8 is a schematic flow chart of a specific data set process for establishing at least one transaction detection model in the method for detecting fraudulent transactions according to the embodiment of the present invention;
Fig. 9 is a schematic flowchart illustrating an effect detection of at least one transaction detection model in a method for detecting a fraudulent transaction according to an embodiment of the present invention;
Fig. 10 is a schematic flow chart illustrating data set partitioning for establishing at least one transaction detection model in a method for detecting fraudulent transactions according to an embodiment of the present invention;
Fig. 11 is a schematic flow chart illustrating optimization of at least one transaction detection model in a method for detecting fraudulent transactions according to an embodiment of the present invention;
Fig. 12 is a schematic flow chart illustrating fusion of at least one transaction detection model in a method for detecting fraudulent transactions according to an embodiment of the present invention;
Fig. 13 is a schematic structural diagram of a device for detecting fraudulent transactions according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method for detecting the fraudulent transaction provided by the embodiment of the invention can be applied to various scenes.
For example, at least one transaction detection model in the fraudulent transaction detection method may be a counterfeit card fraud detection scoring model, used to determine whether a transaction to be detected is a fraudulent transaction performed with a forged copy of an original bank card, and may be applied to a counterfeit card fraud detection scoring model system.
Specifically, Fig. 1 shows the online architecture of a counterfeit card fraud detection scoring model system. A transaction to be detected enters the bank card real-time transfer clearing system through a point of sale (POS) terminal or an automated teller machine (ATM); the real-time transfer clearing system sends a scoring request to the counterfeit card fraud detection scoring model system, and the scoring result is sent to the bank along with the transaction message for decision making.
For example, at least one transaction detection model in the fraudulent transaction detection method may be a counterfeit card binding detection scoring model, used to determine whether a transaction to be detected is a fraudulent transaction in which a device belonging to someone impersonating the user requests binding of an original bank card, and may be applied to a counterfeit card binding detection scoring model system.
Specifically, Fig. 2 shows the online architecture of a counterfeit card binding detection scoring model system. A cardholder initiates a Near Field Communication (NFC) card binding transaction from a mobile phone or other device; the transaction enters the UnionPay NFC card binding system, which sends the card binding transaction information to the risk control system; the risk control system requests a score and determines from the scoring result whether the request is a counterfeit card binding.
A real-time scoring system is deployed in the bank card transaction switching network to perform quantitative anti-fraud scoring in real time, before a card-swipe transaction is authorized:
for a cross-bank bank card transaction, when the transaction request initiated by the acquiring bank passes through the transaction switching network of a bank card organization (such as UnionPay), an automated program extracts and analyzes in real time the information in the transaction and the historical information related to it, and a specific intelligent model produces a quantitative score for the transaction. The score is attached to the transaction information in real time and sent to the issuing bank, which makes its anti-fraud decision according to the score.
The bank card switching clearing network has multidimensional mass transaction flow, comprises historical transaction information of cards, merchants and equipment dimensions, and compared with a bank card sending end and an institution receiving end, the switching system has abundant data of cross-bank and cross-acquiring institution transactions of full-quantity cards and full-quantity merchants, so that more comprehensive information and transaction risks in the whole network are mastered, and a bank card fraud real-time detection scoring model is deployed at a bank card transaction switching position to have a better practical effect and more effective than a single wind control strategy of a traditional card sending bank or an acquiring institution.
The invention enables real-time detection of fraudulent bank card transactions by scoring each transaction as it occurs. Compared with existing methods that analyze transactions in batches and raise alerts after the fact, the method is more timely: it can intervene directly in the transaction process, intercept a fraudulent transaction in real time, and prevent it from completing, thereby avoiding economic loss.
The above-described embodiment has the following advantages:
Based on the card organization's position as a switching hub, the invention builds multidimensional historical transaction information for card numbers, merchants, and devices, mines the characteristics of fraudulent transactions in depth, and constructs a machine learning model to detect and identify fraudulent bank card transactions. Compared with existing combinations of simple expert rules, the scoring model can compute statistical features over more dimensions, longer time spans, and more complete transaction data. It can characterize fraud more accurately, has generalization capability, and copes better with the constantly evolving and varied fraud techniques in the current transaction network.
Fig. 3 is a flowchart of the steps of a method for detecting fraudulent transactions according to an embodiment of the present invention.
Step 301: Acquire the transaction to be detected.
Step 302: Determine the target timing feature information of the transaction to be detected.
Step 303: Input the transaction to be detected and the target timing feature information into at least one transaction detection model to obtain at least one model scoring result for the transaction to be detected.
For any transaction detection model among the at least one transaction detection model, the model is obtained by training its corresponding initial model with its corresponding machine learning algorithm on its corresponding data set. Each piece of training data in that data set comprises a transaction, the timing feature information of the transaction, and a transaction label.
Step 304: Determine whether the transaction to be detected is a fraudulent transaction according to the at least one model scoring result.
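As a non-limiting illustration, steps 301 to 304 can be sketched as follows. The `Transaction` record, the `detect_fraud` function, the two toy models, and the threshold value are hypothetical. The rule that a transaction is flagged when any model score exceeds the threshold is also an assumption, since the description does not fix how several scores are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Transaction:
    """Hypothetical record; the embodiment does not fix a data layout."""
    card: str
    merchant: str
    amount: float


# A detection model is treated here as any callable that returns a score.
Model = Callable[[Transaction, Dict[str, float]], float]


def detect_fraud(txn: Transaction,
                 timing_features: Dict[str, float],
                 models: List[Model],
                 threshold: float) -> Tuple[List[float], bool]:
    """Steps 303 and 304: score with every model, then decide.

    Assumption: the transaction is flagged when ANY score exceeds the threshold.
    """
    scores = [model(txn, timing_features) for model in models]
    return scores, any(score > threshold for score in scores)


# Toy stand-ins for trained detection models (illustrative only).
def toy_pseudo_card_model(txn: Transaction, f: Dict[str, float]) -> float:
    return 900.0 if f["new_region"] and f["minutes_since_last"] < 60 else 50.0


def toy_card_binding_model(txn: Transaction, f: Dict[str, float]) -> float:
    return 100.0


txn = Transaction(card="A", merchant="XYZXYZ", amount=10.0)   # step 301
features = {"new_region": 1.0, "minutes_since_last": 5.0}      # step 302
scores, is_fraud = detect_fraud(
    txn, features, [toy_pseudo_card_model, toy_card_binding_model], threshold=500.0)
```

Treating each model as a plain callable keeps the decision step independent of the machine learning algorithm used to train it.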
It should be noted that, in practical applications, the method for detecting fraudulent transactions provided by the embodiment of the present invention may be carried out cooperatively by a plurality of modules in an application system. Fig. 4 is a schematic diagram of the deployment of at least one transaction detection model.
Specifically, the first to third markers denote information transmission steps arranged in chronological order. These steps form the main link and use synchronous interfaces, whereas information transmission between the main link and the communication module, the real-time cache module, the short-time calculation module, and the long-time calculation module is asynchronous.
In actual production, the offline feature engineering used when training at least one transaction detection model offline must be implemented as online feature engineering in the production system. In practice, feature processing is divided into three parts: information related to the current transaction, information related to the bank card's transactions in its short-term history (short-term features), and information related to the bank card's transactions in its long-term history (long-term features). The short term may mean 1 to 2 days, and the long term may mean a longer span such as 3 months or 1 year. One example feature is described in detail below:
The feature logic is as follows: the region of the current transaction is not a region in which the card has transacted in the past year, and the time difference between the current transaction and the previous transaction is less than 60 minutes. First, the elements of the current transaction, namely its region and its time, are temporarily stored in the feature calculation module. Next, all transaction elements of the card in the past 48 hours are retrieved from the short-time feature module; the transaction regions of these transactions form a list, and the time of the most recent transaction is taken by sorting in reverse chronological order. Then the card's transaction region list covering the past year up to day T-1 (T is a positive integer) is retrieved from the long-time feature module. In the real-time cache module, the region list from the short-time feature module and the region list from the long-time feature module are merged and deduplicated to obtain a final list. Finally, this list is compared with the transaction region held in the feature calculation module, and the time of the most recent transaction is compared with the current transaction time, yielding the final value of the feature.
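The example feature above can be sketched as follows. The function name `new_region_quick_txn` and the representation of a transaction as a `(region, time)` pair are hypothetical choices made only for illustration.

```python
from datetime import datetime, timedelta


def new_region_quick_txn(current_region, current_time,
                         short_term_txns, long_term_regions,
                         max_gap=timedelta(minutes=60)):
    """Return 1 if the region is unseen in history AND the gap to the
    previous transaction is below max_gap, otherwise 0.

    short_term_txns: list of (region, time) for the card in the last 48 hours.
    long_term_regions: regions from one year ago up to day T-1 (daily batch).
    """
    if not short_term_txns:
        # No transaction in the last 48 hours, so the gap cannot be < 60 minutes.
        return 0
    # Merge and deduplicate the short-time and long-time region lists.
    seen_regions = {region for region, _ in short_term_txns} | set(long_term_regions)
    # Time of the most recent previous transaction.
    last_time = max(t for _, t in short_term_txns)
    gap = current_time - last_time
    return int(current_region not in seen_regions and gap < max_gap)


now = datetime(2019, 7, 11, 12, 0)
recent = [("CN", now - timedelta(minutes=20))]
flag = new_region_quick_txn("US", now, recent, ["CN", "JP"])
```

Here `flag` is 1 because "US" appears in neither list and the previous transaction was 20 minutes earlier.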
A communication module, a feature calculation module, a short-time feature module, a long-time feature module, a real-time cache module, and a scoring module are established in the real-time scoring system. The functions of the modules and the connections between them are established as described below, and the communication module is finally connected to the existing transaction switching network to complete a real-time system with intelligent anti-fraud scoring.
1) Establish a communication module, whose functions include:
a. Connecting directly to the transaction switching network system through a real-time interface, responding in real time to transactions requested by the transaction switching network system, and returning the score of each transaction.
b. Sending the transaction data received from the transaction switching network system to the short-time feature module, where it is stored in a sliding time window.
c. Sending the transaction data to the feature calculation module and waiting for the feature calculation module to return the score of the transaction.
2) Establish a feature calculation module, whose functions include:
a. Establishing a real-time interface with the communication module, receiving the transaction data sent by the communication module, and returning the score to the communication module once the transaction has been scored.
b. Establishing a real-time interface with the cache module. After transaction data is obtained, the corresponding transaction feature data and intermediate statistical results are extracted from the cache module according to elements in the transaction data (such as the card number and the merchant number) and combined with the transaction data to compute the feature data that finally enters the model.
c. Establishing a real-time interface with the scoring module, sending the feature data required by the model to the scoring module, and waiting for the scoring module's result.
A feature that enters the model is formed by splicing a short-time part and a long-time part. For example, the card's total number of transactions within 90 days is the sum of its number of transactions in the past 1 day (the short-time part) and its number of transactions up to day T-1 (the long-time part).
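The splicing of a short-time part and a long-time part can be illustrated with a count feature. The function name `splice_count` and the dictionary layout are hypothetical, and the optional `include_current` flag is an assumption that reflects the separate current-transaction part mentioned elsewhere in the description.

```python
def splice_count(card, long_counts, short_counts, include_current=False):
    """Combine precomputed partial counts into one feature value.

    long_counts:  card -> transaction count up to day T-1 (daily batch).
    short_counts: card -> transaction count for the current day (real time).
    include_current: assumption, adds 1 for the transaction being scored.
    """
    total = long_counts.get(card, 0) + short_counts.get(card, 0)
    return total + (1 if include_current else 0)


long_counts = {"A": 41}    # produced by the long-time module
short_counts = {"A": 3}    # produced by the short-time module
total_90d = splice_count("A", long_counts, short_counts)
```

Because both partial counts are ready before the transaction arrives, the online step reduces to two lookups and an addition.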
3) Establish a scoring module, whose functions include:
Receiving the feature data from the feature calculation module, loading the corresponding finalized model file according to the institution to which the transaction belongs and the transaction scenario, computing the model's predicted probability, converting that probability into an integer score by a linear transformation, and returning the integer score to the feature calculation module.
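The linear conversion from a predicted probability to an integer score might look like the following sketch. The description does not state the score range, so the 0 to 1000 scale and the name `probability_to_score` are assumptions.

```python
def probability_to_score(probability, max_score=1000):
    """Linearly map a probability in [0, 1] to an integer score.

    Assumption: the score range is 0..max_score; the embodiment does not
    state the actual scale or offset of its linear conversion.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return int(round(probability * max_score))


score = probability_to_score(0.1234)
```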
4) Establish a short-time calculation module, whose functions are:
Receiving the transaction data sent by the communication module, storing the transaction data for a short period (for example, 48 hours) in a sliding time window, merging the transaction data by bank card number, and sending the merged data to the real-time cache module.
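A sliding time window keyed by card number can be sketched as below. The class `SlidingWindowStore` is hypothetical, and the sketch assumes that transactions for a card arrive in chronological order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class SlidingWindowStore:
    """Keeps each card's transactions from the last `window` of time.

    Assumption: transactions for a card are added in chronological order.
    """

    def __init__(self, window=timedelta(hours=48)):
        self.window = window
        self._by_card = defaultdict(deque)

    def _evict(self, queue, now):
        # Drop entries that have fallen out of the window.
        while queue and now - queue[0][0] > self.window:
            queue.popleft()

    def add(self, card, txn_time, payload):
        queue = self._by_card[card]
        queue.append((txn_time, payload))
        self._evict(queue, txn_time)

    def recent(self, card, now):
        queue = self._by_card[card]
        self._evict(queue, now)
        return [payload for _, payload in queue]


store = SlidingWindowStore()
t0 = datetime(2019, 7, 9, 12, 0)
store.add("A", t0, "t1")
store.add("A", t0 + timedelta(hours=47), "t2")
```

A double-ended queue makes eviction from the old end and insertion at the new end constant-time operations.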
5) Establish a long-time calculation module, whose functions are:
a. Acquiring historical transaction data and risk information data from an offline database that is updated on a T-1 day basis.
b. Computing the values of the long-time feature parts from that data according to the long-time logic each feature requires, and sending these values to the cache module in a scheduled daily batch.
The long-time module of the feature calculation mainly handles large data volumes, such as the historical transactions of bank cards and merchants, and is updated in daily batches. The short-time module mainly handles small data volumes, such as the same-day transactions of a card or merchant, and is updated in real time. Combining the long-time module, the short-time module, and the current transaction information allows computing and storage resources to be allocated sensibly. It also avoids discarding too many of the features with complex calculation logic from the offline feature engineering, which would degrade the online scoring model relative to the offline one.
In the real-time calculation scheme of the fraudulent transaction detection scoring model, the long-time historical features of the current transaction are computed in real time by splitting the calculation into a long-time module, a short-time module, and a current-transaction module. The long-time module computes the transaction features up to day T-1 before the current transaction occurs, and the short-time module computes the transaction features of the current day. These two parts are computed and stored in flash memory before the current transaction occurs. When the current transaction occurs, only the part related to it is computed, and the results of the three modules are combined to obtain the long-time historical features of the current transaction.
6) Establish a real-time cache module, whose functions are:
a. Receiving the short-time feature part from the short-time module and the long-time feature part from the long-time module, and splicing them to obtain the feature results required by the model and the intermediate statistical results keyed by card number and merchant number.
b. Receiving transaction data from the feature calculation module and returning the corresponding transaction feature data and intermediate statistical results according to the card number, merchant number, and other elements in the transaction data.
In this process, the split-then-merge method achieves millisecond-level real-time calculation of the complex features of the transaction anti-fraud model, improves the retention of offline features, and keeps online and offline feature engineering consistent:
When a scoring model is used to detect fraudulent bank card transactions, there is a tension between model effectiveness and calculation time. Features of a card (or similar entity) that span a very long time or involve very complex logic can improve the model, but they take a long time to compute and cannot meet the requirement of real-time detection. Such features are therefore usually dropped or replaced with logically simpler ones. The invention instead provides a method that keeps these features while still meeting the real-time requirement. In the real-time cache module, when a transaction occurs, only the part related to that transaction is computed, and the results of the three modules are combined to obtain the long-time historical features of the transaction, which satisfies the real-time calculation requirement.
An optional implementation of step 302 is as follows:
Acquire a plurality of pieces of transaction timing information related to the transaction to be detected along time series in a plurality of dimensions, and perform statistical analysis on the transaction timing information to determine the target timing feature information of the transaction to be detected.
The foregoing implementation is exemplified as follows:
As shown in Fig. 5, each bank reports the fraudulent transactions that have occurred to a bank card fraud information sharing platform. The bank card organization can use the reported fraud data to keep track of the latest fraud cases, analyze the behavioral characteristics of the fraud, and use them to develop feature variables for the bank card fraud detection scoring model, thereby improving the model.
For example, during construction of the feature variables of the fraud detection scoring model, fraud features shared across the transaction network are established through inter-bank fraud information sharing. An inter-bank fraud information sharing platform is set up so that each bank can promptly report fraudulent transactions involving its own cards. The switching institution thereby learns promptly of newly occurring fraud cases and new fraud patterns in the network, and builds features that can be used to score the transactions of other banks' cards.
For example, such a feature may capture whether the merchant of the current transaction has had fraudulent transactions involving cards of other banks and, if so, how long ago the most recent of them occurred, in order to judge how suspicious a transaction at that merchant is. At the same time, the merchant's recent average daily transaction count is computed so that large merchants can be excluded. This prevents the model from assigning high scores to the many transactions of large merchants and protects the cardholder experience.
The invention provides a method for joint, industry-wide prevention and control of transaction fraud risk. It helps to control bank card transaction risk, reduce the economic losses of institutions and cardholders, clean up the bank card transaction network, and improve the cardholder experience. Through an inter-bank fraud information sharing mechanism, fraud statistics features are built on rapidly shared information so as to improve the anti-fraud scoring model's ability to identify and control gang fraud:
Because bank card transaction switching sits at the hub between banks and acquiring institutions, and between banks themselves, the invention proposes building an inter-bank fraud information sharing platform on which each bank promptly reports fraudulent transactions involving its own cards. The switching institution thereby learns promptly of newly occurring fraud cases and new fraud patterns in the network, and builds features that can be used to score the transactions of other banks' cards. For example, such a feature may capture whether the merchant of the current transaction has had fraudulent transactions involving cards of other banks and, if so, how long ago the most recent of them occurred, in order to judge how suspicious a transaction at that merchant is. At the same time, the merchant's recent average daily transaction count is computed so that large merchants can be excluded, which prevents the model from assigning high scores to the many transactions of large merchants and protects the cardholder experience.
More specifically, fig. 6 shows a feature engineering and feature screening process flow of the fraud detection scoring model.
In the pseudo-card fraud detection scoring model, Table 1 with the fraud label attached can be used as the main table. The card historical transaction data in the auxiliary table is similar to Table 1 but lacks the fraud label dimension, and the merchant statistics are shown in Table 5. In this embodiment, the main table and the auxiliary tables are joined on the card number and merchant number dimensions of Table 1.
TABLE 5 Merchant statistics partial content schematic
Feature variables are generated from bank card transaction information through automated feature engineering and manual, business-driven feature development. Feature screening is then performed using correlation coefficients and a tree-based machine learning model: features that are highly correlated with other features and weakly correlated with the fraud label are eliminated, and features that are highly correlated with the fraud label and important in the machine learning model are retained. The screened features of the pseudo-card fraud detection scoring model are shown in Table 6, and the screened features of the counterfeit card-binding detection scoring model are shown in Table 7.
TABLE 6 Schematic of part of the features of the pseudo-card fraud detection scoring model
TABLE 7 Schematic of part of the features of the counterfeit card-binding detection scoring model
The correlation between two feature variables, or between a feature variable and the fraud label, is given by equation (1):
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} \qquad (1)$$
where X and Y are either two different feature variables or one feature variable and the fraud label, Cov(X, Y) is the covariance of X and Y, Var[X] is the variance of X, and Var[Y] is the variance of Y.
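A correlation-based screening pass can be sketched as follows, assuming the correlation is the usual ratio of the covariance to the square root of the product of the two variances. The function names, the greedy selection order, and both threshold values are assumptions for illustration; the tree-model importance step is not covered here.

```python
from math import sqrt


def pearson(x, y):
    """Cov(X, Y) / sqrt(Var[X] * Var[Y]) using population statistics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / sqrt(var_x * var_y)


def screen_features(features, label, max_pair_corr=0.9, min_label_corr=0.05):
    """Greedy screening; both thresholds are hypothetical.

    Features are visited from most to least label-correlated. A feature is
    kept unless it is weakly related to the label or highly correlated with
    a feature that has already been kept.
    """
    ranked = sorted(features, key=lambda k: -abs(pearson(features[k], label)))
    kept = []
    for name in ranked:
        if abs(pearson(features[name], label)) < min_label_corr:
            continue
        if all(abs(pearson(features[name], features[k])) <= max_pair_corr
               for k in kept):
            kept.append(name)
    return kept


features = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [1, 2, 3, 4, 6],   # nearly a copy of f1
    "f3": [5, 1, 4, 2, 3],
}
fraud_label = [0, 0, 0, 1, 1]
kept = screen_features(features, fraud_label)
```

In this toy data `f2` is dropped because it is almost collinear with `f1`, which is slightly more correlated with the label.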
PSI (population stability index) is an index used to check stability and can be used to check the stability of features. Its formula is:
$$\mathrm{PSI} = \sum_{i} (A_i - E_i)\,\ln\frac{A_i}{E_i}$$
where A_i and E_i are the actual and expected proportions in bin i. When stability between different time windows is checked, they are the proportion in window 1 and the proportion in window 2, and i indexes the bins of the feature. A smaller PSI means the two distributions are closer, that is, the feature is more stable, and 0.1 is commonly used as an empirical boundary between good and poor stability. By computing the PSI of every feature and deleting the features whose PSI exceeds 0.1, the stable features are retained.
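A minimal sketch of the PSI check follows, using the standard definition as the sum over bins of (A_i - E_i) * ln(A_i / E_i). The function name `psi` and the small `eps` floor that guards against empty bins are assumptions.

```python
from math import log


def psi(actual, expected, eps=1e-6):
    """Population stability index over per-bin proportions.

    actual, expected: proportions per bin for window 1 and window 2,
    each summing to 1. eps (an assumption) avoids log(0) for empty bins.
    """
    total = 0.0
    for a, e in zip(actual, expected):
        a, e = max(a, eps), max(e, eps)
        total += (a - e) * log(a / e)
    return total


window_1 = [0.25, 0.25, 0.25, 0.25]
window_2 = [0.24, 0.26, 0.25, 0.25]
is_stable = psi(window_1, window_2) <= 0.1   # 0.1 is the empirical boundary
```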
In the above optional implementation of step 302, the transaction timing information includes at least one of: the transaction timing sequence information of the transaction to be detected in the geographic position dimension in the first preset time period, the transaction timing sequence information of the transaction to be detected in the merchant dimension in the second preset time period, and the transaction timing sequence information of the transaction to be detected in the card dimension in the third preset time period.
Specifically, the transaction timing information is acquired from the switching system's rich cross-bank and cross-acquirer transaction data covering all cards and all merchants, as follows:
1) Transaction timing information of the transaction to be detected in the geographic location dimension within a first preset time period. For example, a feature measuring the consistency between the current transaction and the card's historical transactions is built as follows. First, the transaction characteristics of the current transaction and the characteristics of the card over a past period are extracted. Second, the two are compared for consistency, for example whether the country of the current transaction is one in which the card has transacted before, or whether it is the country in which the card has transacted most frequently. Finally, the result of the consistency comparison is combined with other factors, such as time difference, to form a rule feature that is interpretable in fraud business terms and serves as the final feature, for example whether the ratio of the geographic distance between the current and previous transaction locations to the time difference between them exceeds a certain threshold.
2) Transaction timing information of the transaction to be detected in the merchant dimension within a second preset time period. For example, risk features of the transaction merchant (or country or region) are built as follows. First, the full historical fraud transaction data set is compiled. Second, the data set is aggregated by merchant, that is, for each merchant the earliest and latest times at which fraudulent (unauthorized-use) transactions were found or recorded, the number of fraudulent transactions, the total number of transactions, and other elements are computed. Finally, these elements are merged with the transaction information so that only fraud records that occurred before the current transaction time are used, forming expert rule features, for example whether the merchant of the current transaction has recently had unauthorized-use fraud.
3) Transaction timing information of the transaction to be detected in the card dimension within a third preset time period. For example, card history summary statistics mainly comprise a series of aggregate statistics for the card over a past period, such as total transaction amount and number of merchants, which distinguish the usage patterns of different bank cards and the characteristics of their cardholders.
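The distance-to-time-difference rule feature mentioned in item 1) can be sketched as follows. The great-circle (haversine) distance, the function names, and the 1000 km/h threshold are assumptions, since the description only says that the ratio is compared with "a certain threshold".

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (Earth radius taken as 6371 km)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def implausible_speed(prev, curr, max_kmh=1000.0):
    """Return 1 if distance / time difference exceeds max_kmh, else 0.

    prev, curr: (latitude, longitude, datetime) of the previous and current
    transactions. max_kmh is a hypothetical threshold.
    """
    hours = (curr[2] - prev[2]).total_seconds() / 3600.0
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    if hours <= 0:
        # Same instant: any positive distance is implausible.
        return int(distance > 0)
    return int(distance / hours > max_kmh)


t_prev = datetime(2019, 7, 11, 12, 0)
beijing = (39.9, 116.4, t_prev)
shanghai_soon = (31.2, 121.5, t_prev + timedelta(minutes=30))
flag = implausible_speed(beijing, shanghai_soon)
```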
In step 303, it should be noted that the data set used to build at least one transaction detection model is obtained as shown in Fig. 7, which gives the basic flow of data set acquisition. Down-sampled normal transaction data is combined with fully confirmed fraudulent transaction samples to train at least one preliminary transaction detection model. The preliminary model is then used to score the full set of transactions whose responses failed (that is, suspected fraudulent transactions); transactions that exceed the scoring threshold are judged fraudulent and merged with the confirmed fraudulent transactions to form an expanded fraud sample. This expanded sample is then removed from the full transaction data set, the remainder is down-sampled again to obtain the normal transaction data set, and the bank card fraud detection scoring model is retrained and verified.
In real bank card transactions, fraudulent transactions are rare, with a fraud rate of roughly one in ten thousand. Building a bank card transaction detection scoring model therefore faces an extreme imbalance between positive and negative samples, which makes the model hard to train and hard to keep stable. In practice, some fraudulent transactions fail because of a wrong password, an insufficient balance, or interception by a risk control strategy. No economic loss occurs, the cardholder is unaware of the attempt, and it is never confirmed as fraud, so the set of failed transactions necessarily contains some fraudulent ones. The invention therefore proposes first training a preliminary model on the known, fully confirmed fraud samples and down-sampled normal transaction samples, then using the preliminary model to score the normal transactions, and moving the transactions that receive a high score and whose responses actually failed into the fraud sample. The expanded fraud sample is used to retrain the bank card fraud detection scoring model, which improves the model and strengthens its generalization.
In an optional embodiment, for any transaction detection model, each piece of training data in the data set corresponding to that model is obtained as follows:
Step (1-1): Acquire the raw data of the training data.
Step (1-2): If the transaction label is suspected fraudulent transaction, input the raw data into the initial model of the transaction detection model to obtain a model scoring result for the raw data.
Step (1-3): Change the transaction label to normal transaction or fraudulent transaction according to the model scoring result of the raw data.
The raw data comprises a transaction, the timing feature information of the transaction, and a transaction label. The transaction label is normal transaction, suspected fraudulent transaction, or fraudulent transaction.
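Steps (1-1) to (1-3) can be sketched as follows. The label strings, the function name `relabel`, the toy initial model, and the threshold are all hypothetical.

```python
NORMAL, SUSPECTED, FRAUD = "normal", "suspected_fraud", "fraud"


def relabel(records, initial_model, threshold):
    """Resolve 'suspected fraud' labels with the initial model's score.

    records: list of (features, label) pairs, i.e. the raw data of step (1-1).
    Only suspected records are scored (step 1-2) and relabelled (step 1-3);
    confirmed normal and fraud labels are left unchanged.
    """
    resolved = []
    for features, label in records:
        if label == SUSPECTED:
            score = initial_model(features)
            label = FRAUD if score > threshold else NORMAL
        resolved.append((features, label))
    return resolved


def toy_initial_model(features):
    # Stand-in for the trained initial model (illustrative only).
    return 800.0 if features["failed_response"] and features["new_region"] else 100.0


raw = [
    ({"failed_response": 1, "new_region": 1}, SUSPECTED),
    ({"failed_response": 1, "new_region": 0}, SUSPECTED),
    ({"failed_response": 0, "new_region": 0}, NORMAL),
    ({"failed_response": 0, "new_region": 1}, FRAUD),
]
training_data = relabel(raw, toy_initial_model, threshold=500.0)
```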
For example, the raw data in step (1-1) is as follows:
The bank card transaction data is shown in Table 1, the pseudo-card fraud data in Table 2, the NFC card-binding transaction data in Table 3, and the counterfeit card-binding data in Table 4. The counterfeit card-binding detection scoring model uses Table 3 with the fraud label attached as the main table. The card's historical card-binding data in the auxiliary table is similar to Table 3 but lacks the fraud label dimension, and the main table and the auxiliary table are joined on the card, mobile phone number, and account number dimensions of Table 3.
Table 1 Schematic of part of the bank card transaction data
Table 2 Schematic of part of the pseudo-card fraud data
Table 3 Schematic of part of the NFC card-binding transaction data
Table 4 Schematic of part of the counterfeit card-binding data
Card number | Date of binding card | SEID | …… |
H | 20190821 | ABCD | …… |
I | 20191201 | HXYZ | …… |
……. | …… | …… | …… |
More specifically, the processing procedure of the raw data of the training data may be as shown in fig. 8, and the specific procedure may include the procedures of cleaning, expanding, matching, and the like.
For example, the pseudo-card fraud detection scoring model cleans the pseudo-card fraud data and selects the pseudo-card fraud samples by fraud type. When the pseudo-card fraud sample is expanded according to business rules, suppose card A suffered a pseudo-card fraud at merchant XYZXYZ on July 11, 2019. Then all transactions of card A at merchant XYZXYZ on July 11, 2019 are judged to be pseudo-card fraud transactions, and all balance inquiry transactions of card A on July 11, 2019 are judged to be pseudo-card fraud transactions. The transaction date and transaction time are combined into a single field, transaction time, that represents when the transaction occurred. Using this combined field, all offline transactions of card A within a time window before and after the fraud on July 11, 2019 are judged to be pseudo-card fraud transactions, which yields the expanded pseudo-card fraud data. Offline transactions are selected as POS, MPOS, and ATM transactions through the transaction channel dimension. In this embodiment, the pseudo-card fraud label is matched to the bank card transaction data through the card number, transaction date, transaction time, and merchant number dimensions of Table 1 and Table 2.
The counterfeit card-binding detection scoring model integrates the raw fraud data of different device types, as shown in Table 3. When the counterfeit card-binding sample is expanded, suppose for example that card H underwent a counterfeit card binding on the device with SEID ABCD on August 21, 2019. Then all card-binding transactions of card H on August 21, 2019 are judged to be counterfeit card bindings, all card-binding transactions of card H on device ABCD are judged to be counterfeit card bindings, and the card-binding transactions of other card numbers on device ABCD on August 21, 2019 are judged to be counterfeit card bindings, which yields the expanded counterfeit card-binding data. In this embodiment, the counterfeit card-binding label is matched to the card-binding transaction data through the card number, transaction date, and SEID dimensions of Table 3 and Table 4.
Specific processes from step (1-2) to step (1-3) can be exemplified as follows:
For pseudo-card fraud detection, online transactions are removed from the fraud samples, and pseudo-card fraud transactions are selected according to the offline transaction condition. The fraud samples are then expanded using the accompanying transactions of the fraud card. The main expansion rules cover: all transactions at the same merchant on the day the card suffered pseudo-card fraud; all balance inquiry transactions on the day the card suffered pseudo-card fraud; and the card's transactions within 30 minutes before and after the pseudo-card fraud. A transaction that satisfies any one of these rules and the offline transaction condition is judged to be a pseudo-card fraud transaction. For counterfeit card-binding detection, sample data of counterfeit card bindings is first obtained and then expanded. The main expansion rules cover: all card-binding transactions of the card on the day it underwent counterfeit card binding; all card-binding transactions involving both the card and the device used in the counterfeit card binding; and all card-binding transactions of that device on the day of the counterfeit card binding. A transaction that satisfies any one of these rules is judged to be a counterfeit card-binding transaction. Another method is to train a preliminary model on the known, fully confirmed fraud samples and down-sampled normal transaction samples, use the preliminary model to score the normal transactions, and move into the fraud sample the transactions that receive a high score (greater than a set value) and whose responses actually failed (suspected fraudulent transactions; response failure can be defined per scenario, for example a wrong password entry), thereby expanding the fraud sample data.
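The three pseudo-card expansion rules can be sketched as follows. The dictionary keys, the channel and type strings, and the function name `expand_pseudo_card_fraud` are hypothetical; the 30-minute window and the POS, MPOS, and ATM offline channels come from the description.

```python
from datetime import datetime, timedelta

OFFLINE_CHANNELS = {"POS", "MPOS", "ATM"}


def expand_pseudo_card_fraud(txns, fraud, window=timedelta(minutes=30)):
    """Return the transactions of the fraud card that match any expansion rule.

    txns: list of dicts with keys card, merchant, time, type, channel.
    fraud: the confirmed pseudo-card fraud transaction (same keys).
    """
    expanded = []
    for t in txns:
        # Only offline transactions of the same card are candidates.
        if t["card"] != fraud["card"] or t["channel"] not in OFFLINE_CHANNELS:
            continue
        same_day = t["time"].date() == fraud["time"].date()
        rule_merchant = same_day and t["merchant"] == fraud["merchant"]
        rule_balance = same_day and t["type"] == "balance_inquiry"
        rule_window = abs(t["time"] - fraud["time"]) <= window
        if rule_merchant or rule_balance or rule_window:
            expanded.append(t)
    return expanded


def make_txn(txn_id, card, merchant, hour, minute, txn_type, channel):
    return {"id": txn_id, "card": card, "merchant": merchant,
            "time": datetime(2019, 7, 11, hour, minute),
            "type": txn_type, "channel": channel}


fraud = make_txn(0, "A", "XYZXYZ", 12, 0, "purchase", "POS")
txns = [
    make_txn(1, "A", "XYZXYZ", 9, 0, "purchase", "POS"),        # same merchant, same day
    make_txn(2, "A", "OTHER", 18, 0, "balance_inquiry", "ATM"),  # balance inquiry, same day
    make_txn(3, "A", "OTHER", 12, 20, "purchase", "MPOS"),       # within 30 minutes
    make_txn(4, "A", "OTHER", 12, 10, "purchase", "ONLINE"),     # online, excluded
    make_txn(5, "A", "OTHER", 15, 0, "purchase", "POS"),         # matches no rule
    make_txn(6, "B", "XYZXYZ", 12, 5, "purchase", "POS"),        # different card
]
expanded_ids = [t["id"] for t in expand_pseudo_card_fraud(txns, fraud)]
```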
In step 303, the at least one transaction detection model may be deployed in the transaction link to detect fraudulent transactions in real time. Two implementations are possible. In the first, the bank card transaction switching system directly calls the deployed transaction detection model(s) and transmits the scoring result to the bank in the forwarded message or through a system interface, so that the bank can promptly intercept any transaction to be detected that is judged to be fraudulent. In the second, the real-time risk control system of the bank card transaction switching system calls the transaction detection model(s) and uses the scoring result to intervene in the transaction to be detected in real time, promptly intercepting any transaction judged to be fraudulent.
Further, Fig. 9 shows the flow for outputting scores from the transaction detection model and checking the model effect.
In this embodiment of the invention, a score threshold is set at every 100 points. When the output score is greater than the score threshold, the transaction is judged to be fraudulent. The statistical test of the model effect is shown in Table 8.
TABLE 8 fraud detection scoring model effectiveness statistical test schematic
The accuracy, coverage rate, and covered amount of the fraud detection model are calculated according to the following equations (2) to (4).
Accuracy = number of hit transactions / number of alerted transactions; (2)
Coverage rate = number of hit transactions / number of fraudulent transactions; (3)
Covered amount = sum of the transaction amounts of the hit fraudulent transactions; (4)
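As a small worked illustration of equations (2) to (4), the three indices can be computed from scored transactions as sketched below. The function name `alert_metrics` and the tuple layout `(score, is_fraud, amount)` are assumptions made for this example.

```python
def alert_metrics(records, threshold):
    """Compute accuracy, coverage rate and covered amount for one threshold.

    records: iterable of (score, is_fraud, amount) tuples.
    A transaction is alerted when its score is greater than the threshold.
    """
    alerts = hits = frauds = 0
    covered_amount = 0.0
    for score, is_fraud, amount in records:
        alerted = score > threshold
        alerts += int(alerted)
        frauds += int(is_fraud)
        if alerted and is_fraud:
            hits += 1
            covered_amount += amount
    accuracy = hits / alerts if alerts else 0.0    # equation (2)
    coverage = hits / frauds if frauds else 0.0    # equation (3)
    return accuracy, coverage, covered_amount      # equation (4)
```

Evaluating the function at each 100-point threshold yields one row of an effect table per threshold.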
In an alternative embodiment, the transaction detection model is obtained as follows:
Step (2-1): Divide the data set corresponding to the transaction detection model into a plurality of sub data sets.
Step (2-2): For any sub data set of the plurality of sub data sets, divide that sub data set into a training set, a verification set, and an extrapolation test set according to the time-series characteristics of the data in the sub data set.
Fig. 10 shows a specific method and flow for dividing samples in model training and verification. After the structured feature data is determined, the data set is divided into multiple parts according to data characteristics (such as specific organizations or transaction scenarios, for example internal versus external). Each part is then divided into a temporary set A and an extrapolation test set according to time-series characteristics (such as a time window); the time window of the extrapolation test set may be set to roughly the 2 to 4 months adjacent to set A. Set A is then divided by random sampling into a training set (the training set of the sub data set) and a verification set (the verification set of the sub data set), with a sampling ratio of, for example, 7:3.
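The sample division just described can be sketched as below. This is only an illustration: the row layout (dicts with a `date` key), the `cutoff` parameter that separates set A from the extrapolation window, and the fixed random seed are assumptions for the example.

```python
import random
from datetime import date


def split_by_time(rows, cutoff, train_ratio=0.7, seed=0):
    """Split one sub data set into training, verification and extrapolation sets.

    Rows dated before `cutoff` form the temporary set A; rows dated on or
    after `cutoff` form the extrapolation test set.
    """
    set_a = [r for r in rows if r["date"] < cutoff]
    extrapolation = [r for r in rows if r["date"] >= cutoff]
    shuffled = set_a[:]
    random.Random(seed).shuffle(shuffled)          # random sampling within set A
    n_train = round(len(shuffled) * train_ratio)   # e.g. a 7:3 split
    return shuffled[:n_train], shuffled[n_train:], extrapolation
```

Keeping the extrapolation set strictly later in time than set A is what allows it to reveal features that drift over time, which a purely random split would hide.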
Step (2-3): Train an intermediate model of the transaction detection model, according to the machine learning algorithm corresponding to the transaction detection model, based on the training set and the verification set of the sub data set.
The intermediate model is the initial model, or a model obtained by training from the initial model and the sub data set.
Step (2-4): If the trained intermediate model does not meet the preset convergence condition of the sub data set, update the intermediate model; otherwise, take the current intermediate model as the transaction detection model.
In steps (2-1) to (2-4) above, whether the intermediate model satisfies the preset convergence condition of the sub data set may be determined as follows:
verifying the trained intermediate model according to a K-S verification method, based on the verification set and the extrapolation test set of the sub data set, to obtain a K-S verification result of the intermediate model; and/or verifying the trained intermediate model according to an AUC verification method, based on the verification set and the extrapolation test set of the sub data set, to obtain an AUC verification result of the intermediate model;
and determining whether the intermediate model meets the preset convergence condition of the sub data set according to the K-S verification result and/or the AUC verification result of the intermediate model.
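The update-until-converged loop of step (2-4) can be sketched generically as follows. The callables `train_once` and `evaluate`, the numeric floors `min_ks` and `min_auc`, and the round limit are all hypothetical; the text only states that a preset convergence condition based on the K-S and/or AUC results is used.

```python
def fit_until_converged(model, train_once, evaluate,
                        min_ks=0.4, min_auc=0.7, max_rounds=20):
    """Keep updating the intermediate model until the preset condition holds.

    train_once(model) -> updated intermediate model
    evaluate(model)   -> (ks, auc) measured on held-out data
    """
    for round_no in range(1, max_rounds + 1):
        model = train_once(model)
        ks, auc = evaluate(model)
        if ks >= min_ks and auc >= min_auc:
            return model, round_no   # this intermediate model becomes the detection model
    raise RuntimeError("convergence condition not met within max_rounds")
```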
Specifically, as shown in Fig. 11, a model training method is selected first. Because the transaction detection model is generally a binary classification problem from a business standpoint, mainstream training methods such as logistic regression (LR), support vector machine (SVM), and ensemble tree models such as GBDT and XGBoost are mainly selected. Under the chosen training method, with the hyper-parameter space set, models can be trained one by one by grid search, and index verification is performed on the verification set. Index verification uses the K-S verification method and the AUC verification method, given by formula (5) and formula (6) respectively:
KS = max(|G(range) - B(range)|) (5)
where G(range) denotes the cumulative proportion of good samples in each segmented range after the samples are sorted by model predicted value, B(range) denotes the cumulative proportion of bad samples in each segmented range after the samples are sorted by model predicted value, and K-S is the maximum of the cumulative difference. In the experiments, a K-S index between 0.3 and 0.4 indicated that the model has some discriminative power, a K-S index between 0.4 and 0.5 indicated good discriminative power, and a K-S index above 0.5 indicated very good discriminative power.
AUC = area_under(ROC) (6)
That is, AUC is the area under the ROC curve. The ROC curve is the two-dimensional TPR-FPR plot obtained by traversing all segmentation thresholds over the model predicted values, where TPR denotes the coverage rate and FPR denotes the false alarm rate. An AUC of 0.7 to 0.85 indicates a fairly good model effect, and an AUC above 0.85 indicates a very good model effect.
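For illustration, the K-S and AUC indices defined above can be computed with the dependency-free sketch below. The convention that label 1 marks a bad (fraudulent) sample, and the use of the pairwise-ranking form of AUC (which equals the area under the ROC curve), are choices made for this example rather than details stated in the text.

```python
def ks_statistic(scores, labels):
    """Maximum gap between cumulative good and cumulative bad proportions."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    cum_good = cum_bad = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Samples with equal scores fall in the same range, so consume ties together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_good / n_good - cum_bad / n_bad))
        i = j
    return best


def auc(scores, labels):
    """Probability that a random bad sample outscores a random good one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Both functions assume that each class has at least one sample; the pairwise AUC is quadratic in the sample count and is meant for small checks only.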
Index verification is performed on the training set and the verification set separately. If the training-set indices are good but the verification-set effect is clearly worse than that of the training set (for example, the K-S difference between the training set and the verification set exceeds 0.05), the model is presumed to be overfitted. In that case, model complexity is first reduced by adjusting the model hyper-parameters (for example, reducing the tree depth of a tree model or increasing the minimum number of samples per leaf node), and the model is retrained and observed. If overfitting persists, it is attributed to feature engineering; the overall complexity of the model must then be reduced by pruning features, removing those with poor business interpretability or instability, to eliminate the overfitting. If the model performs well on the training and verification sets but markedly worse on the extrapolation test set, important features in the model are presumed to be unstable over time. In that case, the PSI stability of the features across different time windows, and the change in the correlation between the features and the target variable between the training/verification sets and the extrapolation test set, must be examined; deleting features that are unstable over time can remove part of the cause of the decline in extrapolation effect.
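The two diagnostics in this paragraph can be sketched as follows. The overfitting check uses the 0.05 K-S gap given as an example in the text. The PSI function uses the commonly cited population stability index formula, sum of (a - e) * ln(a / e) over bins; the text names PSI without giving a formula, so the formula, the fixed bin edges, and the small floor `eps` are assumptions of this sketch.

```python
import math


def is_overfitted(ks_train, ks_valid, max_gap=0.05):
    """Flag overfitting when the training K-S exceeds the verification K-S by more than max_gap."""
    return ks_train - ks_valid > max_gap


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index of one feature between two samples.

    expected: feature values in the reference window (e.g. training/verification sets)
    actual:   feature values in the comparison window (e.g. extrapolation test set)
    edges:    ascending bin boundaries shared by both samples
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A feature whose PSI rises sharply between the training window and the extrapolation window is a candidate for deletion under the procedure described above.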
It should be noted that, one implementation of step 304 may be as follows:
obtaining a comprehensive model scoring result of the transaction to be detected from the at least one model scoring result by a weighted average method; or inputting the at least one model scoring result into a high-level nested model to obtain a comprehensive model scoring result of the transaction to be detected, where the high-level nested model is obtained by training, according to a machine learning algorithm, on the model scoring results produced when the data set corresponding to the at least one transaction detection model is used for training; and determining whether the transaction to be detected is a fraudulent transaction according to the comprehensive model scoring result.
For example, Fig. 12 illustrates a method and flow of model fusion. Multiple transaction detection models can be obtained using different model training methods. The model scoring results of these models are analyzed and, according to preset model screening rules, models that perform poorly, or that perform well but are highly similar to other transaction detection models, are deleted. The remaining, mutually differentiated transaction detection models can serve as the at least one transaction detection model.
Model fusion may then be performed by a weighted average method (see Table 9 for a specific example), or a high-level nested model may be trained using the predicted values of the models as input, with the predicted value of the high-level nested model taken as the final result (see Table 10 for a specific example).
TABLE 9 weighted averaging results in the fusion model
TABLE 10 high level nested model results from the fusion model
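The weighted-average variant of model fusion can be sketched as below. The weights and the decision threshold are hypothetical inputs; the text does not state how they are chosen. The high-level nested model variant would instead feed the same vector of per-model scores into a separately trained model.

```python
def fuse_scores(scores, weights):
    """Weighted average of the per-model scores for one transaction."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def is_fraud(scores, weights, threshold):
    """Judge the transaction fraudulent when the fused score exceeds the threshold."""
    return fuse_scores(scores, weights) > threshold
```

For instance, two model scores of 800 and 600 with weights 3 and 1 fuse to (800 * 3 + 600 * 1) / 4 = 750.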
In use, at least one transaction detection model is applied, the at least one model scoring result is combined into a final comprehensive model scoring result, and whether the transaction to be detected is a fraudulent transaction is determined according to the comprehensive model scoring result.
After the two model fusion methods and the single-model method are comprehensively compared in terms of effect and stability, the transaction detection model with the best overall evaluation index is adopted as the final transaction detection model, and only that model is used to judge fraudulent transactions.
As shown in fig. 13, the present invention provides a fraudulent transaction detection device, comprising:
an obtaining module 1301, configured to obtain a transaction to be detected;
a processing module 1302, configured to determine target timing characteristic information of the transaction to be detected;
a determining module 1303, configured to input the transaction to be detected and the target timing characteristic information into at least one transaction detection model to obtain at least one model scoring result of the transaction to be detected, where, for any transaction detection model of the at least one transaction detection model, the transaction detection model is obtained by training an initial model corresponding to the transaction detection model, according to a machine learning algorithm corresponding to the transaction detection model, based on a data set corresponding to the transaction detection model, and any piece of training data in that data set comprises a transaction, timing characteristic information of the transaction, and a transaction label; and further configured to determine whether the transaction to be detected is a fraudulent transaction according to the at least one model scoring result.
Optionally, the apparatus further comprises an establishing module 1301 configured to obtain any piece of training data in the data set corresponding to the transaction detection model as follows:
acquiring original data of the training data, the original data comprising a transaction, timing characteristic information of the transaction, and a transaction label, where the transaction label is normal transaction, suspected fraudulent transaction, or fraudulent transaction; if the transaction label is suspected fraudulent transaction, inputting the original data into an initial model of the transaction detection model to obtain a model scoring result of the original data; and modifying the transaction label to normal transaction or fraudulent transaction according to the model scoring result of the original data.
Optionally, the processing module 1302 is specifically configured to:
acquiring a plurality of transaction time sequence information related to the transaction to be detected in a time sequence of a plurality of dimensions;
and carrying out statistical analysis on the transaction time sequence information to determine the target time sequence characteristic information of the transaction to be detected.
Optionally, the plurality of transaction timing information includes at least one of: the transaction timing sequence information of the transaction to be detected in the geographic position dimension in the first preset time period, the transaction timing sequence information of the transaction to be detected in the merchant dimension in the second preset time period, and the transaction timing sequence information of the transaction to be detected in the card dimension in the third preset time period.
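One way to picture the statistical analysis of transaction timing information along a single dimension is the sketch below. The record layout, the choice of count and mean amount as statistics, and the function name `window_features` are assumptions for illustration; the text does not specify which statistics are computed.

```python
from datetime import datetime, timedelta
from statistics import mean


def window_features(history, now, key, value, window):
    """Summarise past transactions that share one dimension value within a time window.

    history: list of dicts with "time", "amount" and dimension keys such as "card" or "merchant"
    key, value: the dimension and the value taken from the transaction to be detected
    window: length of the preset time period ending at `now`
    """
    amounts = [
        h["amount"]
        for h in history
        if h[key] == value and now - window <= h["time"] < now
    ]
    return {"count": len(amounts), "mean_amount": mean(amounts) if amounts else 0.0}
```

Calling the function once per dimension (geographic position, merchant, card), each with its own preset time period, would give one group of features per dimension.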
Optionally, the apparatus further includes an establishing module 1303, which is specifically configured to obtain the transaction detection model as follows:
dividing a data set corresponding to the transaction detection model into a plurality of sub data sets;
for any sub data set of the plurality of sub data sets, dividing that sub data set into a training set, a verification set, and an extrapolation test set according to the time-series characteristics of the data in the sub data set;
training an intermediate model of the transaction detection model, according to a machine learning algorithm corresponding to the transaction detection model, based on the training set and the verification set of the sub data set, where the intermediate model is the initial model or a model obtained by training from the initial model and the sub data set;
if the trained intermediate model does not meet the preset convergence condition of the sub data set, updating the intermediate model; otherwise, taking the current intermediate model as the transaction detection model.
Optionally, the establishing module 1303 is specifically configured to:
determining whether the intermediate model satisfies a preset convergence condition of the sub data set according to the following manner:
verifying the trained intermediate model according to a K-S verification method, based on the verification set and the extrapolation test set of the sub data set, to obtain a K-S verification result of the intermediate model; and/or verifying the trained intermediate model according to an AUC verification method, based on the verification set and the extrapolation test set of the sub data set, to obtain an AUC verification result of the intermediate model;
and determining whether the intermediate model meets the preset convergence condition of the subdata set or not according to the K-S verification result and/or the AUC verification result of the intermediate model.
Optionally, the determining module 1303 is specifically configured to:
obtaining a comprehensive model scoring result of the transaction to be detected according to the at least one model scoring result and a weighted average method; or,
inputting the at least one model scoring result into a high-level nested model to obtain a comprehensive model scoring result of the transaction to be detected; the high-level nested model is obtained by training according to a machine learning algorithm based on a model scoring result obtained when a data set corresponding to the at least one transaction detection model is trained;
and determining whether the transaction to be detected is a fraud transaction according to the comprehensive model scoring result.
Based on the same inventive concept, embodiments of the present invention also provide a computer device, which includes a program or instructions, and when the program or instructions are executed, the method for detecting fraudulent transactions and any optional method provided by the embodiments of the present invention are executed.
Based on the same inventive concept, embodiments of the present invention also provide a computer-readable storage medium, which includes a program or instructions, and when the program or instructions are executed, the method for detecting fraudulent transactions and any optional method provided by the embodiments of the present invention are executed.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A method of detecting fraudulent transactions, comprising:
acquiring a transaction to be detected;
determining target time sequence characteristic information of the transaction to be detected;
inputting the transaction to be detected and the target time sequence characteristic information into at least one transaction detection model to obtain at least one model scoring result of the transaction to be detected; for any transaction detection model of the at least one transaction detection model, the transaction detection model is obtained by training an initial model corresponding to the transaction detection model, according to a machine learning algorithm corresponding to the transaction detection model, based on a data set corresponding to the transaction detection model; any piece of training data in the data set corresponding to the transaction detection model comprises a transaction, timing characteristic information of the transaction, and a transaction label;
and determining whether the transaction to be detected is a fraudulent transaction or not according to the at least one model scoring result.
2. The method of claim 1, wherein any piece of training data in the data set corresponding to the transaction detection model is obtained as follows:
acquiring original data of the training data; the original data comprises transactions, timing characteristic information of the transactions and transaction labels; the transaction tag is a normal transaction, or a suspected fraud transaction, or a fraud transaction;
if the transaction label is suspected fraud transaction, inputting the original data into an initial model of the transaction detection model to obtain a model scoring result of the original data;
and modifying the transaction label into normal transaction or fraudulent transaction according to the model scoring result of the original data.
3. The method of claim 1, wherein the determining the target timing characteristic information of the transaction to be detected comprises:
acquiring a plurality of transaction time sequence information related to the transaction to be detected in a time sequence of a plurality of dimensions;
and carrying out statistical analysis on the transaction time sequence information to determine the target time sequence characteristic information of the transaction to be detected.
4. The method of claim 3, wherein the plurality of transaction timing information comprises at least one of: the transaction timing sequence information of the transaction to be detected in the geographic position dimension in the first preset time period, the transaction timing sequence information of the transaction to be detected in the merchant dimension in the second preset time period, and the transaction timing sequence information of the transaction to be detected in the card dimension in the third preset time period.
5. The method according to any of the claims 1 to 4, characterized in that the transaction detection model is obtained in particular in the following way:
dividing a data set corresponding to the transaction detection model into a plurality of sub data sets;
for any sub data set of the multiple sub data sets, dividing the sub data sets into a training set of the sub data sets, a verification set of the sub data sets and an extrapolation test set of the sub data sets according to the time sequence characteristics of data in the sub data sets;
training an intermediate model of the transaction detection model according to a machine learning algorithm corresponding to the transaction detection model based on the training set of the sub data set and the verification set of the sub data set; the intermediate model is the initial model or a model obtained by training according to the initial model and the subdata set;
if the model after the intermediate model training does not meet the preset convergence condition of the subdata set, updating the intermediate model; and otherwise, taking the intermediate model at the moment as the transaction detection model.
6. The method of claim 5, wherein determining whether the intermediate model satisfies a preset convergence condition for the sub data set is performed by:
verifying the trained intermediate model according to a K-S verification method, based on the verification set and the extrapolation test set of the sub data set, to obtain a K-S verification result of the intermediate model; and/or verifying the trained intermediate model according to an AUC verification method, based on the verification set and the extrapolation test set of the sub data set, to obtain an AUC verification result of the intermediate model;
and determining whether the intermediate model meets the preset convergence condition of the subdata set or not according to the K-S verification result and/or the AUC verification result of the intermediate model.
7. The method according to any one of claims 1 to 4, wherein said determining whether the transaction to be detected is a fraudulent transaction based on the result of said at least one model score comprises:
obtaining a comprehensive model scoring result of the transaction to be detected according to the at least one model scoring result and a weighted average method; or,
inputting the at least one model scoring result into a high-level nested model to obtain a comprehensive model scoring result of the transaction to be detected; the high-level nested model is obtained by training according to a machine learning algorithm based on a model scoring result obtained when a data set corresponding to the at least one transaction detection model is trained;
and determining whether the transaction to be detected is a fraud transaction according to the comprehensive model scoring result.
8. A device for detecting fraudulent transactions, comprising:
the acquisition module is used for acquiring the transaction to be detected;
the processing module is used for determining the target time sequence characteristic information of the transaction to be detected;
the determining module is used for inputting the transaction to be detected and the target time sequence characteristic information into at least one transaction detection model to obtain at least one model scoring result of the transaction to be detected; for any transaction detection model of the at least one transaction detection model, the transaction detection model is obtained by training an initial model corresponding to the transaction detection model, according to a machine learning algorithm corresponding to the transaction detection model, based on a data set corresponding to the transaction detection model; any piece of training data in the data set corresponding to the transaction detection model comprises a transaction, timing characteristic information of the transaction, and a transaction label; and the determining module is further used for determining whether the transaction to be detected is a fraudulent transaction according to the at least one model scoring result.
9. A computer device comprising a program or instructions that, when executed, perform the method of any of claims 1 to 7.
10. A computer-readable storage medium comprising a program or instructions which, when executed, perform the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110236023.2A CN112967053A (en) | 2021-03-03 | 2021-03-03 | Method and device for detecting fraudulent transactions |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112967053A true CN112967053A (en) | 2021-06-15 |
Family
ID=76276323
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||