WO2019194679A1

WO2019194679A1 - Systems and methods for detecting fraudulent transactions

Info

Publication number: WO2019194679A1
Application number: PCT/NL2019/050208
Authority: WO
Inventors: Sebastiaan Arnoldus Bernardus VAN SCHRIEK
Original assignee: ABN AMRO Bank N.V.
Priority date: 2018-04-06
Filing date: 2019-04-08
Publication date: 2019-10-10
Also published as: NL2020729B1

Abstract

A computer-implemented method for detecting a fraudulent transaction is disclosed. The method comprises several steps. One step comprises storing a plurality of transaction records, which together form a supervised data set. Each transaction record comprises at least two variables: a first variable indicating whether or not the transaction record relates to a fraudulent transaction, and a second variable, such as the amount of the transaction. Another step comprises, on the basis of the transaction records in the supervised data set, determining a transformation function for transforming the second variable into a transformed variable, wherein the transformed second variable has a substantially monotonic relation, e.g. a substantially monotonically increasing relation, with the probability of fraud. Another step comprises determining a relation between the transformed variable and a probability of fraud by performing a logistic regression analysis. Another step comprises receiving a particular transaction record comprising a value for the second variable and transforming that value into a transformed value in accordance with the determined transformation function. Another step comprises determining a probability of fraud, indicating the probability that the received transaction record relates to a fraudulent transaction, based on the transformed value and on the determined relation.

Description

Systems and methods for detecting fraudulent transactions

FIELD OF THE INVENTION

This disclosure relates to systems and methods for detecting fraudulent transactions, in particular to methods comprising determining a transformation function for transforming a variable of a transaction record into a transformed variable and determining a relation between the transformed variable and a probability of fraud. This disclosure further relates to systems configured to execute such methods.

BACKGROUND

A bank typically processes thousands of transactions per day, a small percentage of which are fraudulent. An example of a fraudulent transaction is one ordered by a criminal who has unlawfully obtained authorization information, such as a PIN, from a customer of the bank. In a particular example, the criminal carries out a technical-support scam as follows. First, the criminal cold-calls the customer, pretending to work for a software technical support service. The criminal then attempts to get the customer to allow remote access to his computer. Once access has been gained, the criminal tries to win the victim's trust and persuades the victim to pay for the supposed "support" services, which enables the criminal to see credit card information and/or login information for internet banking. With this information, the criminal can transfer funds from the victim's account to his own account, or use the victim's funds to make online payments to online merchants.

US 2015/0379426 A1 discloses a decision-tree-based model that may be employed to detect fraudulent transactions. In each node of a decision tree, a condition relating to one or more variables is tested, and depending on the outcome the model proceeds to one of two child nodes, in which another condition is tested.

A disadvantage of such a decision-tree-based model is that the number of nodes, and thus the number of conditions that the model must be able to test, can easily become very large. Further, when a decision tree model is used to assess a transaction record, many of the preprogrammed conditions, namely those in non-selected branches of the tree, are never tested. Constructing a decision tree model is therefore inefficient, since many conditions are determined, programmed and stored in a data storage without being used for a given record. Also, determining which condition to test in which node is not straightforward. Furthermore, as known in the art, the predictive accuracy of decision tree models is limited; for example, a decision tree cannot assign a weight to a variable, which limits its predictive power.

Therefore, there is a need in the art for improved methods and systems for detecting fraudulent transactions that alleviate at least some of the above described drawbacks.

SUMMARY

To that end, a computer-implemented method for detecting a fraudulent transaction is disclosed. The method comprises several steps. One step comprises storing a plurality of transaction records, which together form a supervised data set. Each transaction record comprises at least two variables: a first variable indicating whether or not the transaction record relates to a fraudulent transaction, and a second variable, such as the amount of the transaction. Another step comprises determining, on the basis of the transaction records in the supervised data set, in particular on the basis of their values for the first variable, a transformation function for transforming the second variable into a transformed variable. Another step comprises determining a relation between the transformed variable and a probability of fraud. Another step comprises receiving a particular transaction record comprising a particular value for the second variable and transforming the particular value into a transformed particular value in accordance with the determined transformation function. Another step comprises determining a probability of fraud, indicating the probability that the received transaction record relates to a fraudulent transaction, based on the transformed particular value and on the determined relation.

The second variable may be a numerical variable or a categorical variable.

Advantageously, the method is not based on a decision tree, so there is no need to design numerous conditions to be tested in the nodes of such a tree. Instead, the method determines a transformation function for a variable, which can subsequently be applied to any incoming transaction record that is to be analyzed. In contrast, as explained above, decision tree models typically contain conditions that are not tested in the assessment of a given transaction record.

The transaction records may comprise additional variables that are used in the method. In one embodiment, each transaction record comprises a third variable. In this embodiment, the method comprises determining a further transformation function for transforming the third variable into a further transformed variable. Furthermore, in this embodiment, the step of determining the relation between the transformed variable and the probability of fraud is embodied as determining a relation between the probability of fraud and a group of variables comprising the transformed variable and the further transformed variable. This embodiment comprises receiving the particular transaction record comprising a third particular value for the third variable, and transforming the third particular value into a further transformed particular value in accordance with the further transformation function. This embodiment also comprises determining the probability that the received transaction record relates to a fraudulent transaction based on the transformed particular value, the further transformed particular value and the determined relation. Advantageously, this embodiment allows a plurality of variables to be taken into account in the assessment of the received transaction record.
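As a minimal sketch of how a determined relation might combine several transformed variables, assuming it takes the logistic form used in the logistic regression embodiment below, the probability of fraud can be computed as a logistic function of a weighted sum of the transformed values. The function name `fraud_probability` and all coefficient values are hypothetical; in practice the coefficients would come from the regression on the supervised data set.

```python
import math


def fraud_probability(transformed_values, coefficients, intercept):
    """Return P(fraud) = 1 / (1 + exp(-(b0 + sum_j b_j * x'_j))).

    transformed_values: the transformed values x'_j of one transaction record.
    coefficients: one regression weight b_j per transformed variable.
    intercept: the constant term b0.
    """
    if len(transformed_values) != len(coefficients):
        raise ValueError("need exactly one coefficient per transformed variable")
    z = intercept + sum(b * x for b, x in zip(coefficients, transformed_values))
    # Evaluate the logistic function without overflowing math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical relation for two transformed variables, e.g. the transformed
# amount (second variable) and the transformed payee (third variable).
p = fraud_probability([0.8, 1.2], coefficients=[0.9, 1.1], intercept=-4.0)
```

Because every coefficient multiplies its own transformed variable, each variable contributes a separate weight, which is the property the background section notes is missing from decision trees.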

In one embodiment, the transformed second variable has a substantially monotonic relation, e.g. a substantially monotonically increasing relation, with the probability of fraud.

Two variables may be understood to have a monotonic relation if their relation is either monotonically increasing or monotonically decreasing. Denoting the relation by f, the relation is monotonically increasing if f(a) <= f(b) for all a and b such that a <= b, and monotonically decreasing if f(a) >= f(b) for all a and b such that a <= b.
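To check the monotonicity condition empirically, one could list the fraud rate observed in each subset, ordered by increasing transformed value, and test whether that sequence is monotonic. The sketch below is an illustration under that assumption; the helper name `is_monotonic` is hypothetical.

```python
def is_monotonic(values):
    """Return True if the sequence is monotonically increasing or decreasing.

    Increasing: values[a] <= values[b] whenever a <= b (non-strict).
    Decreasing: values[a] >= values[b] whenever a <= b.
    Checking adjacent pairs suffices because <= and >= are transitive.
    """
    pairs = list(zip(values, values[1:]))
    increasing = all(a <= b for a, b in pairs)
    decreasing = all(a >= b for a, b in pairs)
    return increasing or decreasing


# Hypothetical fraud rates per subset, ordered by increasing transformed value.
rates = [0.01, 0.02, 0.02, 0.10]
ok = is_monotonic(rates)
```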

This embodiment advantageously allows linear regression and/or logistic regression to be performed in order to determine a relation between the probability of fraud and the transformed variable. Such regression methods model the output variable as a monotonic function of each input variable, and therefore assume that each input variable has a monotonic relation with the output variable.

In one embodiment, determining the relation between the transformed second variable and the probability of fraud comprises performing a regression analysis for determining said relation. This embodiment provides an efficient method for determining the relation with high accuracy.

In one embodiment, the regression analysis comprises a logistic regression analysis. This is advantageous because logistic regression is particularly suitable for finding a relation between two variables of which one is binary, i.e. it can adopt only one of two values. This is the case for the first variable, since a transaction record relates either to a fraudulent or to a non-fraudulent transaction. Logistic regression is therefore well suited to finding a relation between the transformed variable and the probability of fraud, which enables fraudulent transactions to be detected with improved accuracy.
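The text does not prescribe how the logistic regression is fitted. As a minimal sketch, assuming plain batch gradient descent on the log-loss, the relation P(fraud | x') = sigmoid(b0 + b1 * x') can be estimated from pairs of transformed values and first-variable labels. The names `sigmoid` and `fit_logistic`, the learning rate, the epoch count and the data are all hypothetical; a production system would more likely use an established statistics library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit P(fraud | x') = sigmoid(b0 + b1 * x') by batch gradient descent.

    xs: transformed values x' (e.g. WoE subset scores).
    ys: values of the first variable, 1 for fraud and 0 for non-fraud.
    lr and epochs are illustrative choices, not tuned settings.
    """
    n = len(xs)
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of the log-loss w.r.t. the linear term
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical supervised data: three subsets with transformed values -1, 0 and 1.
xs = [-1.0] * 4 + [0.0] * 3 + [1.0] * 3
ys = [0, 0, 0, 0, 0, 1, 0, 1, 1, 0]
b0, b1 = fit_logistic(xs, ys)
p_low, p_high = sigmoid(b0 - b1), sigmoid(b0 + b1)
```

A positive fitted b1 reflects that the fraud rate in this toy data rises with the transformed value.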

In one embodiment, determining the transformation function comprises determining a preliminary transformation function for transforming the second variable into a preliminary transformed variable; determining a predictive power of the preliminary transformed variable, indicating how well transaction records relating to fraudulent transactions can be distinguished from transaction records relating to non-fraudulent transactions on the basis of the preliminary transformed variable; and determining that the predictive power of the preliminary transformed variable is equal to or higher than a threshold predictive power and, in response, selecting the preliminary transformation function as the transformation function. This embodiment ensures that a variable of a transaction is only transformed if the transformed variable is actually useful for predicting whether the record relates to a fraudulent transaction.

If the predictive power is lower than the threshold predictive power, another preliminary transformation function may be determined and assessed for the second variable, or the second variable may be discarded, in the sense that the transformed second variable is not taken into account when estimating the probability that particular transaction records relate to fraudulent transactions.

One embodiment comprises determining at least one preliminary transformation function for each of a plurality of variables comprised in the transaction records, for example for each variable, and determining a predictive power for each associated transformed variable. Each predictive power may then be compared with a threshold predictive power, so that only the best variables are selected.
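The threshold comparison above can be sketched as a simple filter, assuming the predictive power of each transformed variable has already been computed (for example as a Gini coefficient). The function name, variable names, power values and threshold are hypothetical.

```python
def select_variables(predictive_powers, threshold):
    """Return the variables whose transformed variable is predictive enough.

    predictive_powers: maps a variable name to the predictive power (e.g. a
    Gini coefficient) of its transformed variable.
    threshold: the threshold predictive power; variables below it are
    discarded and not used when estimating the probability of fraud.
    """
    return sorted(name for name, power in predictive_powers.items() if power >= threshold)


# Hypothetical predictive powers and threshold.
powers = {"amount": 0.42, "payee": 0.31, "third_variable": 0.04}
selected = select_variables(powers, threshold=0.10)
```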

In one embodiment, determining the transformation function comprises determining a first preliminary transformation function for transforming the second variable into a first preliminary transformed variable, and determining a predictive power of the first preliminary transformed variable, indicating how well transaction records relating to fraudulent transactions can be distinguished from transaction records relating to non-fraudulent transactions on the basis of the first preliminary transformed variable. In this embodiment, the method further comprises determining a second preliminary transformation function for transforming the second variable into a second preliminary transformed variable, and determining a predictive power of the second preliminary transformed variable, indicating how well transaction records relating to fraudulent transactions can be distinguished from transaction records relating to non-fraudulent transactions on the basis of the second preliminary transformed variable. In this embodiment, the method also comprises selecting, based on a comparison of the respective predictive powers of the first and second preliminary transformed variables, the second preliminary transformation function as the transformation function.

It should be understood that more than two preliminary transformation functions may be determined, for example at least three, at least four or at least five, in the same way as the first and second preliminary transformation functions. The predictive powers of the associated transformed variables may be determined in the same way as those of the first and second preliminary transformed variables. Then, based on a comparison of the determined predictive powers, one preliminary transformation function may be selected as the transformation function. This embodiment enables the best transformation function for a variable to be selected.
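The selection among several preliminary transformation functions can be sketched as picking the candidate with the highest predictive power, optionally combined with the threshold check described earlier. In this sketch each candidate is represented by its set of boundary values, and the predictive power is passed in as a callable; the function name `choose_transformation`, the boundary sets and the power values are hypothetical.

```python
def choose_transformation(candidates, predictive_power, threshold=None):
    """Select the preliminary transformation with the highest predictive power.

    candidates: preliminary transformations, here represented by their sets
    of boundary values (any hashable description would do).
    predictive_power: callable returning the predictive power (e.g. Gini) of
    the variable transformed with a given candidate.
    threshold: optional minimum predictive power; if even the best candidate
    falls below it, None is returned and the variable may be discarded.
    """
    best, best_power = None, float("-inf")
    for candidate in candidates:
        power = predictive_power(candidate)
        if power > best_power:
            best, best_power = candidate, power
    if best is None or (threshold is not None and best_power < threshold):
        return None
    return best


# Hypothetical predictive powers for two candidate sets of boundary values (USD).
powers = {(10.0,): 0.28, (10.0, 500.0): 0.35}
best = choose_transformation(list(powers), powers.get)
```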

In one embodiment, determining a transformation function (such as the transformation functions, further transformation functions and preliminary transformation functions described above) for a variable in the transaction records comprises assigning each transaction record of the plurality of transaction records, on the basis of its value for the variable, to one of at least two subsets of transaction records, in accordance with one or more rules for determining the transformation function stored in a data storage. This embodiment further comprises, for each subset of transaction records, determining a subset score based on the number of transaction records assigned to the subset for which the first variable indicates that they relate to fraudulent transactions, and based on at least one of:

- a total number of transaction records assigned to the subset,
- a number of transaction records assigned to the subset and for which the first variable indicates that they relate to non-fraudulent transactions,
- a total number of transaction records assigned to the subsets,
- a number of transaction records assigned to the subsets and for which the first variable indicates that they relate to fraudulent transactions, and
- a number of transaction records assigned to the subsets and for which the first variable indicates that they relate to non-fraudulent transactions.

In one embodiment, the variable, which may be the second variable, is a categorical variable. Each subset may then be associated with a particular categorical value of the categorical variable, and the one or more rules may prescribe that each transaction record is assigned to the subset associated with the categorical value that the transaction record itself comprises. To illustrate, assume that the variable relates to the payee of the transaction and can hold one of the following categorical values: “Merchant A”, “Merchant B”, “Merchant C” and “Merchant D”. Four subsets would then be created, respectively associated with these categorical values, and the rules may prescribe that any transaction having “Merchant A” for the variable “Payee” is assigned to the subset associated with “Merchant A”, transactions having “Merchant B” to the subset associated with “Merchant B”, et cetera.
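The assignment rule for a categorical variable can be sketched as grouping records by their categorical value and counting fraudulent and non-fraudulent records per subset, which are the counts the subset score needs. The sketch assumes records are dictionaries with a boolean `is_fraud` key for the first variable; that field name, the function name and the records are hypothetical.

```python
from collections import Counter


def subset_counts(records, variable, fraud_key="is_fraud"):
    """Assign records to subsets by categorical value and count outcomes.

    records: iterable of dicts; record[fraud_key] holds the first variable
    (True for fraud) and record[variable] holds the categorical value.
    Returns {category: (fraud_count, non_fraud_count)}.
    """
    fraud, non_fraud = Counter(), Counter()
    for record in records:
        category = record[variable]
        if record[fraud_key]:
            fraud[category] += 1
        else:
            non_fraud[category] += 1
    categories = set(fraud) | set(non_fraud)
    # Counter returns 0 for a category that never occurred in one of the two groups.
    return {c: (fraud[c], non_fraud[c]) for c in sorted(categories)}


records = [
    {"payee": "Merchant A", "is_fraud": False},
    {"payee": "Merchant A", "is_fraud": True},
    {"payee": "Merchant B", "is_fraud": False},
    {"payee": "Merchant B", "is_fraud": False},
    {"payee": "Merchant C", "is_fraud": True},
]
counts = subset_counts(records, "payee")
```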

The subset score may be based on a Weight of Evidence (WoE).

In one embodiment, the subset score WoE for a subset i is determined in accordance with:

$$\mathrm{WoE}_i = \ln\left(\frac{F_i}{F}\right) - \ln\left(\frac{NF_i}{NF}\right),$$

wherein $F_i$ denotes the number of records relating to fraudulent transactions in subset i, $F$ denotes the total number of records relating to fraudulent transactions in all subsets, $NF_i$ denotes the number of records relating to non-fraudulent transactions in subset i, and $NF$ denotes the total number of records relating to non-fraudulent transactions in all subsets.

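This embodiment ensures that the relation between the transformed variable and the probability of fraud is substantially monotonically increasing. The score can be rewritten as WoE_i = ln(F_i / NF_i) - ln(F / NF): the first term is the log-odds of fraud within subset i, and the second term is the same for every subset, so a subset with a higher fraud rate receives a higher score.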
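A minimal sketch of the Weight of Evidence computation, assuming the per-subset fraud and non-fraud counts are already available. The text does not say how empty counts are handled; this sketch simply rejects them, since the logarithm is undefined there. The function name and the counts for the two amount ranges are hypothetical.

```python
import math


def weight_of_evidence(counts):
    """Compute WoE_i = ln(F_i / F) - ln(NF_i / NF) for every subset i.

    counts: {subset: (F_i, NF_i)} with fraud and non-fraud counts per subset.
    Raises ValueError for a subset with a zero count, where the log is undefined.
    """
    total_fraud = sum(f for f, _ in counts.values())
    total_non_fraud = sum(nf for _, nf in counts.values())
    woe = {}
    for subset, (f, nf) in counts.items():
        if f == 0 or nf == 0:
            raise ValueError(f"subset {subset!r} has a zero count; WoE is undefined")
        woe[subset] = math.log(f / total_fraud) - math.log(nf / total_non_fraud)
    return woe


# Hypothetical counts for two amount ranges: below 10 USD and at least 10 USD.
woe = weight_of_evidence({"< 10": (2, 98), ">= 10": (8, 92)})
```

The subset with the higher fraud rate (8 of 100 versus 2 of 100) receives the higher score, in line with the monotonically increasing relation described above.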

In one embodiment, determining a predictive power of a variable comprises calculating a measure of the distributional inequality of transaction records relating to fraudulent transactions among the at least two subsets. In one embodiment, determining a predictive power of a variable comprises calculating at least one of a Gini coefficient and an area under a Receiver Operating Characteristic (ROC) curve. Here, the Gini coefficient should not be confused with the Gini impurity used in decision tree algorithms. These embodiments enable the predictive power of a variable to be determined accurately.
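A minimal sketch of both predictive-power measures, assuming the area under the ROC curve is computed as the fraction of (fraudulent, non-fraudulent) record pairs in which the fraudulent record has the higher transformed value, and assuming the credit-scoring convention Gini = 2 * AUC - 1. The text does not state which Gini definition it uses, so that relation is an assumption here; the data are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pair count.

    scores: transformed values (higher should mean more likely fraud).
    labels: 1 for fraudulent records, 0 for non-fraudulent ones.
    O(n_fraud * n_non_fraud) comparisons; fine for a sketch, slow for big data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one fraudulent and one non-fraudulent record")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini coefficient in the credit-scoring sense: 2 * AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0


# Hypothetical transformed values (e.g. WoE) and first-variable labels.
scores = [-0.95, -0.95, 0.50, 0.50]
labels = [0, 0, 1, 0]
```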

In one embodiment, the variable is a numerical variable. In this embodiment, the one or more rules define at least one boundary value for the variable, defining at least two value ranges. The at least two subsets are respectively associated with the at least two value ranges. In this embodiment, assigning each transaction record to one of at least two subsets of transaction records comprises, for each transaction record, determining in which of the at least two value ranges the value for the variable lies, and assigning the transaction record to the subset associated with that value range.

To illustrate, if the variable relates to the amount that is transferred, the one or more rules may define that transaction records having a value for the amount lower than 10 USD are assigned to a first subset and that transaction records having a value for the amount higher than or equal to 10 USD are assigned to a second subset.

As such, the transaction records in the plurality of transaction records may be understood to be bucketed based on their values for said variable.
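The bucketing of a numerical variable can be sketched with a binary search over sorted boundary values. This sketch assumes each boundary belongs to the value range above it, matching the 10 USD example (below 10 goes to the first subset, 10 or more to the second); the function name and amounts are hypothetical.

```python
from bisect import bisect_right


def assign_subset(value, boundaries):
    """Return the zero-based index of the value range that contains value.

    boundaries: sorted boundary values; [10.0] defines two ranges,
    value < 10 (index 0) and value >= 10 (index 1). Each boundary is
    included in the range above it.
    """
    return bisect_right(boundaries, value)


amounts = [2.50, 9.99, 10.00, 250.00]
subsets = [assign_subset(a, [10.0]) for a in amounts]
```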

In one embodiment, the one or more rules for determining the first preliminary transformation function define a first set of one or more boundary values defining at least two value ranges and the one or more rules used for determining the second preliminary transformation function define a second set of one or more boundary values defining at least two value ranges that is different from the first set of one or more boundary values.

A boundary value may constitute an end point of either of two adjacent value ranges. The end point may or may not be included in the value range.

In one embodiment, the second set of boundary values comprises a different number of boundary values than the first set, e.g. more or fewer. The second set of boundary values thus defines a different number of value ranges than the first set. This embodiment advantageously allows an even better transformation function to be found, resulting in better fraud detection.

In one embodiment, transforming a value into a transformed value in accordance with the transformation function comprises applying the one or more rules to associate the value with a particular subset of the at least two subsets, for which a particular subset score has been determined, and taking the particular subset score as the transformed value.

In one embodiment, a transformation function F(x), which may be a preliminary transformation function, for transforming a value x, such as a value for the second variable or a value of a further variable, into a transformed value x' is given by

$$F(x) = x'_i \quad \text{if } BV_{i-1} \le x < BV_i, \qquad i = 1, \dots, N,$$

wherein $BV_0, \dots, BV_N$ denote boundary values and the value x is a numerical value.

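In one embodiment, a transformation function F(x) for transforming a value x into a transformed value x' is given by

$$F(x) = x'_i \quad \text{if } x = \mathrm{CAT}_i, \qquad i = 1, \dots, N,$$

wherein $\mathrm{CAT}_1, \dots, \mathrm{CAT}_N$ denote categories and the value x is a categorical value.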

The transformed values x'_i may thus correspond to the subset scores, in particular the WoE values, described above.

Preferably, BV_0 is lower than the lowest possible value of x; BV_0 for example equals zero or minus infinity. Preferably, BV_N is higher than the highest possible value of x; BV_N for example equals infinity. This ensures that the transformation function is defined for any value that x can take.
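A minimal sketch of the two transformation functions, assuming the subset scores (e.g. WoE values) have already been determined. For the numerical case, the outer boundaries are taken as BV_0 = minus infinity and BV_N = infinity, so only the inner boundaries BV_1 to BV_{N-1} are passed in and F is defined for every x. The text does not say what happens to a category absent from the supervised data set; this sketch raises `KeyError`. Function names and score values are hypothetical.

```python
from bisect import bisect_right


def make_numeric_transform(inner_boundaries, scores):
    """Build F(x) = x'_i for BV_{i-1} <= x < BV_i, with BV_0 = -inf, BV_N = +inf.

    inner_boundaries: BV_1 .. BV_{N-1}, sorted ascending.
    scores: the subset scores x'_1 .. x'_N, one per value range.
    """
    if len(scores) != len(inner_boundaries) + 1:
        raise ValueError("N value ranges need N - 1 inner boundary values")
    bounds = list(inner_boundaries)

    def transform(x):
        # bisect_right counts inner boundaries <= x, which is i - 1 when
        # BV_{i-1} <= x < BV_i, i.e. the zero-based index of x'_i.
        return scores[bisect_right(bounds, x)]

    return transform


def make_categorical_transform(scores):
    """Build F(x) = x'_i for x = CAT_i; scores maps each category to its score."""

    def transform(x):
        return scores[x]  # raises KeyError for a category not seen in the data

    return transform


# Hypothetical subset scores, e.g. WoE values from the supervised data set.
f_amount = make_numeric_transform([10.0], [-0.95, 0.50])
f_payee = make_categorical_transform({"Merchant A": 0.10, "Merchant B": -0.40})
```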

In one embodiment, the method comprises determining that the probability that the received transaction record relates to a fraudulent transaction is higher than a first threshold probability and, in response, outputting an indication that the transaction record may relate to a fraudulent transaction.

The indication may be output by optical and/or acoustical means. An alarm may sound. In another example, the indication is displayed on a screen so that a user may become aware that the particular transaction record may relate to a fraudulent transaction. The indication may also be output as part of a message transmitted from the fraud detection system to another system, e.g. to another computer system within a bank.

This embodiment enables further investigation of the particular transaction, for example by an operational team of bank employees who may call the customer to check whether the transaction is indeed fraudulent or not.

In one embodiment, the method comprises determining that the probability that the received transaction record relates to a fraudulent transaction is higher than a second threshold probability and, in response, preventing the transaction from occurring. Preferably, the second threshold probability is higher than the first threshold probability, if a first threshold probability is used. The transaction may thus be blocked automatically, without human intervention. Detecting fraud before funds are transferred is advantageous, as it avoids the need to reverse the transaction.
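The two threshold embodiments can be combined into a single decision step, sketched below under the assumption that the second (blocking) threshold is the higher of the two, as preferred above. The function name, action labels and threshold values are hypothetical.

```python
def decide(probability, alert_threshold, block_threshold):
    """Map a fraud probability to an action using two threshold probabilities.

    Above block_threshold (the second threshold) the transaction is prevented;
    above alert_threshold (the first threshold) an indication is output so the
    transaction can be investigated; otherwise it proceeds.
    """
    if probability > block_threshold:
        return "block"
    if probability > alert_threshold:
        return "alert"
    return "allow"


# Hypothetical thresholds: first threshold 0.5, second threshold 0.9.
actions = [decide(p, 0.5, 0.9) for p in (0.10, 0.60, 0.95)]
```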

One aspect of this disclosure relates to a system for detecting fraudulent transactions. The system comprises a computer comprising a computer readable storage medium having computer readable program code embodied therewith, and a processor, preferably a microprocessor, coupled to the computer readable storage medium. Responsive to executing the computer readable program code, the processor is configured to perform one or more of the method steps of methods described herein.

One aspect of this disclosure relates to a computer program or suite of computer programs comprising at least one software code portion or a computer program product storing at least one software code portion, the software code portion, when run on a computer system, being configured to cause the computer system to execute one or more method steps of the methods described herein.

One aspect of this disclosure relates to a non-transitory computer-readable storage medium storing at least one software code portion, the software code portion, when executed or processed by a computer, causes the computer to perform one or more method steps of methods described herein.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, a method or a computer program product.

Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Functions described in this disclosure may be implemented as an algorithm executed by a processor/microprocessor of a computer. Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied, e.g., stored, thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a computer readable storage medium may include, but are not limited to, the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the present invention, a computer readable storage medium may be any tangible medium that can contain, or store, a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber, cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java(TM), Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor, in particular a microprocessor or a central processing unit (CPU), of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer, other programmable data processing apparatus, or other devices create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Moreover, a computer program for carrying out the methods described herein, as well as a non-transitory computer-readable storage medium storing the computer program, are provided. A computer program may, for example, be downloaded (updated) to existing computer systems (e.g. to existing fraud detection systems) or be stored upon manufacturing of these systems.

Elements and aspects discussed for or in relation with a particular embodiment may be suitably combined with elements and aspects of other embodiments, unless explicitly stated otherwise. Embodiments of the present invention will be further illustrated with reference to the attached drawings, which schematically show embodiments according to the invention. It will be understood that the present invention is not in any way restricted to these specific embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

Aspects of the invention will be explained in greater detail by reference to exemplary embodiments shown in the drawings, in which:

FIG. 1 illustrates a data processing system according to an embodiment;

FIG. 2 is a flow chart illustrating a method according to an embodiment;

FIG. 3 illustrates a fraud detection system according to one embodiment;

FIG. 4 shows an exemplary supervised data set that may be used in an embodiment;

FIGs. 5 and 6 illustrate how a transformation function may be determined according to an embodiment;

FIG. 7 illustrates how a predictive power of a transformed variable may be determined;

FIG. 8 illustrates how overfitting of a data set may be prevented according to an embodiment;

FIG. 9 illustrates how a transformation function may be determined according to an embodiment;

FIG. 10 illustrates a model resulting from performing a logistic regression analysis according to an embodiment;

FIG. 11 shows ROC curves indicative of a predictive sensitivity and accuracy for different methods, among which a method according to an embodiment.

DETAILED DESCRIPTION OF THE DRAWINGS

Fig. 1 depicts a block diagram illustrating an exemplary data processing system that may be used in a system for detecting fraudulent transactions as described with reference to Fig. 2.

As shown in Fig. 1, the data processing system 100 may include at least one processor 102 coupled to memory elements 104 through a system bus 106. As such, the data processing system may store program code within memory elements 104. Further, the processor 102 may execute the program code accessed from the memory elements 104 via the system bus 106. In one aspect, the data processing system may be implemented as a computer that is suitable for storing and/or executing program code. It should be appreciated, however, that the data processing system 100 may be implemented in the form of any system including a processor and a memory that is capable of performing the functions described within this specification.

The memory elements 104 may include one or more physical memory devices such as, for example, local memory 108 and one or more bulk storage devices 110. The local memory may refer to random access memory or other non-persistent memory device(s) generally used during actual execution of the program code. A bulk storage device may be implemented as a hard drive or other persistent data storage device. The processing system 100 may also include one or more cache memories (not shown) that provide temporary storage of at least some program code in order to reduce the number of times program code must be retrieved from the bulk storage device 110 during execution.

Input/output (I/O) devices depicted as an input device 112 and an output device 114 optionally can be coupled to the data processing system. Examples of input devices may include, but are not limited to, a keyboard, a pointing device such as a mouse, or the like. Examples of output devices may include, but are not limited to, a monitor or a display, speakers, or the like. Input and/or output devices may be coupled to the data processing system either directly or through intervening I/O controllers.

In an embodiment, the input and the output devices may be implemented as a combined input/output device (illustrated in Fig. 1 with a dashed line surrounding the input device 112 and the output device 114). An example of such a combined device is a touch sensitive display, also sometimes referred to as a “touch screen display” or simply “touch screen”. In such an embodiment, input to the device may be provided by a movement of a physical object, such as e.g. a stylus or a finger of a user, on or near the touch screen display.

A network adapter 116 may also be coupled to the data processing system to enable it to become coupled to other systems, computer systems, remote network devices, and/or remote storage devices through intervening private or public networks. The network adapter may comprise a data receiver for receiving data that is transmitted by said systems, devices and/or networks to the data processing system 100, and a data transmitter for transmitting data from the data processing system 100 to said systems, devices and/or networks. Modems, cable modems, and Ethernet cards are examples of different types of network adapter that may be used with the data processing system 100.

As pictured in Fig. 1, the memory elements 104 may store an application 118. In various embodiments, the application 118 may be stored in the local memory 108, the one or more bulk storage devices 110, or apart from the local memory and the bulk storage devices. It should be appreciated that the data processing system 100 may further execute an operating system (not shown in Fig. 1) that can facilitate execution of the application 118. The application 118, being implemented in the form of executable program code, can be executed by the data processing system 100, e.g., by the processor 102. Responsive to executing the application, the data processing system 100 may be configured to perform one or more operations or method steps described herein.

In one aspect of the present invention, the data processing system 100 may represent a computer system and/or fraud detection system as described herein.

In another aspect, the data processing system 100 may represent a client data processing system. In that case, the application 118 may represent a client application that, when executed, configures the data processing system 100 to perform the various functions described herein with reference to a "client". Examples of a client can include, but are not limited to, a personal computer, a portable computer, a mobile phone, or the like.

In yet another aspect, the data processing system 100 may represent a server. For example, the data processing system may represent an (HTTP) server, in which case the application 118, when executed, may configure the data processing system to perform (HTTP) server operations.

Various embodiments of the invention may be implemented as a program product for use with a computer system, where the program(s) of the program product define functions of the embodiments (including the methods described herein). In one embodiment, the program(s) can be contained on a variety of non-transitory computer-readable storage media, where, as used herein, the expression “non-transitory computer readable storage media” comprises all computer-readable media, with the sole exception being a transitory, propagating signal. In another embodiment, the program(s) can be contained on a variety of transitory computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive, ROM chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., flash memory, floppy disks within a diskette drive or hard-disk drive or any type of solid-state random-access semiconductor memory) on which alterable information is stored. The computer program may be run on the processor 102 described herein.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of embodiments of the present invention has been presented for purposes of illustration, but is not intended to be exhaustive or limited to the implementations in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present invention. The embodiments were chosen and described in order to best explain the principles and some practical applications of the present invention, and to enable others of ordinary skill in the art to understand the present invention for various embodiments with various modifications as are suited to the particular use contemplated.

FIG. 2 schematically shows a system 220 for detecting fraudulent transactions according to an embodiment and illustrates how system 220 in one embodiment interacts with other systems, such as a system 218 for forming a supervised data set and a system 222 for processing transactions.

The systems 218, 220 and 222 may communicate with each other via a network (not shown). This network may be an internal network of the bank. Typically the connections between these systems are highly secure.

System 218 is configured to form a supervised data set, which may be understood to be a data set comprising historical transaction records comprising an indication whether or not they relate to fraudulent transactions. The system 218 may receive input from bank employees who have verified that transactions were indeed fraudulent or not.

System 222 is configured to process transactions. System 222 is configured to receive transaction orders, e.g. originating from customers of the bank, and to arrange that the correct amount is transferred from one party to the other. System 222 may be configured to, prior to transferring funds, transmit a transaction record of a to-be-executed transaction to the system 220 for detecting fraudulent transactions. Only if the system 222 receives an indication from system 220 that it has not identified the transaction record as relating to a fraudulent transaction can system 222 proceed and execute the transaction.

In step 224, the system 218 for forming a supervised data set transmits a supervised data set comprising a plurality of records to system 220 for detecting fraudulent transactions. The system 220 receives in step 224 the supervised data set from system 218 and stores the data set in step 226. Each transaction record comprises at least two variables, a first variable indicating whether or not the transaction record relates to a fraudulent transaction. This first variable may have been added to the transaction record after the authenticity of the transaction has been verified as will be explained further below. Each transaction record also comprises a second variable, such as an amount of the transaction. Of course, typically, each transaction record may comprise further variables in addition to the first and second variables.

Step 228 comprises the system 220 determining a transformation function for transforming the second variable into a transformed variable, and step 230 comprises determining a relation between the transformed variable and a probability of fraud. This step may comprise performing a logistic regression based on the transformed second variable and on the first variable.
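The logistic regression of step 230 can be sketched, under stated assumptions, as a one-variable model fitted by batch gradient descent on the mean log-loss. The function name `fit_logistic`, the learning rate, the iteration count and the toy data below are hypothetical; in practice a statistics or machine-learning library would typically be used instead of a hand-written optimiser.

```python
import math
from typing import List, Sequence, Tuple


def fit_logistic(x: Sequence[float], y: Sequence[int],
                 lr: float = 0.5, iterations: int = 5000) -> Tuple[float, float]:
    """Fit P(fraud | x) = 1 / (1 + exp(-(b0 + b1 * x))) by batch gradient descent.

    x: transformed second variable of each transaction record (e.g. its bucket's WoE)
    y: first variable of each record, 1 = fraudulent, 0 = non-fraudulent
    """
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += p - yi          # gradient of the log-loss w.r.t. b0 for this record
            g1 += (p - yi) * xi   # gradient of the log-loss w.r.t. b1 for this record
        b0 -= lr * g0 / n         # step along the mean gradient
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical toy data: two buckets of 12 records each, every record scored with
# its bucket's WoE; bucket 1 holds 6 fraudulent records, bucket 2 holds 7.
woe_low = math.log((6 / 13) / (6 / 11))
woe_high = math.log((7 / 13) / (5 / 11))
x: List[float] = [woe_low] * 12 + [woe_high] * 12
y: List[int] = [1] * 6 + [0] * 6 + [1] * 7 + [0] * 5
b0, b1 = fit_logistic(x, y)
```

In this toy case the fitted slope `b1` comes out close to 1 and the intercept `b0` close to ln(13/11), because each bucket's log odds of fraud differs from its WoE only by that constant.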

In step 232, the system 222 for processing transactions transmits a particular transaction record to system 220. The system 222 may have received an instruction to process a transaction and arrange a transfer of funds from one party to another party. Before system 222 actually initiates this transfer of funds, it may first transmit a transaction record of the to-be-executed transaction to the system 220 for detecting fraudulent transactions in order to check that the transaction is authentic. In step 232, the system 220 receives the particular transaction record from system 222. The particular transaction record comprises a particular value for the second variable.

Step 234 comprises the system 220 transforming the particular value into a transformed particular value in accordance with the determined transformation, and step 236 comprises determining a probability of fraud, indicating a probability that the received transaction record relates to a fraudulent transaction, based on the transformed particular value and on the determined relation. In an optional step 238, the system 220 may determine that the probability that the received transaction record relates to a fraudulent transaction is higher than a threshold probability. In response, in optional step 240, the system 220 may output an indication that the transaction record may relate to a fraudulent transaction. In step 240, the indication is output in the sense that a message is transmitted to the system 222 for processing transactions, the message comprising the indication that the record relates to a fraudulent transaction. Optionally, the transaction is thereby blocked and does not occur.
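Steps 234 to 238 can be sketched as follows, assuming a single transformed variable, a logistic relation with coefficients `b0` and `b1`, and a two-range transformation whose default values mirror the first preliminary transformation function of the FIG. 5 example described further below. The function names, coefficients and threshold are hypothetical and chosen only for illustration.

```python
import math
from typing import Tuple


def transform_amount(amount: float, boundary: float = 45.00,
                     low_score: float = -0.167, high_score: float = 0.169) -> float:
    """Two-range transformation: return the score of the value range `amount` falls in."""
    return low_score if amount < boundary else high_score


def check_transaction(amount: float, b0: float, b1: float,
                      threshold: float) -> Tuple[float, bool]:
    """Transform the value, apply the logistic relation, compare with a threshold."""
    transformed = transform_amount(amount)                          # step 234
    probability = 1.0 / (1.0 + math.exp(-(b0 + b1 * transformed)))  # step 236
    return probability, probability > threshold                     # step 238


# Hypothetical regression coefficients and threshold probability.
p_high, flag_high = check_transaction(120.00, b0=0.167, b1=1.0, threshold=0.5)
p_low, flag_low = check_transaction(10.00, b0=0.167, b1=1.0, threshold=0.5)
```

With these assumed values a 120.00 transaction gets a probability of about 0.583 and is flagged, while a 10.00 transaction gets exactly 0.5 and, because the comparison is strictly "higher than", is not flagged.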

Furthermore, in an optional step 242, the system 220 may output an indication that the transaction record may relate to a fraudulent transaction to system 218 for forming a supervised data set, so that system 218 can form a further supervised data set of transaction records comprising the received, labelled transaction record.

FIG. 3 illustrates a fraud detection system 320 according to one embodiment. The fraud detection system 320 is in communication with a system 318 for forming a supervised data set, a storage 319 of an operational team within a bank that verifies transactions and with a transaction processing system 322. The transaction processing system 322 is connected to a network 317, such as an internal computer network of a bank.

The system 320 for detecting fraudulent transactions comprises a number of data processing systems or modules, for example data processing systems as described with reference to FIG. 1, that may be understood to relate to either training or inference, as indicated. Herein, training may be understood to relate to teaching a capability to a computer system on the basis of existing data, and inference to applying this capability to new data. In the present example, training relates to determining a model on the basis of which transaction records relating to fraudulent transactions can be identified, in particular to determining at least one transformation function for transforming at least one variable into a transformed variable and to determining a relation between the transformed variable and the probability of fraud. Inference in this example relates to predicting whether a to-be-processed transaction is fraudulent or not.

Module 320a represents a module for selecting variables that are suitable for predicting whether a transaction record relates to a fraudulent transaction. The variable selector 320a selects these variables based on a supervised data set that it receives from system 318. The variable selector module 320a may comprise a number of sub-modules (not shown), such as a module for determining a transformation function for transforming a variable into a transformed variable and a module for determining a predictive power of transformed variables. If the predictive power is higher than a threshold predictive power, then variable selector 320a may select the variable, in particular its transformed version, for detecting fraudulent transactions.

Module 320b represents a module for transforming variables of transaction records in the supervised data set. Module 320b may receive from variable selector 320a a set of one or more variables and associated transformation functions for transforming these variables into transformed variables. Then, module 320b may transform, for a plurality of transactions in the supervised data set, the selected variables into transformed variables.

Module 320c represents a regression module for finding a relation between a probability of fraud and transformed variables. Module 320c receives from module 320b a set of one or more transformed variables. The regression module 320c can perform a regression analysis, for example a logistic regression analysis, for finding a relation between a probability that a transaction record relates to a fraudulent transaction and the transformed variables of a transaction record.

The outputs of modules 320a, 320b and 320c may be input to the modules related to inference, as indicated by the bold arrows. To illustrate, the variable selector 320a may output its selected variables to variable extractor 320d, and regression module 320c may output the relation it has found between probability of fraud and transformed variables to module 320f for calculating a probability that a particular transaction record relates to a fraudulent transaction.

The system 322 for processing transactions comprises a processor 322a and a data storage 322b. The system 322 may receive via network 317 a transaction that is to be processed in the form of a transaction record specifying different aspects of the transaction. However, before the transaction system 322 initiates an irreversible money transfer in accordance with the received transaction record, it inputs the record to the fraud detection system 320 in order to evaluate the risk that the transaction is fraudulent.

Module 320d represents a module for extracting, from the transaction record to be evaluated, the variables that are used by the fraud detection system 320. These extracted variables are subsequently output to module 320e for transforming the variables in accordance with the determined transformation functions. Then, module 320f calculates, based on the thus obtained transformed variables and on the relation determined by regression module 320c, the probability that the transaction record relates to a fraudulent transaction. This result may then be output to module 320g, which may compare the calculated probability of fraud with one or more threshold probabilities and, depending on this comparison, classify the transaction as relating to a fraudulent transaction, relating to a non-fraudulent transaction, or relating to a suspicious transaction. As shown, the module 320g may output the transaction record to the system 318 for forming a supervised data set and/or to storage 319 and/or to the transaction processing system 322. If classifier module 320g has classified the transaction record as non-fraudulent, the record is output to system 318 and to the system 322 for processing transactions together with an indication that the transaction is non-fraudulent and can thus be executed. If classifier module 320g has classified the transaction record as fraudulent, the record is output to system 318 and to the system 322 for processing transactions together with an indication that the transaction is fraudulent and should thus not be executed. If classifier module 320g has classified the transaction as suspicious, the transaction record is output to data storage 319, which preferably is a data storage of a computer system of an operational team within the bank that verifies whether transactions are fraudulent, for example by contacting a customer by phone. After the operational team has determined whether the transaction is fraudulent, the transaction record is output to systems 318 and 322, where it is processed in the same way as other incoming transaction records labelled fraudulent or non-fraudulent.
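The three-way classification by module 320g can be sketched with two threshold probabilities. The description does not specify how many thresholds are used or their values, so the two-threshold scheme, the names `classify` and `ROUTES`, and the values 0.3 and 0.8 are assumptions for illustration only.

```python
from typing import Dict, Tuple

# Destinations per class, following the routing described for module 320g.
ROUTES: Dict[str, Tuple[str, ...]] = {
    "fraudulent": ("318", "322"),      # labelled record; 322 does not execute
    "non-fraudulent": ("318", "322"),  # labelled record; 322 may execute
    "suspicious": ("319",),            # operational team verifies manually
}


def classify(probability: float, lower: float, upper: float) -> str:
    """Hypothetical two-threshold classifier in the spirit of module 320g."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("expected 0 <= lower <= upper <= 1")
    if probability >= upper:
        return "fraudulent"
    if probability >= lower:
        return "suspicious"
    return "non-fraudulent"


# Hypothetical thresholds applied to three example probabilities of fraud.
decisions = [classify(p, lower=0.3, upper=0.8) for p in (0.05, 0.5, 0.95)]
destinations = [ROUTES[d] for d in decisions]
```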

The system 320 for detecting fraudulent transactions may receive periodically, for example monthly, a supervised data set from the system for forming the supervised data set 318. Then, the system 320 determines a model based on this set. This step may comprise determining which transformation functions are to be applied to which variables of a transaction record.

FIG. 4 shows an exemplary supervised data set comprising a plurality of transaction records that, in one embodiment, the system 220 described with reference to FIG. 2 stores, for example in one or more of the memory elements 104 described with reference to FIG. 1. The plurality of transaction records may respectively relate to transactions that have already been processed by the transaction system 222 described with reference to FIG. 2. Preferably, the plurality of transaction records relate to transactions that have been determined, with some degree of certainty, to be either fraudulent or non-fraudulent. As shown, although the value of a variable may differ per transaction, the variables themselves are the same for each transaction.

The supervised data set of FIG. 4 comprises a plurality of transaction records, in particular fifty transaction records. Each of the transaction records is associated with a record identifier ID. In this example, the fifty records are numbered 1-50, as shown in column “ID”. Each record comprises at least two variables, in this example three variables, namely “Amount”, indicating how much money was transferred from payer to payee, “Payee”, indicating the party to which money was transferred, and “Fraud”, indicating whether or not the transaction record relates to a fraudulent transaction. If the value for “Fraud” for a transaction record is “1”, the transaction record relates to a fraudulent transaction, whereas a value of “0” indicates that the transaction record relates to a non-fraudulent transaction.

In one embodiment, the method comprises, after obtaining such a supervised data set, dividing it into a number of subsets, for example a train set and a test set. The transaction records in the train set may be used for machine-learning purposes for developing a method for detecting fraudulent transactions. For these purposes, the train set may be subdivided into two subsets, a train subset and a validation set. The test set may be used to test the developed method.

The division of the supervised data set into subsets may be performed such that 20-40%, preferably 25-35%, for example 30%, of the transactions end up in the test set, and may be performed such that 60-80%, preferably 65-75%, for example 70%, of the transaction records end up in the train set. Furthermore, the train set may be subdivided such that 20-40%, preferably 25-35%, for example 30% of the transaction records in the train set end up in the validation set and may be performed such that 60-80%, preferably 65-75%, for example 70%, of the transaction records in the train set end up in the train subset.

Preferably, the supervised data set is divided into subsets by stratified sampling. In an example, as a first step of this sampling method, two subgroups, also called “strata”, are formed, wherein one stratum contains the records relating to fraudulent transactions and the other stratum contains the records relating to non-fraudulent transactions. Then, as a second step, the subsets may be formed by proportionally allocating a fraction of each stratum to a subset. To illustrate, for forming a test set containing 30% of the transaction records in the supervised data set, 30% of the transaction records in the first stratum and 30% of the transaction records in the second stratum may be assigned to the test set. Stratified sampling is advantageous because it prevents the transaction records relating to fraudulent transactions, which in reality are typically very sparse, from being unevenly distributed among the subsets, which could reduce the effectiveness of the methods disclosed herein.
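The proportional allocation described above can be sketched as follows, assuming each record is a dictionary with a “Fraud” field as in FIG. 4. The function name `stratified_split`, the fixed random seed and the toy data set (50 records, 10 of which are labelled fraudulent) are hypothetical. The same function is applied twice: once for the 70/30 train/test split and once for the 70/30 train subset/validation split.

```python
import random
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, int]


def stratified_split(records: Sequence[Record], fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Return (kept, held_out), holding out `fraction` of each stratum.

    The strata are the records with "Fraud" == 1 and those with "Fraud" == 0.
    """
    rng = random.Random(seed)
    kept: List[Record] = []
    held_out: List[Record] = []
    for stratum in (0, 1):
        members = [r for r in records if r["Fraud"] == stratum]
        rng.shuffle(members)                      # random selection within the stratum
        n_out = round(fraction * len(members))    # proportional allocation
        held_out.extend(members[:n_out])
        kept.extend(members[n_out:])
    return kept, held_out


# Hypothetical supervised data set: 50 records, 10 of them labelled fraudulent.
data = [{"ID": i, "Fraud": 1 if i <= 10 else 0} for i in range(1, 51)]
train, test = stratified_split(data, 0.3)                # 70% / 30%
train_subset, validation = stratified_split(train, 0.3)  # 70% / 30% of the train set
```

Here the test set receives 3 of the 10 fraudulent records and 12 of the 40 non-fraudulent ones, so the fraud rate of each subset stays close to that of the full data set.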

FIG. 4 shows an exemplary supervised data set. In reality, such a supervised data set may comprise thousands of transaction records, and each record may comprise many more variables, such as payer, number of transactions in the previous month, type of payer’s bank account, et cetera.

As described above, the method for detecting a fraudulent transaction comprises determining a transformation function for transforming the second variable into a transformed variable. Preferably, the relationship between the probability that a transaction record relates to a fraudulent transaction, i.e. the “probability of fraud”, and the transformed variable is a substantially monotonic relation, e.g. a substantially monotonically increasing relation, meaning that for increasing values of the transformed second variable the probability of fraud does not decrease, and preferably increases as well.

FIGs. 5, 6 and 7 exemplify how such a transformation function may be determined. To provide an overview, first a first preliminary transformation function for transforming the second variable into a first preliminary transformed variable and a second preliminary transformation function for transforming the second variable into a second preliminary transformed variable are determined, and the respective predictive power of each associated transformed variable is determined. Then, based on a comparison of the respective predictive powers, the second preliminary transformation function is selected as the transformation function.

FIG. 5 relates to determining the first preliminary transformation function. In one embodiment, the system for detecting fraudulent transactions stores in a data storage one or more rules for assigning each transaction record of the plurality of transaction records, on the basis of its value for the second variable, to one of at least two subsets of transaction records. More particularly, in this example, the one or more rules define at least one boundary value for the second variable, defining at least two value ranges. Even more particularly, in this example, the one or more rules define a boundary for “Amount” of 45.00, which defines two value ranges, namely a first range from minus infinity up to (and optionally including) 45.00 and a second range from 45.00 to infinity. The at least two subsets, which may also be called “buckets”, are respectively associated with the at least two value ranges. FIG. 5 illustrates that the subset “Preliminary bucket 1” is associated with the range [minus infinity, 45.00) and that the subset “Preliminary bucket 2” is associated with the range [45.00, infinity). Assigning each transaction record to one of the at least two subsets comprises, for each transaction record, determining in which particular value range of the at least two value ranges its value for the second variable lies, and assigning the transaction record to the subset associated with that particular value range. Thus, in this example, transaction records having values for “Amount” lower than 45.00 are assigned to subset “Preliminary bucket 1”, and transaction records having values for “Amount” equal to or higher than 45.00 are assigned to subset “Preliminary bucket 2”.
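A minimal sketch of the assignment rule and of the per-bucket counts that make up the summary table of FIG. 5, assuming the boundary values are held in a sorted list and that a value equal to a boundary belongs to the upper range (the [45.00, infinity) convention used above). The function name `bucket_summary` and the four toy records are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def bucket_summary(amounts: Sequence[float], frauds: Sequence[int],
                   boundaries: Sequence[float]) -> List[Dict[str, int]]:
    """Assign each record to a value range and count records per bucket.

    Ranges are [-inf, b1), [b1, b2), ..., [bk, +inf); `boundaries` must be sorted.
    """
    summary = [{"total": 0, "fraud": 0, "non_fraud": 0}
               for _ in range(len(boundaries) + 1)]
    for amount, fraud in zip(amounts, frauds):
        i = bisect_right(boundaries, amount)  # index of the range containing amount
        summary[i]["total"] += 1
        summary[i]["fraud"] += fraud
        summary[i]["non_fraud"] += 1 - fraud
    return summary


# Hypothetical records, using the single boundary 45.00 of the example.
summary = bucket_summary([10.00, 44.99, 45.00, 100.00], [0, 1, 1, 1], [45.00])
```

`bisect_right` places a value equal to 45.00 in the second bucket, matching the half-open ranges described above; supporting more buckets only requires more boundary values.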

The summary table below the two buckets summarizes how many records in total, how many records relating to fraudulent transactions and how many records relating to non-fraudulent transactions are in each bucket. Furthermore, the summary table shows in column “WoE” a subset score for each subset. In this example, a weight of evidence, WoE, is calculated as the subset score for each bucket i, wherein the WoE is defined as:

$$\mathrm{WoE}_i = \ln\left(\frac{F_i}{F}\right) - \ln\left(\frac{NF_i}{NF}\right)$$

wherein $F_i$ denotes the number of records relating to fraudulent transactions in subset i, $F$ denotes the total number of records relating to fraudulent transactions in all subsets, $NF_i$ denotes the number of records relating to non-fraudulent transactions in subset i, and $NF$ denotes the total number of records relating to non-fraudulent transactions in all subsets.

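Thus, based on this definition, the WoE for the two subsets amounts to

$$\mathrm{WoE}_1 = \ln\left(\frac{F_1}{F}\right) - \ln\left(\frac{NF_1}{NF}\right) = -0.167, \text{ and}$$

$$\mathrm{WoE}_2 = \ln\left(\frac{F_2}{F}\right) - \ln\left(\frac{NF_2}{NF}\right) = 0.169.$$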
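The WoE formula can be sketched directly. The counts below are hypothetical, not the actual counts of FIG. 5; they were chosen only so that the results are consistent with the two WoE values above. A bucket without fraudulent or without non-fraudulent records makes the logarithm undefined, which this sketch reports as an error rather than smoothing.

```python
import math
from typing import List, Tuple


def weight_of_evidence(fraud_i: int, non_fraud_i: int,
                       fraud_total: int, non_fraud_total: int) -> float:
    """WoE_i = ln(F_i / F) - ln(NF_i / NF)."""
    if fraud_i == 0 or non_fraud_i == 0:
        raise ValueError("WoE is undefined for a bucket lacking fraud or non-fraud records")
    return math.log(fraud_i / fraud_total) - math.log(non_fraud_i / non_fraud_total)


# Hypothetical (F_i, NF_i) counts for preliminary buckets 1 and 2.
buckets: List[Tuple[int, int]] = [(6, 6), (7, 5)]
F = sum(f for f, _ in buckets)
NF = sum(nf for _, nf in buckets)
woe = [round(weight_of_evidence(f, nf, F, NF), 3) for f, nf in buckets]
```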

The first preliminary transformation function for transforming the second variable into a transformed second variable is accordingly defined as:

$$P_{\text{preliminary }1}(\text{Amount}) = \begin{cases} -0.167 & \text{for Amount} < 45.00 \\ 0.169 & \text{for Amount} \geq 45.00 \end{cases}$$

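The subset score for each subset may also be determined differently. In an example, the subset score may be a log odds ratio (LOR) that may be defined for a bucket i as follows:

$$\mathrm{LOR}_i = \ln\left(\frac{F_i}{NF_i}\right)$$

Accordingly, the subset scores for the two subsets shown in FIG. 5 would be:

$$\mathrm{LOR}_1 = \ln\left(\frac{F_1}{NF_1}\right) = 0.000, \text{ and}$$

$$\mathrm{LOR}_2 = \ln\left(\frac{F_2}{NF_2}\right) = 0.336.$$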
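Assuming the log odds ratio is defined per bucket as $\mathrm{LOR}_i = \ln(F_i/NF_i)$, rearranging the logarithms in the WoE definition relates the two subset scores. The derivation shows that they differ only by the constant $\ln(F/NF)$, which is the same for every bucket. Both scores therefore order the buckets identically, and a higher score always corresponds to higher odds of fraud within the bucket, which is the monotonic behaviour sought for the transformed variable:

```latex
\mathrm{WoE}_i
  = \ln\frac{F_i}{F} - \ln\frac{NF_i}{NF}
  = \ln\frac{F_i}{NF_i} - \ln\frac{F}{NF}
  = \mathrm{LOR}_i - \ln\frac{F}{NF}
```

With the values of the FIG. 5 example, $\ln(F/NF) \approx 0.167$, which is consistent with $0.000 - 0.167 = -0.167$ and $0.336 - 0.167 = 0.169$.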

FIG. 6 shows that in one embodiment a second preliminary transformation function is determined, optionally in a similar manner as described with reference to FIG. 5. For this second preliminary transformation function, the boundary value is different, and thus the value ranges associated with the different subsets are different. In particular, one boundary value is defined, which in this example equals 27.36, as a result of which transaction records having a value for Amount lower than 27.36 are assigned to subset “Preliminary bucket 3” and transaction records having a value for Amount higher than 27.36 are assigned to subset “Preliminary bucket 4”.

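The second preliminary transformation function may be defined as:

$$P_{\text{preliminary }2}(\text{Amount}) = \begin{cases} \mathrm{WoE}_3 & \text{for Amount} < 27.36 \\ \mathrm{WoE}_4 & \text{for Amount} \geq 27.36 \end{cases}$$

wherein $\mathrm{WoE}_3$ and $\mathrm{WoE}_4$ are the weights of evidence of “Preliminary bucket 3” and “Preliminary bucket 4” respectively.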

In one embodiment, determining the transformation function comprises determining a predictive power of a preliminary transformed second variable. In one embodiment, determining the predictive power of a variable comprises calculating a Gini-coefficient, which will be explained with reference to FIG. 7.

FIG. 7 is a graph showing on the horizontal axis the cumulative percentage of the transaction records in the train subset and on the vertical axis the cumulative percentage of records in the train subset that relate to fraudulent transactions. The dashed line is a straight line from (0;0) to (100;100).

The graph illustrates how a Gini-coefficient is calculated for the preliminary transformed variable associated with the first preliminary transformation function described above. Curve 750, consisting of part 750a and part 750b, relates to the first preliminary transformation function. Curve 750 is based on the summary information depicted in the summary table of FIG. 5. A cumulative curve such as curve 750 in FIG. 7 is generated by plotting the subsets in a particular order, which means that the subsets contribute to the cumulation of the percentages in the graph in this particular order. The particular order of plotting the subsets, or buckets, is such that the subset scores of the subsets are ordered from high to low or from low to high.

In FIG. 7, the order in which the subsets are plotted, and thus contribute to the cumulation of the percentages, is such that the subset scores are ordered from high to low. Therefore, part 750a of curve 750 illustrates the contribution of “Preliminary bucket 2” to the cumulative percentages on the vertical and horizontal axes respectively, whereas part 750b of curve 750 illustrates the contribution of “Preliminary bucket 1”. Point 752 is at (50;54), as “Preliminary bucket 2” contains 50% of the transaction records in the train subset and 54% of all transaction records relating to fraudulent transactions. “Preliminary bucket 1” then contributes the remaining 50% on the horizontal axis and the remaining 46% on the vertical axis to arrive at the point (100;100).
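The construction of a cumulative curve such as curve 750 can be sketched as follows, assuming each bucket is summarized as (subset score, fraudulent records, total records) and that buckets make grouped contributions in order of decreasing subset score. The function name and the bucket counts are hypothetical, chosen only to be consistent with the percentages quoted for point 752.

```python
from typing import List, Sequence, Tuple

Bucket = Tuple[float, int, int]  # (subset score, fraudulent records, total records)


def cumulative_curve(buckets: Sequence[Bucket]) -> List[Tuple[float, float]]:
    """Corner points (percent of records, percent of fraudulent records) of the curve."""
    total = sum(n for _, _, n in buckets)
    fraud_total = sum(f for _, f, _ in buckets)
    points = [(0.0, 0.0)]
    cum_n = cum_f = 0
    # Each bucket adds its full counts at once (a grouped contribution).
    for _, f, n in sorted(buckets, key=lambda b: b[0], reverse=True):
        cum_n += n
        cum_f += f
        points.append((100.0 * cum_n / total, 100.0 * cum_f / fraud_total))
    return points


# Hypothetical counts for preliminary buckets 1 and 2.
points = cumulative_curve([(-0.167, 6, 12), (0.169, 7, 12)])
```

The middle point comes out at (50.0, about 53.8), which corresponds to point 752 after rounding.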

In one embodiment, determining the predictive power of a transformed variable is performed on the basis of a surface area below a curve, such as curve 750. The curve may indicate a relation between a cumulation of the number of records relating to fraudulent transactions (see vertical axis) and a cumulation of the number of transaction records (see horizontal axis), and may be constructed such that the order in which transaction records contribute to said cumulations is determined by the subset score of each transaction’s subset. Transactions in a particular subset may make a grouped contribution to said cumulations, as opposed to each transaction contributing to the cumulations separately, in the sense that the total number of records in the particular subset is added to the cumulation of the number of records and the total number of fraudulent records in the particular subset is added to the cumulation of the number of records relating to fraudulent transactions. The respective grouped contributions for the at least two subsets may thus be ordered based on the subset scores of the subsets. In a particular example, first the transaction records in a subset having a first subset score make a grouped contribution to said cumulations, and then the transaction records in a subset having a second subset score, lower than the first subset score, make a grouped contribution to said cumulations. The Gini coefficient may be defined as the surface area B enclosed by curve 750 and the dashed line divided by the surface area A+B+C enclosed by curve 758 and the dashed line. The Gini coefficient for the first preliminary transformation function is thus given by:

$$G_{\text{preliminary }1} = \frac{B}{A+B+C}$$

wherein A, B and C are the surface areas as indicated in FIG. 7.

Curve 758 is the ideal curve, associated with the ideal case in which one bucket would contain all records relating to fraudulent transactions and only records relating to fraudulent transactions. In such a case, the associated transformation function and transformed variable can be used to perfectly distinguish fraudulent transaction records from non-fraudulent transaction records. The curve of such a transformed variable would coincide with curve 758, so that the enclosed area would equal A+B+C; hence the maximum Gini coefficient equals 1.

The graph also illustrates how a Gini coefficient is calculated for the preliminary transformed variable associated with the second preliminary transformation function described above. Line 754 is based on the summary information presented in the summary table of FIG. 6. Point 756 is at (71; 85), as “Preliminary bucket 4” contains 71% of the transaction records in the train subset and 85% of the transaction records relating to fraudulent transactions in the train subset.

Similarly, the Gini coefficient for the second preliminary transformation function is:

$$G_{\text{preliminary 2}} = \frac{A+B}{A+B+C} = 0.30$$

wherein A is the surface area as indicated in FIG. 6. The surface areas required for determining the Gini coefficient may be determined using conventional algorithms for determining an area under a curve.
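A minimal Python sketch of the Gini computation described above, under stated assumptions rather than as the patented implementation: each subset is summarised as a hypothetical `(records, fraudulent_records)` pair, the subsets make grouped contributions in order of decreasing subset score, areas are computed with the trapezoidal rule on a unit square (fractions instead of percentages), and the ideal curve (curve 758) is assumed to reach 100% of the fraudulent records at a horizontal position equal to the overall fraud rate. The counts in the usage lines are hypothetical, chosen so that the high-score subset holds 71% of the records and 85% of the fraudulent records, as at point 756.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cumulative_curve(counts: Sequence[Tuple[int, int]], scores: Sequence[float]) -> List[Point]:
    """Cumulative share of records (x) versus cumulative share of fraudulent
    records (y); each subset makes one grouped contribution, highest score first."""
    n_total = sum(n for n, _ in counts)
    f_total = sum(f for _, f in counts)
    points = [(0.0, 0.0)]
    n_cum = f_cum = 0
    for i in sorted(range(len(counts)), key=lambda i: scores[i], reverse=True):
        n_cum += counts[i][0]
        f_cum += counts[i][1]
        points.append((n_cum / n_total, f_cum / f_total))
    return points


def area_under(points: Sequence[Point]) -> float:
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def gini(counts: Sequence[Tuple[int, int]], scores: Sequence[float]) -> float:
    """Area between the curve and the diagonal, divided by the same area for
    the ideal curve (all fraudulent records ranked first)."""
    fraud_rate = sum(f for _, f in counts) / sum(n for n, _ in counts)
    ideal = [(0.0, 0.0), (fraud_rate, 1.0), (1.0, 1.0)]
    return (area_under(cumulative_curve(counts, scores)) - 0.5) / (area_under(ideal) - 0.5)


# Hypothetical counts (records, fraudulent records) for Amount < 27.36 and Amount >= 27.36.
counts = [(87, 24), (213, 136)]
print(cumulative_curve(counts, [-1.083, 0.439]))  # [(0.0, 0.0), (0.71, 0.85), (1.0, 1.0)]
print(round(gini(counts, [-1.083, 0.439]), 2))    # 0.3
```

With these assumed counts the overall fraud rate is 160/300 and the sketch returns 0.30; with real data the denominator depends on the actual fraud rate of the train subset, which fixes the ideal curve.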

Determining a predictive power of a transformation function may additionally or alternatively comprise determining an area under a Receiver Operating Characteristic (ROC) curve, wherein the predictive power is defined as the surface area under the ROC curve. In an example, the ROC curve would show the true positive rate versus the false positive rate and would be constructed by (preliminarily) labelling each transaction record as fraudulent or non-fraudulent on the basis of the subset score of the subset to which it was assigned, for example on the basis of the weight of evidence of that subset. In particular, the records may be labelled as fraudulent or non-fraudulent by comparing their associated subset scores with a threshold subset score. If the subset score associated with a transaction record is higher or lower than the threshold subset score, the record is labelled as fraudulent or non-fraudulent, respectively. The ROC curve may then be constructed by starting with a high threshold subset score, for which no transaction records are labelled as fraudulent. This threshold subset score corresponds to the point (0,0) of the ROC curve. Then, the threshold subset score is lowered and the labelling operation described above is performed for a plurality of threshold subset scores. At some point, with a low enough threshold subset score, transaction records will be labelled as fraudulent. Of these records, some may be labelled fraudulent justly (the true positives) and others unjustly (the false positives). The development of the true positives versus the false positives with decreasing threshold subset score can be plotted in an ROC curve, as known in the art. The area under such an ROC curve may be indicative of the predictive power of the transformation function on which the subset scores are based.
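The threshold sweep can likewise be sketched in Python. Assumptions: per-record subset scores and fraud labels are available as two lists, and lowering the threshold just below each distinct subset score in turn is emulated by labelling a record fraudulent when its score is greater than or equal to that score; the area under the resulting curve is computed with the trapezoidal rule. The function names and the 300 records in the usage lines are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], is_fraud: Sequence[bool]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs for a decreasing threshold
    subset score, starting at (0, 0) where no record is labelled fraudulent."""
    positives = sum(is_fraud)
    negatives = len(is_fraud) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, is_fraud) if s >= threshold]
        tp = sum(flagged)          # fraudulent records labelled fraudulent
        fp = len(flagged) - tp     # non-fraudulent records labelled fraudulent
        points.append((fp / negatives, tp / positives))
    return points


def roc_auc(scores: Sequence[float], is_fraud: Sequence[bool]) -> float:
    """Trapezoidal area under the ROC curve."""
    pts = roc_points(scores, is_fraud)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical records: 213 with score 0.439 (136 fraudulent), 87 with score -1.083 (24 fraudulent).
scores = [0.439] * 213 + [-1.083] * 87
is_fraud = [True] * 136 + [False] * 77 + [True] * 24 + [False] * 63
print(round(roc_auc(scores, is_fraud), 2))  # 0.65
```

The Gini coefficient defined via the cumulative curve (also known as the accuracy ratio) equals 2 * AUC - 1, so an area of 0.65 for these records corresponds to a Gini coefficient of 0.30.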

It should be understood that further preliminary transformation functions may be defined and their associated predictive power determined. In one particular example, the sets of one or more rules, each set forming the basis for a preliminary transformation function, may be adjusted several times such that the sets define respective boundary values at predetermined percentiles.

The table below shows nine boundary values, defined by rules forming a basis for preliminary transformation functions. For each boundary value, a predictive power is determined, in particular a Gini coefficient. The boundary values in this example are the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th and 90th percentiles of the train subset. The 30th percentile, for example, corresponds to the boundary value of 27.36, which forms a basis for the second preliminary transformation function described above. Each boundary value defines two value ranges, or buckets.

In one embodiment (not shown), once it has been established that one particular boundary value, in this example the boundary value at the 30th percentile, yields the preliminary transformation function whose transformed variable has the highest predictive power, in this example 0.30, further boundary values near this particular boundary value may be investigated. For example, new boundary values, or, more precisely, new preliminary transformation functions based on rules defining these new boundary values, may be tested in the sense that the predictive power of the resulting transformed variables is determined. In this particular example, the 22nd, 24th, 26th, 28th, 32nd, 34th, 36th and 38th percentiles may be investigated as boundary values.

Even further, if one of these boundary values, for example the one at the 24th percentile, yields a higher predictive power than any of the other boundary values, further boundary values may again be investigated, for example at the 22.5th, 23rd, 23.5th, 24.5th, 25th and 25.5th percentiles.
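The coarse-to-fine percentile search described above might look as follows in Python. The helper names, the linear-interpolation percentile definition and the refinement schedule `((2.0, 4), (0.5, 3))` are assumptions: the schedule reproduces the example steps (22nd to 38th percentile in steps of 2, then 22.5th to 25.5th in steps of 0.5), and the current best percentile is kept among the candidates. `power` stands in for a function returning the predictive power, e.g. the Gini coefficient on the train subset, of the transformation function based on a given boundary value; the toy function in the usage lines is hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) by linear interpolation between closest ranks."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def coarse_to_fine(values: Sequence[float], power: Callable[[float], float],
                   refinements: Sequence[Tuple[float, int]] = ((2.0, 4), (0.5, 3))) -> Tuple[float, float]:
    """Search the 10th..90th percentiles for the boundary value with the highest
    predictive power, then repeatedly test neighbouring percentiles around the best."""
    xs = sorted(values)

    def score(q: float) -> float:
        return power(percentile(xs, q))

    best_q = float(max(range(10, 100, 10), key=score))
    for step, n in refinements:
        # e.g. step 2, n 4 around the 30th percentile: 22nd, 24th, ..., 38th (30th kept)
        candidates = [best_q + k * step for k in range(-n, n + 1)]
        best_q = max((q for q in candidates if 0 < q < 100), key=score)
    return best_q, percentile(xs, best_q)


# Toy data and a hypothetical power function peaking at a boundary value of 24.4.
q, boundary = coarse_to_fine(list(range(1, 101)), lambda b: -abs(b - 24.4))
print(q, round(boundary, 3))  # 23.5 24.265
```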

Ultimately, the transformation function that transforms the second variable into the transformed variable having the highest predictive power is determined. This transformation function may then be validated on the validation set, in the sense that its predictive power is determined when it is applied to the transaction records in the validation set. The predictive power may again be determined as described above, with the difference that the records of the validation set, rather than those of the train subset, are now assigned to the two subsets. In this example, the predictive power of the transformation function based on the boundary value of 27.36, when applied to the validation set, equals 0.40.

Further, the method may comprise determining even further preliminary transformation functions that are based on one or more rules defining two boundary values, and thus three value ranges and three subsets, or buckets. The table shown below shows thirty-six combinations of boundary values, which respectively form the basis for thirty-six preliminary transformation functions. The first row of the table below relates to a preliminary transformation function that is based on one or more rules defining a first boundary value for the Amount variable at 5.11, the tenth percentile, and a second boundary value at 7.73, the twentieth percentile. A plurality of combinations, e.g. all combinations, of the nine percentiles may be investigated.

The above table shows that the combination of the two boundary values 27.36 and 83.41 yields the highest predictive power, in this example the highest Gini coefficient, of 0.50. This combination of boundary values, i.e. the transformation function based on one or more rules defining these two boundary values, is then tested on the validation set. As shown, this results in a predictive power (Gini coefficient) of 0.40. Since this is not higher than the Gini coefficient determined on the validation set for one boundary value, which was also 0.40, no further preliminary transformation functions are investigated, and the preliminary transformation function based on the single boundary value of 27.36 is selected as the transformation function.

The combination of the first boundary value, in this example 27.36, and the second boundary value, in this example 83.41, may be refined, which may comprise defining a set of boundary values near the first boundary value and a set of boundary values near the second boundary value. Then, for all possible combinations of two boundary values, one from each set, a predictive power may be calculated, and the combination with the highest predictive power is selected.

It should be appreciated that, in general, Amount and the probability of fraud do not have a monotonically increasing relation: it is not generally true that transactions with higher amounts have a higher probability of being fraudulent. The methods described above may nevertheless result in a transformation function that transforms a high value for the amount into a first transformed value and a lower value for the amount into a second transformed value, wherein the second transformed value is higher than the first transformed value, such that the transformed variable can still have a monotonically increasing relation with the probability of fraud.

If the predictive power for two boundary values on the validation set is higher than the predictive power for one boundary value on the validation set, the procedure may be continued in the sense that combinations of three boundary values are investigated. As in the one-boundary and two-boundary cases, a predictive power, e.g. a Gini coefficient, will at some point be calculated on the validation set. Again, if this predictive power is not higher than the predictive power determined on the validation set for two boundary values, no further preliminary transformation functions are investigated. If, in contrast, the predictive power for three boundary values on the validation set is higher than that for two boundary values, the procedure is repeated for four boundary values.
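The procedure of adding boundary values until the validation result stops improving can be sketched in Python as below. The function name and the callables `train_power` and `validation_power` (returning, for instance, Gini coefficients on the train and validation subsets for a tuple of boundary values) are hypothetical. The sketch searches all combinations of the candidate boundary values exhaustively, which for nine percentile candidates gives the thirty-six pairs mentioned above but grows combinatorially for larger numbers of boundaries. The lookup tables in the usage lines merely replay the Gini values quoted in the text.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Boundaries = Tuple[float, ...]


def select_boundaries(candidates: Sequence[float],
                      train_power: Callable[[Boundaries], float],
                      validation_power: Callable[[Boundaries], float]) -> Tuple[Boundaries, float]:
    """For k = 1, 2, ... boundary values, pick the combination with the highest
    predictive power on the train subset; stop as soon as its predictive power
    on the validation subset is not higher than that of the previous k."""
    best: Boundaries = ()
    best_val = float("-inf")
    for k in range(1, len(candidates) + 1):
        best_k = max(combinations(sorted(candidates), k), key=train_power)
        val = validation_power(best_k)
        if val <= best_val:
            break  # no improvement on held-out data: keep the previous transformation
        best, best_val = best_k, val
    return best, best_val


# Hypothetical lookup tables standing in for real Gini computations.
train = {(27.36,): 0.30, (27.36, 83.41): 0.50}
validation = {(27.36,): 0.40, (27.36, 83.41): 0.40}
print(select_boundaries([5.11, 7.73, 27.36, 83.41],
                        lambda b: train.get(b, 0.0),
                        lambda b: validation.get(b, 0.0)))  # ((27.36,), 0.4)
```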

Curve 960 in FIG. 8 shows, for another supervised data set unrelated to the one shown in FIG. 4, the relation between the number of value ranges, or buckets, and the highest predictive power determined on the train subset for that number of buckets. Curve 960 shows that the predictive power increases with the number of buckets. However, the more value ranges are defined, the higher the chance that overfitting occurs, which is undesirable. To illustrate, if the one or more rules defined twenty-four value ranges, each transaction record in the train subset might be assigned to its own subset. This would result in a maximum Gini score of 1.0; however, these buckets would not be useful for determining the probability of fraud for a particular transaction record that is not in the train subset, because the model would then be overfitted on the train subset. Therefore, once the best boundary values have been found for a certain number of boundary values, i.e. the boundary values associated with the highest predictive power, these boundary values are tested on another set that was not used for determining them. Curve 962 shows the results of such tests on the validation set and shows that for the transformation function based on one or more rules defining seven value ranges, the predictive power starts to decrease with respect to that of the transformation function based on one or more rules defining six value ranges. Therefore, in this particular example, the best transformation function based on one or more rules defining six value ranges is selected as the transformation function.

It should be understood that the maximum predictive power for a variable, as tested on the validation set, may be compared with a threshold predictive power. If the maximum predictive power is higher than the threshold, the variable is used when assessing the probability of fraud for the received transaction record to be evaluated.

If the maximum predictive power is not higher than the threshold predictive power, the variable is not used. In this manner, a maximum predictive power may be determined for a plurality of variables, for example for all variables comprised in the transaction records. Hence, only the variables whose predictive power, after a suitable transformation, exceeds the threshold are selected for determining the probability that a transaction record to be evaluated relates to a fraudulent transaction.

FIG. 9 illustrates that the second variable, or a third variable if one is involved, may be a categorical variable. The one or more rules may then define that each category is assigned to its own subset, as shown. In accordance with the summary table shown in FIG. 7, the transformation function for transforming the Payee variable into a transformed variable of this example is defined as:

$$F_{\text{Payee}}(\text{Payee}) = \begin{cases} -0.27 & \text{for Payee} = \text{Merchant A} \\ 0.32 & \text{for Payee} = \text{Merchant B} \\ 0.97 & \text{for Payee} = \text{Merchant C} \\ 1.08 & \text{for Payee} = \text{Merchant D} \end{cases}$$
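The subset scores and the resulting transformation functions can be sketched in Python as follows. This is an illustration, not the patented implementation: it assumes the weight-of-evidence subset score WoE_i = ln(F_i/F) - ln(NF_i/NF) as set out in claim 4, assumes every subset contains at least one fraudulent and one non-fraudulent record (otherwise the logarithm is undefined), and assigns a value equal to a boundary value to the upper value range, as in the Amount transformation given in connection with FIG. 10. The helper names and the counts passed to `weight_of_evidence` are hypothetical; the boundary value and scores are those quoted in the text.

```python
import bisect
import math
from typing import Callable, Dict, List, Sequence


def weight_of_evidence(fraud: Sequence[int], non_fraud: Sequence[int]) -> List[float]:
    """Subset scores WoE_i = ln(F_i / F) - ln(NF_i / NF); assumes every subset
    holds at least one fraudulent and one non-fraudulent record."""
    F, NF = sum(fraud), sum(non_fraud)
    return [math.log(f / F) - math.log(nf / NF) for f, nf in zip(fraud, non_fraud)]


def numeric_transform(boundaries: Sequence[float], woe: Sequence[float]) -> Callable[[float], float]:
    """F(x): subset score of the value range [BV_i, BV_i+1) that contains x."""
    return lambda x: woe[bisect.bisect_right(boundaries, x)]


def categorical_transform(woe_by_category: Dict[str, float]) -> Callable[[str], float]:
    """F(x): subset score of category x (raises KeyError for unseen categories)."""
    return lambda category: woe_by_category[category]


print([round(w, 3) for w in weight_of_evidence([24, 136], [63, 77])])  # [-1.099, 0.435]
F_amount = numeric_transform([27.36], [-1.083, 0.439])
F_payee = categorical_transform({"Merchant A": -0.27, "Merchant B": 0.32,
                                 "Merchant C": 0.97, "Merchant D": 1.08})
print(F_amount(12.50), F_amount(27.36), F_payee("Merchant C"))  # -1.083 0.439 0.97
```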

It should be appreciated that even for such categorical variables, the transformed variable may have a monotonically increasing relation with the probability of fraud, which renders the transformed variable suitable for regression analyses, e.g. logistic regression analyses. FIG. 10 illustrates a logistic regression model for determining a relation between the probability of fraud and the transformed variable associated with the transformation function

$$F(\text{Amount}) = \begin{cases} -1.083 & \text{for Amount} < 27.36 \\ 0.439 & \text{for Amount} \geq 27.36 \end{cases}$$

For this logistic regression model, the variable Amount of the transaction records in the train subset is transformed into the transformed variable in accordance with the transformation function. The transformed variable can take one of two values, namely -1.083 or 0.439. Furthermore, the variable Fraud can take one of two values, namely 0, indicating non-fraud, and 1, indicating fraud. Each record is subsequently plotted as shown.

Subsequently, a logistic regression is performed based on these points in order to determine a relation between the probability of fraud and the transformed variable. The curve 1064 illustrates this relation.

The dotted lines in FIG. 10 illustrate that a value of -1.083 for the transformed variable is associated with a probability of fraud of approximately 0.25, whereas a value of 0.439 is associated with a probability of fraud of approximately 0.85.
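Fitting the relation of FIG. 10 can be sketched with a small Newton-Raphson maximum-likelihood fit in plain Python, as below; the function names and the 300 hypothetical records are assumptions for illustration only. With a single transformed variable that takes only two values, the fitted model reproduces the observed fraud rate in each subset, which the usage lines show. For several transformed variables the same method extends to a larger Newton system; a library routine such as scikit-learn's `LogisticRegression` could also be used, bearing in mind that it applies L2 regularisation by default.

```python
import math
from typing import Sequence, Tuple


def fit_logistic(x: Sequence[float], y: Sequence[int], iterations: int = 25) -> Tuple[float, float]:
    """Maximum-likelihood fit of P(fraud | x) = 1 / (1 + exp(-(b0 + b1 * x)))
    by Newton-Raphson; y holds 1 for fraud and 0 for non-fraud."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # information matrix (negated Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def probability_of_fraud(b0: float, b1: float, transformed_value: float) -> float:
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * transformed_value)))


# Hypothetical train records: transformed Amount values and fraud labels (1 = fraud).
x = [0.439] * 213 + [-1.083] * 87
y = [1] * 136 + [0] * 77 + [1] * 24 + [0] * 63
b0, b1 = fit_logistic(x, y)
print(round(probability_of_fraud(b0, b1, 0.439), 3), round(probability_of_fraud(b0, b1, -1.083), 3))  # 0.638 0.276
```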

It should be appreciated that more than one variable in the transaction records may be transformed. In such case, a multivariable logistic regression analysis is performed to find a relation between the probability of fraud and a group of transformed variables.

FIG. 11 shows Receiver Operating Characteristic (ROC) curves of three methods for detecting fraudulent transactions. As is known, the area under an ROC curve (AUC) of a method is indicative of how well the method detects records relating to fraudulent transactions.

FIG. 11 shows ROC curves for three methods for detecting fraudulent transactions applied to the test set of the supervised data set. An ROC curve indicates how well a model performs as the threshold probability decreases. To illustrate, the method disclosed herein calculates for each transaction in the test set a probability of fraud, which may subsequently be compared with a threshold probability. If the probability of fraud for a transaction record is higher than (or equal to) the threshold probability, the record is labelled as relating to a fraudulent transaction. If the determined probability of fraud is lower than the threshold probability, the transaction record is labelled as relating to a non-fraudulent transaction. Thus, how a transaction record is ultimately labelled depends on the threshold probability. Preferably, of course, transaction records that indeed relate to fraudulent transactions are labelled “fraudulent”. These correctly labelled records are called true positives. However, transaction records that do not actually relate to fraudulent transactions may also be labelled “fraudulent”. These mistakenly labelled transaction records are called false positives. The ROC curves shown in FIG. 11 were constructed by decreasing the threshold probability and determining, for a plurality of thresholds, how many true positives and false positives have been identified in the test set. Preferably, the area under the curve is as large as possible, with a maximum of 1, because this would mean that there is a threshold probability for which all fraudulent transaction records in the test set are labelled fraudulent and no records are mistakenly labelled as fraudulent.

FIG. 11 shows three ROC curves for three methods: a first method based on a decision tree, a second method using logistic regression (without transforming variables), and an embodiment of the methods disclosed herein.

The method based on the decision tree used a standard decision tree constructed using standard methods. A decision tree typically does not calculate a probability of fraud for a transaction record. Rather, a transaction record ends up in an end leaf node that is labelled either fraudulent or non-fraudulent. In order to be able to construct the ROC curve, the output value for a particular transaction record is the ratio between frauds and non-frauds in the end leaf node in which that transaction record has ended up.

The method based on logistic regression without transforming variables comprised a standard multi-variable logistic regression analysis.

In the embodiment of the method disclosed herein, the two variables “Amount” and “Payee” were both transformed in accordance with the respective transformation functions F_preliminary2 and F_Payee given above. Then, a multi-variable logistic regression analysis was performed to find a relation between the probability of fraud and the two transformed variables. The transformation functions and the found relation were applied to the test set to determine, for each record in the test set, a probability that it relates to a fraudulent transaction. These probabilities are compared with a predetermined threshold probability in order to classify the transaction records as relating to either fraudulent or non-fraudulent transactions, as described above. FIG. 11 shows that the method according to an embodiment detects the fraudulent transactions in the test set best, because its area under the curve (AUC) of 0.82 is the highest. It should be appreciated that the exemplary supervised data set was generated randomly and as such was not biased to be particularly suitable for any of the three methods.

Claims

1. A computer-implemented method for detecting a fraudulent transaction, the method comprising

storing in a data storage a supervised data set comprising a plurality of transaction records, wherein each transaction record comprises at least two variables, a first variable indicating whether or not the transaction record relates to a fraudulent transaction and a second variable, such as an amount of the transaction,

on the basis of transaction records in the supervised data set, determining a transformation function for transforming the second variable into a transformed variable, wherein the transformed second variable has a substantially monotonic relation, e.g. a substantially monotonically increasing relation, with a probability of fraud,

determining a relation between the transformed variable and the probability of fraud wherein determining the relation between the transformed second variable and the probability of fraud comprises performing a logistic regression analysis for determining said relation, and

receiving a transaction record to be evaluated, comprising a value for the second variable, and

transforming the value into a transformed value in accordance with the determined transformation,

determining a probability of fraud indicating a probability that the received transaction record relates to a fraudulent transaction based on the transformed value and based on the determined relation.

2. The method according to claim 1, wherein determining the transformation function comprises

determining a first preliminary transformation function for transforming the second variable into a first preliminary transformed variable,

determining a predictive power of the first preliminary transformed variable indicating how well transaction records relating to fraudulent transactions can be distinguished from transaction records relating to non-fraudulent transactions on the basis of the first preliminary transformed variable,

determining a second preliminary transformation function for transforming the second variable into a second preliminary transformed variable,

determining a predictive power of the second preliminary transformed variable indicating how well transaction records relating to fraudulent transactions can be distinguished from transaction records relating to non-fraudulent transactions on the basis of the second preliminary transformed variable,

based on a comparison of the respective predictive powers of the first and second preliminary transformed variables, selecting the second preliminary transformation function as the transformation function.

3. The method according to claim 1 or 2, wherein determining a transformation function for a variable in the transaction records comprises, on the basis of one or more rules for determining the transformation function stored in a data storage, assigning each transaction record of the plurality of transaction records on the basis of a value for the variable to one of at least two subsets of transaction records, and

for each subset of transaction records, determining a subset score based on a number of transaction records assigned to the subset and for which the first variable indicates that they relate to fraudulent transactions and based on at least one of

- a total number of transaction records assigned to the subset,

- a number of transaction records assigned to the subset and for which the first variable indicates that they relate to non-fraudulent transactions,

- a total number of transaction records assigned to the subsets,

4. The method according to one or more of the preceding claims, wherein determining a transformation function for a variable in the transaction records comprises

on the basis of one or more rules for determining the transformation function stored in a data storage, assigning each transaction record of the plurality of transaction records on the basis of a value for the variable to one of at least two subsets of transaction records,

for each subset of transaction records, determining a subset score, wherein the subset score WoE_i for a subset i is determined in accordance with:

$$WoE_i = \ln\left(\frac{F_i}{F}\right) - \ln\left(\frac{NF_i}{NF}\right),$$

wherein

F_i denotes the number of records relating to fraudulent transactions in subset i,

F denotes the total number of records relating to fraudulent transactions in all subsets,

NF_i denotes the number of records relating to non-fraudulent transactions in subset i, and

NF denotes the total number of records relating to non-fraudulent transactions in all subsets.

5. The method according to one or more of claims 2-4, wherein determining a predictive power of a variable comprises calculating at least one of a Gini-coefficient and an area under a Receiver Operating Characteristic curve.

6. The method according to one or more of claims 3-5, wherein

the variable is a numerical variable and wherein the one or more rules define at least one boundary value for the variable defining at least two value ranges, the at least two subsets being respectively associated with the at least two value ranges, and wherein assigning each transaction record to one of at least two subsets of transaction records comprises

for each transaction record, determining in which particular value range of the at least two value ranges the value for the variable is and assigning the transaction record to the subset associated with the particular value range.

7. The method according to the preceding claim, wherein the one or more rules for determining the first preliminary transformation function define a first set of one or more boundary values defining at least two value ranges and wherein the one or more rules used for determining the second preliminary transformation function define a second set of one or more boundary values defining at least two value ranges that is different from the first set of one or more boundary values.

8. The method according to the preceding claim, wherein the second set of boundary values comprises a different number of boundary values than the first set of one or more boundary values.

9. The method according to one or more of the preceding claims, wherein transforming a value into a transformed value in accordance with the transformation function comprises applying the one or more rules to associate the value with a particular subset of the at least two subsets, a particular subset score having been determined for the particular subset, and determining the particular subset score to be the transformed value.

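10. The method according to one or more of the preceding claims, wherein a transformation function F(x) for transforming a value x into a transformed value x’ is given by

either

$$F(x) = \begin{cases} WoE_1 & \text{for } x < BV_1 \\ WoE_i & \text{for } BV_{i-1} \leq x < BV_i,\ 1 < i < n \\ WoE_n & \text{for } x \geq BV_{n-1} \end{cases}$$

wherein WoE_i denote the subset scores and BV_i denote boundary values, wherein the value x is a numerical value, or given by

$$F(x) = WoE_i \quad \text{for } x = CAT_i,$$

wherein CAT_i denote categories, wherein the value x is a categorical value.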

11. The method according to one or more of the preceding claims, comprising determining that the probability that the received transaction record relates to a fraudulent transaction is higher than a first threshold probability and in response outputting an indication that the transaction record may relate to a fraudulent transaction.

12. The method according to one or more of the preceding claims, comprising determining that the probability that the received transaction record relates to a fraudulent transaction is higher than a second threshold probability and in response preventing the transaction from occurring.

13. A system for detecting fraudulent transactions, the system comprising a computer comprising

a computer readable storage medium having computer readable program code embodied therewith, and

a processor, preferably a microprocessor, coupled to the computer readable storage medium, wherein responsive to executing the computer readable program code, the processor is configured to perform the method according to one or more of the preceding claims 1-12.

14. A computer program or suite of computer programs comprising at least one software code portion or a computer program product storing at least one software code portion, the software code portion, when run on a computer system, being configured for executing the method according to one or more of the claims 1-12.

15. A non-transitory computer-readable storage medium storing the computer program according to claim 14.