WO2018167404A1

WO2018167404A1 - Detection by machine learning of anomalies in a set of banking transactions by optimization of the average precision

Info

Publication number: WO2018167404A1
Application number: PCT/FR2018/050544
Authority: WO
Inventors: Jordan FRERY; Amaury HABRARD; Marc SEBBAN; Liyun GUELTON; Olivier CAELEN
Original assignee: Worldline; Universite Jean Monnet Saint Etienne; Centre National De La Recherche Scientifique
Priority date: 2017-03-16
Filing date: 2018-03-09
Publication date: 2018-09-20
Also published as: FR3064095B1; EP3596685A1; CN110678890A; FR3064095A1

Abstract

The invention relates to a method for detecting anomalies in a set of payment transactions, consisting of: - establishing (E3) a meta-model formed from a set of models, each trained on a training set in order to determine a risk for each transaction of being anomalous, the meta-model being established by the "gradient boosting" technique so as to optimize a differentiable function expressing the average precision of the meta-model; - submitting (E4) said set to the meta-model so as to determine risks for each transaction of said set, and - determining a subset of transactions corresponding to a risk that is greater than a determined threshold in order to provide a predetermined number of transactions in said subset.

Description

DETECTION BY MACHINE LEARNING OF ANOMALIES IN A SET OF BANKING TRANSACTIONS BY OPTIMIZATION OF THE AVERAGE PRECISION

FIELD OF THE INVENTION

The present invention relates to a mechanism for detecting anomalies by machine learning in a set of banking transactions. It applies in particular to the detection of fraud.

BACKGROUND OF THE INVENTION

Fraud on payment transactions, mainly involving bank transactions, is an important and growing phenomenon, particularly as a result of the generalization of online transactions via telecommunication networks. In addition to fraud, other types of anomalies can also occur (errors ...).

Also, different mechanisms for detecting anomalies have been deployed, particularly by banking institutions.

These mechanisms can be set up before or after the transaction authorization by a payment server. In the first case, we talk about detection of fraud, or anomalies, in real time. In the second case, it is near-real-time detection.

The first case has the advantage of being able to block a fraudulent transaction before it takes place, but it is subject to a strong constraint on the processing time, since the mechanism delays the finalization of the payment transaction and therefore negatively impacts the user's experience. The second case allows more time and thus makes it possible to put in place more thorough and finer-grained processing.

Solutions for detecting anomalies in this second case have been proposed. For the most part, these solutions are based on different classification mechanisms.

However, most conventional classification technologies cannot be applied directly because of the specificities of anomaly detection in a set of payment transactions. In particular, the very strong imbalance in the data tends to induce models predicting only non-fraudulent transactions.

First, the consequences of fraud are extremely significant and very sensitive. While it is therefore important to detect the maximum number of cases of fraud, it is also very harmful to cancel a suspicious transaction when it is in fact lawful. The gravity and complexity of the situation do not currently allow fully automatic processing: the existing solutions consist in presenting a certain number of questionable transactions to a human operator, and it is this human operator who, in the last resort, is responsible for the final classification of a questionable transaction as an anomaly or as lawful.

In addition, because of the confidential and sensitive nature of information relating to payments and banking data, very little information is publicly available on the tools put in place for the detection of fraud. It is therefore difficult to compare the solutions of the state of the art.

SUMMARY OF THE INVENTION

The object of the present invention is to provide a solution at least partially overcoming the aforementioned drawbacks. More particularly, the invention aims to provide tools enabling the determination of a set of transactions presenting a certain risk of being in anomaly (frauds or other phenomena), and which can be presented to a human operator.

For this purpose, the present invention proposes a method for the detection of anomalies in a set of payment transactions, consisting in:

- establishing a meta-model consisting of a set of models, each optimized on a training set to determine, for each transaction, a risk of being an anomaly, said meta-model being established by the "gradient boosting" technique, so as to optimize a differentiable function expressing the average precision of said meta-model;

- submitting said set to said meta-model, to determine risks for each transaction of said set, and

- determining a subset of transactions corresponding to a risk greater than a determined threshold to provide a predetermined number of transactions in said subset.

According to preferred embodiments, the invention comprises one or more of the following features which can be used separately or in partial combination with one another or in total combination with one another:

- said subset is presented to one or more human experts and said threshold is determined according to the number of transactions that can be processed by said one or more human experts;

- prior to the establishment of the meta-model, a subsampling step (E2) is applied to said set of transactions, in order to improve the balance between anomalous transactions and legitimate transactions;

- said subsampling step consists in optimizing a measurement F2;

- the optimization of said measurement F2 consists in minimizing a differentiable function expressing said measurement F2;

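- said average precision is applied to the rank of a transaction, the transactions being ordered by risk level;

- the average precision AP is expressed by the equation:

$$AP = \frac{1}{M}\sum_{i=1}^{N}\frac{y_i}{r_i}\sum_{j=1}^{N} y_j\, I\big(F(x_i)\le F(x_j)\big), \qquad \text{with } M=\sum_{i=1}^{N} y_i,$$

in which $F(x_i)$ is the risk determined for transaction $x_i$; $y_i$ is equal to 1 if said transaction is in anomaly, 0 otherwise; $I()$ is the indicator function, equal to 1 if the condition is true, 0 otherwise; $N$ is the number of transactions of the learning set; and $r_i$ is the rank of transaction $x_i$ in the ranking of all transactions predicted by model $F$;

- said function is expressed by the equation

$$\widehat{AP} = \frac{1}{M}\sum_{i=1}^{N} y_i\,\frac{\sum_{j=1}^{N} y_j\,\sigma\big(F(x_j)-F(x_i)\big)}{\sum_{j=1}^{N} \sigma\big(F(x_j)-F(x_i)\big)}, \qquad \text{with: } \sigma(z)=\frac{1}{1+e^{-\alpha z}},$$

and in which $\alpha$ is a smoothing parameter.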

Another object of the invention is a computer program comprising instructions which, when executed by a processor of a computer system, result in the implementation of a method as previously described. Another object of the invention is a device for the detection of anomalies comprising means enabling the implementation of the previously described method. Other features and advantages of the invention will appear on reading the following description of a preferred embodiment of the invention, given by way of example and with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows schematically an example of the flow of the method according to one embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

As stated in the introduction, the invention consists in determining, within a set of transactions, the subset of transactions presenting a high risk, which is to be presented to one or more human operators.

The cardinality of this subset may be predetermined, since it may correspond to the number of transactions that can be processed over a given duration (for example a day) by the human operators.

The problem solved by the invention therefore consists in quickly finding the k transactions presenting the highest risk of being anomalies, where k is the number of transactions that can be processed by the human operators.

As a first step, a preprocessing step can be implemented. This step is referenced E1 in FIG. 1.

This preprocessing consists in preparing the data corresponding to the transactions so that they can be properly processed by the subsequent steps. This data includes both data already contained in the transactions and data external to them.

More particularly, this preprocessing can cover at least two operations:

A first operation consists in formatting the data present in the submitted transactions, in order to allow their processing by the "machine learning" type algorithm to which they are then subjected. For example, the date of the transaction can be transformed into several data, or characteristics ("features"): day, month, year, hour, minute ...
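As an illustration of this first operation, the following minimal sketch splits a timestamp into the separate features listed above. The function name and the dictionary layout are assumptions made for this example, not part of the described method.

```python
from datetime import datetime


def expand_timestamp(ts: datetime) -> dict:
    """Split one transaction timestamp into separate numeric features."""
    return {
        "day": ts.day,
        "month": ts.month,
        "year": ts.year,
        "hour": ts.hour,
        "minute": ts.minute,
    }


# Hypothetical transaction timestamp.
features = expand_timestamp(datetime(2017, 3, 16, 14, 5))
```

Each resulting value can then be fed to the learning algorithm as an ordinary numeric feature.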

A second operation is to associate new features to the transactions. These new features can be created from the history of the parties to the transaction, including the holder of a payment card used for the transaction: average amounts spent, previously visited stores, etc.

These features are intended to be relevant to the problem at hand, namely the detection of anomalies and more particularly of fraud. Thus, an amount much higher than the average of the previous transactions can be an element of risk, without in itself constituting an anomaly.
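The second operation can be pictured with a small sketch that attaches, to each transaction, the cardholder's average past amount and the ratio of the current amount to that average. The keys "card_id" and "amount", the feature names and the assumption that transactions arrive sorted by time are all hypothetical choices made for this example.

```python
from collections import defaultdict


def add_history_features(transactions):
    """Attach history-based features to each transaction (assumed time-sorted)."""
    totals = defaultdict(float)  # running sum of past amounts per card
    counts = defaultdict(int)    # number of past transactions per card
    enriched = []
    for tx in transactions:
        card = tx["card_id"]
        past_avg = totals[card] / counts[card] if counts[card] else None
        ratio = tx["amount"] / past_avg if past_avg else None
        enriched.append({**tx, "avg_past_amount": past_avg, "amount_ratio": ratio})
        # Update the history only after computing the features, so that
        # a transaction never sees its own amount.
        totals[card] += tx["amount"]
        counts[card] += 1
    return enriched


history = [
    {"card_id": "A", "amount": 20.0},
    {"card_id": "A", "amount": 40.0},
    {"card_id": "A", "amount": 300.0},
]
enriched = add_history_features(history)
```

In this toy history the third transaction is ten times the cardholder's previous average, which is the kind of risk signal described above without being, by itself, proof of an anomaly.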

The transactions, each formed of a set of characteristics, are then transmitted to a subsampling step E2.

This step E2 can be omitted in the overall process according to the invention, but it makes it possible to improve performance and processing time.

In particular, it makes it possible to improve the learning set on which the statistical models of the subsequent step will be trained. Indeed, as mentioned above, the number of anomalous transactions is certainly too high, but it nevertheless represents a very low proportion of the total transaction volume (for example, around 0.2%). The transaction population is therefore very unbalanced, and this imbalance creates significant problems for most learning mechanisms. One of the objectives of the invention is to take into account this specificity and to propose a solution to remedy it.

This step E2 consists in discarding a certain number of transactions that can be judged as not being in anomaly (that is to say, which are "legitimate"), in order, on the one hand, to reduce the number of transactions in the learning set and, on the other hand, to improve the distribution between anomalous transactions and legitimate transactions.
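One simple way to picture this discarding step is a score-based filter: given a score per transaction from some preliminary classifier, keep only the transactions scored at least as high as the lowest-scored known anomaly that must be retained. This is a simplified illustration under stated assumptions (the scores, the `target_recall` parameter and the function name are hypothetical); it is not the F2-optimized classifier of the description.

```python
import math


def undersample(scores, labels, target_recall=1.0):
    """Return indices of transactions kept after a recall-oriented filter."""
    pos_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos_scores:
        return list(range(len(scores)))
    # Number of known anomalies that must survive the filter.
    n_keep = max(1, math.ceil(target_recall * len(pos_scores)))
    threshold = pos_scores[n_keep - 1]
    return [i for i, s in enumerate(scores) if s >= threshold]


scores = [0.05, 0.10, 0.40, 0.20, 0.90, 0.02, 0.60]
labels = [0, 0, 1, 0, 1, 0, 0]
kept = undersample(scores, labels)
```

Here four of the five legitimate transactions are discarded while both anomalies are kept, which improves the balance of the learning set.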

Such sub-sampling techniques are described, for example, in the article "SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory" by E. Ramentol, Y. Caballero and F. Herrera, in Knowledge and Information Systems, 33(2), 2012, pages 245-265. This article also presents a complementary or alternative technique of oversampling the data set, that is, creating "synthetic" data of the minority class.

More specifically, step E2 is a binary classification step that assigns each transaction of the submitted learning set to a class "transaction in anomaly" or to a class "legitimate transaction". It can aim at optimizing a measurement F2 combining the recall rate and the precision measured for this sub-sampling step.

The recall rate for a given class is defined as the ratio between the number of transactions correctly assigned to that class and the number of transactions actually in that class. Precision is defined as the ratio between the number of transactions correctly assigned to that class and the total number of transactions assigned to that class.

If we consider the usual classification criteria of "true positives" TP, "false positives" FP and "false negatives" FN, the recall rate and the precision can be expressed:

$$\text{recall} = \frac{TP}{TP+FN}, \qquad \text{precision} = \frac{TP}{TP+FP}$$

The numbers of "true positives" TP, "false positives" FP and "false negatives" FN can be expressed as a function of the scores provided by a model F established for this binary classification step with two classes "+1" and "0".

We consider the probability $F(x_i)$ for a transaction $x_i$ to belong to the positive class "+1". In a learning set of N transactions, we can associate with each transaction $x_i$ its "real" class $y_i$. This learning set can be written $S = \{(x_i, y_i)\}_{i=1}^{N}$.

We can then write:

These two criteria, precision and recall, are generally insufficient on their own to measure the performance of a classification mechanism. It is indeed possible to obtain a very high recall rate (at most equal to 1) at the expense of a very low precision, and vice versa.
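A short worked example makes this trade-off concrete. It assumes a hypothetical batch of 1000 transactions containing 2 anomalies (0.2%, the order of magnitude cited earlier) and compares two extreme classifiers; the counts are invented for illustration.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Classifier that flags every transaction as anomalous.
p_all, r_all = precision_recall(tp=2, fp=998, fn=0)

# Classifier that flags a single transaction, which is indeed anomalous.
p_one, r_one = precision_recall(tp=1, fp=0, fn=1)
```

The first classifier reaches a recall of 1 with a precision of 0.002, while the second reaches a precision of 1 with a recall of only 0.5: neither criterion alone describes a useful detector.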

There are several classic measures combining recall and precision, so as to capture a performance considered relevant and representative of the mechanism's ability to provide acceptable results.

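Such a measurement may for example be the F-measure (or $F_\beta$ measure), defined by:

$$F_\beta = (1+\beta^2)\,\frac{\text{precision}\cdot\text{recall}}{\beta^2\,\text{precision}+\text{recall}}$$

According to one embodiment of the invention, a measurement F2 (that is, $\beta = 2$) is preferred for the emphasis it places on recall, rather than precision,

that is to say:

$$F_2 = \frac{5\,\text{precision}\cdot\text{recall}}{4\,\text{precision}+\text{recall}} = \frac{5\,TP}{5\,TP+4\,FN+FP}$$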

By emphasizing recall, the sub-sampling step makes it possible to discard a large number of "legitimate" transactions, while keeping as many anomalous transactions as possible for the next step E3.
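The effect of emphasizing recall can be checked numerically with the standard F-measure written in terms of counts, $F_\beta = (1+\beta^2)\,TP / ((1+\beta^2)\,TP + \beta^2\,FN + FP)$. The two filters compared below are hypothetical: filter A keeps 9 of 10 anomalies but lets 30 legitimate transactions through, while filter B lets only 1 legitimate transaction through but keeps just 5 anomalies.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-measure from counts; beta > 1 weights recall more than precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical filter A: high recall, modest precision.
f2_a = f_beta(tp=9, fp=30, fn=1, beta=2.0)
f1_a = f_beta(tp=9, fp=30, fn=1, beta=1.0)

# Hypothetical filter B: high precision, modest recall.
f2_b = f_beta(tp=5, fp=1, fn=5, beta=2.0)
f1_b = f_beta(tp=5, fp=1, fn=5, beta=1.0)
```

With the balanced F1 measure filter B scores higher, whereas F2 prefers filter A, the one that loses fewer anomalies before step E3.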

According to one embodiment, the optimization of the measurement F2 consists in minimizing a differentiable function expressing said measurement F2.

To do this, we can define approximations for each sum TP, FP, FN, in which the indicator function is replaced by an approximation with the sigmoid function.

The definitions given above then become:

These definitions can be used to construct an approximation of the measure F2,

This approximation can be used as an objective function in a classical optimization process. This optimization process can be a gradient descent and, for example, use the "gradient boosting" technique, in the same way as step E3. These optimization methods are described in more detail in the following paragraphs, in connection with step E3.
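A minimal sketch of such a differentiable F2 is given below. It assumes that a transaction is predicted positive when its score exceeds 0.5 and that the indicator of this event is replaced by a sigmoid of adjustable sharpness; the threshold, the `sharpness` parameter and the function names are choices made for this illustration only.

```python
import math


def soft_f2(scores, labels, threshold=0.5, sharpness=10.0):
    """Smooth F2: the hard decision I(score > threshold) is replaced by a sigmoid."""
    tp = fp = fn = 0.0
    for s, y in zip(scores, labels):
        soft_pos = 1.0 / (1.0 + math.exp(-sharpness * (s - threshold)))
        tp += y * soft_pos
        fp += (1 - y) * soft_pos
        fn += y * (1.0 - soft_pos)
    return 5.0 * tp / (5.0 * tp + 4.0 * fn + fp)


def hard_f2(scores, labels, threshold=0.5):
    """Exact F2 computed from hard decisions, for comparison."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    denom = 5 * tp + 4 * fn + fp
    return 5 * tp / denom if denom else 0.0


scores = [0.9, 0.8, 0.3, 0.1, 0.6]
labels = [1, 1, 0, 0, 0]
exact = hard_f2(scores, labels)
approx = soft_f2(scores, labels, sharpness=50.0)
```

A function to minimize can then be taken as `1 - soft_f2(...)`, which is smooth in the scores and therefore usable by a gradient-based optimizer.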

This step E3 consists in establishing a meta-model formed of a set of models, each optimized on a training set, by the "gradient boosting" technique, so as to optimize a differentiable function expressing the average precision of said meta-model.

In general terms, the aim is to establish a meta-model that makes it possible to determine a subset of transactions corresponding to a risk greater than a threshold, determined so as to provide a predetermined number of transactions in said subset.

Indeed, as mentioned above, transactions considered as "at risk" are submitted to expert users so that they decide whether they are anomalies or not. As a result, the available human resources determine this predetermined number. The problem is therefore to provide the k most risky transactions, where k is the number of transactions that the expert users can process.

In order to solve this technical problem, the inventors considered that the usual criterion of accuracy of the model to be used for generalization was not optimal. They consider that the average precision criterion is better able to account for the specificity of the technical problem. As part of the ensemble learning implemented in the invention, optimizing the average precision promotes the learning of models that achieve good precision on the transactions presenting the highest risks.

The method used in the context of the invention is an ensemble learning method, that is to say one based on a global model, or meta-model, formed of a set of "individual" models. Each individual, or "base", model is built and optimized from a learning set.

These ensemble methods have been presented in the state of the art in numerous publications in the field of machine learning.

In general, they stem from the limitations observed for any single model when looking for a good compromise between bias and variance. Studies have shown that by considering not a single model but a set of models, both the bias and the variance of the meta-model can be improved.

In the prediction phase, each model performs a prediction, and the final prediction, performed by the meta-model, is a combination of individual predictions. Different combinations are possible: majority vote, weighted majority vote, threshold vote, unanimity, etc.

In the context of the invention, the combination can be made with a weighted majority vote.
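A weighted majority vote can be sketched as follows, assuming each individual model casts a binary vote (1 for "anomaly", 0 for "legitimate") and carries a non-negative weight. The function name and the example weights are hypothetical.

```python
def weighted_vote(votes, weights):
    """Return 1 if the weighted votes for class 1 exceed half of the total weight."""
    total = sum(weights)
    positive = sum(w for v, w in zip(votes, weights) if v == 1)
    return 1 if positive > total / 2 else 0


votes = [1, 0, 0]

# One heavily weighted model outvotes two lightly weighted ones.
weighted_decision = weighted_vote(votes, [0.6, 0.2, 0.1])

# With equal weights the same votes give a simple majority vote.
simple_decision = weighted_vote(votes, [1.0, 1.0, 1.0])
```

In boosting, the weights typically reflect how well each individual model performed during training.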

In machine learning, each model learns autonomously, iteratively, and is evaluated with respect to a result to be achieved which, in the context of the invention, is the optimization of a function expressing the average precision of the models.

According to one embodiment of the invention, the ensemble technique used is a "boosting" technique, and more particularly "gradient boosting", since the aim is to optimize a function.

The boosting technique was proposed by R. Schapire in the article "The strength of weak learnability" in Machine Learning, 5, 1990, pages 197-227, and has since been the subject of an abundant literature.

The basic idea is to consider the transactions that have been poorly learned by the models and to give them priority over the other transactions in the following iterations of the learning process.

An implementation of this principle is detailed for example in the AdaBoost algorithm presented in the article "Experiments with a new boosting algorithm" by Y. Freund and R. Schapire, International Conference on Machine Learning, 1996, pages 148-156.

In general terms, the principle consists of assigning weights to the examples of the learning set and, at each iteration, updating these weights by increasing those of the badly classified examples and decreasing those of the well classified examples. Similarly, the use of the "boosting" technique to perform a gradient descent optimization is well known and is described, for example, in the article by J. H. Friedman, "Greedy function approximation: a gradient boosting machine" in Annals of Statistics, 2001, pages 1189-1232.
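The reweighting principle can be illustrated with one round of the classical discrete AdaBoost update. The sketch assumes the weights sum to 1 and that the weighted error lies strictly between 0 and 1; the function name and the toy data are hypothetical.

```python
import math


def adaboost_reweight(weights, misclassified):
    """One AdaBoost round: raise weights of misclassified examples, lower the others."""
    err = sum(w for w, miss in zip(weights, misclassified) if miss)
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    # Weight of the current model in the final vote.
    alpha = 0.5 * math.log((1.0 - err) / err)
    raw = [w * math.exp(alpha if miss else -alpha)
           for w, miss in zip(weights, misclassified)]
    total = sum(raw)
    return alpha, [w / total for w in raw]


# Four examples with uniform weights; only the first one is misclassified.
alpha, new_weights = adaboost_reweight([0.25] * 4, [True, False, False, False])
```

After this round the misclassified example carries half of the total weight, so the next model concentrates on it.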

The invention does not lie in a new boosting or gradient boosting algorithm, but in how to use them. From a practical point of view, an embodiment of the invention may be a method, implemented by software, using such an algorithm as an autonomous functional module, which may be provided by a library for example.

The problem we are trying to solve with the gradient boosting algorithm is to improve the set of the k "best" anomalous transactions, where k is the number of transactions that the expert users can process. Therefore, an objective function based on ranks (or rankings) is particularly suitable.

We can define the rank $r_i$ of transaction $x_i$ by the expression

$$r_i = \sum_{j=1}^{N} I\big(F(x_i)\le F(x_j)\big)$$

Each transaction $x_i$ belongs to a class "+1", corresponding to anomalous transactions, or to a class "0", corresponding to "legitimate" transactions. $F$ is a model that outputs a risk, that is, a probability for a transaction to belong to the class "+1". $I()$ represents the indicator function. Finally, $N$ is the number of transactions in the learning set $S$. This learning set can be written $S = \{(x_i, y_i)\}_{i=1}^{N}$, in which each transaction $x_i$ is associated with a class $y_i$.

The above expression therefore defines the number of transactions that have a risk greater than or equal to the transaction x _i .
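This definition of the rank, as the number of transactions whose risk is greater than or equal to that of the transaction considered, translates directly into code. The function name and the example risks are hypothetical.

```python
def ranks(risks):
    """Rank of each transaction: count of transactions with risk >= its own risk."""
    return [sum(1 for r_j in risks if r_i <= r_j) for r_i in risks]


example_ranks = ranks([0.9, 0.1, 0.5, 0.7])
tied_ranks = ranks([0.5, 0.5])
```

The riskiest transaction gets rank 1, and two transactions with equal risk share the same rank.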

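The precision $p_i$ at the rank $r_i$ of transaction $x_i$ can then be defined by the expression

$$p_i = \frac{1}{r_i}\sum_{j=1}^{N} y_j\, I\big(F(x_i)\le F(x_j)\big)$$

The average precision AP can then be obtained by:

$$AP = \frac{1}{M}\sum_{i=1}^{N} y_i\, p_i$$

with $M = \sum_{i=1}^{N} y_i$, the number of anomalous transactions in the learning set.

In other words, we consider the average precision applied to the rank of the transactions, these being ordered by risks $F(x_i)$.

This average precision AP can therefore also be written

$$AP = \frac{1}{M}\sum_{i=1}^{N}\frac{y_i}{r_i}\sum_{j=1}^{N} y_j\, I\big(F(x_i)\le F(x_j)\big)$$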

The use of a "gradient boosting" technique requires that the objective function be differentiable in order to allow a gradient descent.
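The need for a differentiable surrogate can be seen by computing the exact average precision directly: the precision at the rank of each anomalous transaction, averaged over the anomalous transactions. In the sketch below (function name and data are hypothetical), nudging a score without changing the ordering leaves the value strictly unchanged, while swapping two transactions makes it jump, so its gradient is zero almost everywhere.

```python
def average_precision(risks, labels):
    """Exact average precision computed from ranks (labels are 0 or 1)."""
    m = sum(labels)
    if m == 0:
        return 0.0
    total = 0.0
    for r_i, y_i in zip(risks, labels):
        if y_i != 1:
            continue
        rank = sum(1 for r_j in risks if r_i <= r_j)
        hits = sum(y_j for r_j, y_j in zip(risks, labels) if r_i <= r_j)
        total += hits / rank  # precision at the rank of this anomaly
    return total / m


labels = [1, 0, 1, 0, 0]
ap_base = average_precision([0.9, 0.8, 0.7, 0.6, 0.5], labels)
ap_nudged = average_precision([0.9, 0.8, 0.7 + 1e-6, 0.6, 0.5], labels)
ap_swapped = average_precision([0.9, 0.8, 0.85, 0.6, 0.5], labels)
```

Here the anomalies sit at ranks 1 and 3, giving (1 + 2/3) / 2; moving the second anomaly above the legitimate transaction at rank 2 raises the value to 1 in a single step.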

An idea of the invention therefore consists in approximating this expression of the average precision by a differentiable function expressing this average precision. It is this differentiable function that will be optimized by the gradient boosting algorithm.

To do this, we determine a differentiable approximation of the indicator function $I()$:

$$I\big(F(x_i)\le F(x_j)\big) \approx \sigma\big(F(x_j)-F(x_i)\big)$$

where:

$$\sigma(z) = \frac{1}{1+e^{-\alpha z}}$$

and in which $\alpha$ is a smoothing parameter. The higher this parameter, the closer the approximation is to the actual average precision AP.
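The role of the smoothing parameter can be illustrated numerically, assuming the logistic form $\sigma(z) = 1/(1+e^{-\alpha z})$ applied to the score difference $z = F(x_j) - F(x_i)$. The function names and the chosen gap are hypothetical.

```python
import math


def indicator(z):
    """Exact indicator of z >= 0, i.e. of F(x_i) <= F(x_j) for z = F(x_j) - F(x_i)."""
    return 1.0 if z >= 0 else 0.0


def smooth_indicator(z, alpha):
    """Logistic approximation of the indicator, sharper as alpha grows."""
    return 1.0 / (1.0 + math.exp(-alpha * z))


gap = 0.1  # hypothetical score difference F(x_j) - F(x_i)
above = {alpha: smooth_indicator(gap, alpha) for alpha in (1, 10, 100)}
below = {alpha: smooth_indicator(-gap, alpha) for alpha in (1, 10, 100)}
```

For a positive gap the approximation rises towards 1 as the parameter grows, and for a negative gap it falls towards 0, while remaining differentiable everywhere.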

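Using this approximation of the indicator function, that is, replacing each term $I\big(F(x_i)\le F(x_j)\big)$ by $\sigma\big(F(x_j)-F(x_i)\big)$, we can write the function $\widehat{AP}$ to optimize as follows:

$$\widehat{AP} = \frac{1}{M}\sum_{i=1}^{N} y_i\,\hat{p}_i$$

with

$$\hat{p}_i = \frac{\sum_{j=1}^{N} y_j\,\sigma\big(F(x_j)-F(x_i)\big)}{\sum_{j=1}^{N} \sigma\big(F(x_j)-F(x_i)\big)}$$

($M$ being the number of anomalous transactions). We can finally write:

$$\widehat{AP} = \frac{1}{M}\sum_{i=1}^{N} y_i\,\frac{\sum_{j=1}^{N} y_j\,\sigma\big(F(x_j)-F(x_i)\big)}{\sum_{j=1}^{N} \sigma\big(F(x_j)-F(x_i)\big)}$$

It is then possible to use the function $\widehat{AP}$ as the objective of a gradient descent according to the "gradient boosting" technique (since the average precision is to be maximized, the quantity minimized in practice is a loss that decreases when $\widehat{AP}$ increases, such as $1-\widehat{AP}$). The invention can be implemented by the use of a "gradient boosting" algorithm known per se, but modified by the introduction of a specific objective function, which is a differentiable function expressing the average precision of the model.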
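The behaviour of such a smoothed average precision can be checked on a toy example. The sketch below assumes the logistic approximation of the indicator and, as a choice made for this illustration, keeps the self-comparison term (j = i) at its exact value 1 instead of smoothing it, so that the result tends to the exact average precision when the smoothing parameter grows. All names and data are hypothetical.

```python
import math


def exact_ap(risks, labels):
    """Exact average precision computed from ranks."""
    m = sum(labels)
    total = 0.0
    for r_i, y_i in zip(risks, labels):
        if y_i == 1:
            rank = sum(1 for r_j in risks if r_i <= r_j)
            hits = sum(y_j for r_j, y_j in zip(risks, labels) if r_i <= r_j)
            total += hits / rank
    return total / m if m else 0.0


def smooth_ap(risks, labels, alpha):
    """Differentiable surrogate: indicators replaced by logistic functions."""
    m = sum(labels)
    total = 0.0
    for i, (r_i, y_i) in enumerate(zip(risks, labels)):
        if y_i != 1:
            continue
        num = den = 1.0  # self term j = i, kept exact (assumption of this sketch)
        for j, (r_j, y_j) in enumerate(zip(risks, labels)):
            if j == i:
                continue
            w = 1.0 / (1.0 + math.exp(-alpha * (r_j - r_i)))
            num += y_j * w
            den += w
        total += num / den
    return total / m if m else 0.0


risks = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
exact = exact_ap(risks, labels)
errors = {a: abs(smooth_ap(risks, labels, a) - exact) for a in (1, 10, 200)}
```

The approximation error shrinks as the smoothing parameter increases, at the cost of gradients that become very small away from near-ties.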

Thus, at the end of the learning phase, the meta-model has been trained so as to optimize the average precision. In a step E4, it can then be used in prediction to assign a risk to the transactions.
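The learning phase can be pictured with a deliberately small gradient boosting loop. This is a sketch under several assumptions, not the implementation of the invention: the loss is taken as the negative of a smoothed average precision (self term kept at 1), its gradient is obtained by finite differences, the base models are one-split regression stumps on a single hypothetical feature, and all names and hyperparameters are invented for the example.

```python
import math


def smooth_ap(scores, labels, alpha=5.0):
    """Smoothed average precision (logistic indicators, self term kept at 1)."""
    total = 0.0
    for i, (s_i, y_i) in enumerate(zip(scores, labels)):
        if y_i != 1:
            continue
        num = den = 1.0
        for j, (s_j, y_j) in enumerate(zip(scores, labels)):
            if j != i:
                w = 1.0 / (1.0 + math.exp(-alpha * (s_j - s_i)))
                num += y_j * w
                den += w
        total += num / den
    return total / sum(labels)


def gradient(scores, labels, alpha=5.0, h=1e-5):
    """Central finite-difference gradient of smooth_ap with respect to each score."""
    grad = []
    for k in range(len(scores)):
        up = scores[:k] + [scores[k] + h] + scores[k + 1:]
        down = scores[:k] + [scores[k] - h] + scores[k + 1:]
        grad.append((smooth_ap(up, labels, alpha) - smooth_ap(down, labels, alpha)) / (2 * h))
    return grad


def fit_stump(xs, targets):
    """Least-squares regression stump (one threshold) on a single feature."""
    best = None
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2
        left = [g for x, g in zip(xs, targets) if x <= t]
        right = [g for x, g in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((g - lv) ** 2 for g in left) + sum((g - rv) ** 2 for g in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def boost(xs, labels, rounds=20, lr=1.0, alpha=5.0):
    """Each round fits a stump to the negative gradient of the loss -smooth_ap."""
    scores = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = gradient(scores, labels, alpha)  # = -d(loss)/d(score)
        t, lv, rv = fit_stump(xs, residuals)
        scores = [s + lr * (lv if x <= t else rv) for s, x in zip(scores, xs)]
    return scores


xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 1.0]  # one hypothetical feature
ys = [0, 0, 0, 0, 0, 0, 1, 1]
start = smooth_ap([0.0] * len(xs), ys)
final_scores = boost(xs, ys)
end = smooth_ap(final_scores, ys)
```

On this toy set the boosted scores place both anomalies above all legitimate transactions. In practice an existing gradient boosting library accepting a custom objective would replace the hand-written loop and the finite-difference gradient.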

To do this, we submit all transactions to this meta-model. Each transaction is assigned a risk, and above all it is possible to determine a subset of transactions corresponding to a risk greater than a determined threshold so as to provide a predetermined number, k, of transactions in this subset, corresponding to those that the experts can process.
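The selection of step E4 can be sketched as follows: given the risks produced by the meta-model and the capacity k of the experts, return the k riskiest transactions together with the risk of the last one selected, which plays the role of the threshold. The function name and the risks are hypothetical, and the sketch uses "greater than or equal to" the threshold so that exactly k transactions are returned.

```python
def select_top_k(risks, k):
    """Return (threshold, indices of the k highest-risk transactions)."""
    if k <= 0 or not risks:
        return None, []
    order = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    chosen = order[:k]
    threshold = risks[chosen[-1]]  # lowest risk among the selected transactions
    return threshold, chosen


risks = [0.02, 0.91, 0.40, 0.75, 0.10, 0.66]
threshold, to_review = select_top_k(risks, k=3)
```

The indices are returned from the riskiest transaction downwards, which is also a natural order in which to present them to the experts.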

This predetermined threshold may have been learned during the learning phase. It may be learned empirically and kept constant. It is also possible to vary it according to certain parameters such as the date, because certain calendar events are likely to influence the rates of anomalies and frauds (holidays, weekends ...). On those events where fraud is more prevalent, the threshold will be increased to obtain a constant number of "at risk" transactions (assuming that human resources remain constant).

Of course, the present invention is not limited to the examples and to the embodiment described and shown, but it is capable of numerous variants accessible to those skilled in the art.

Claims

1. A method for detecting anomalies in a set of payment transactions, consisting of:

- establishing (E3) a meta-model formed of a set of models, each optimized on a training set to determine, for each transaction, a risk of being in anomaly, said meta-model being established by the "gradient boosting" technique, so as to optimize a differentiable function expressing the average precision of said meta-model;

- submitting (E4) said set to said meta-model, in order to determine risks for each transaction of said set, and

- determining a subset of transactions corresponding to a risk greater than a determined threshold to provide a predetermined number of transactions in said subset.

2. Method according to the preceding claim, wherein said subset is presented to one or more human experts and said threshold is determined according to the number of transactions that can be processed by said one or more human experts.

3. Method according to one of the preceding claims, wherein prior to the establishment of the meta-model, a subsampling step (E2) is applied to said set of transactions, to improve the balance between anomalous transactions and legitimate transactions.

4. Method according to the preceding claim, wherein said subsampling step consists in optimizing a measurement F2.

5. Method according to the preceding claim, wherein the optimization of said measurement F2 consists in minimizing a differentiable function expressing said measurement F2.

6. Method according to one of the preceding claims, wherein said average precision is applied to the rank of a transaction, the transactions being ordered by level of risk.

7. Method according to the preceding claim, wherein the average precision AP is expressed by the equation:

$$AP = \frac{1}{M}\sum_{i=1}^{N}\frac{y_i}{r_i}\sum_{j=1}^{N} y_j\, I\big(F(x_i)\le F(x_j)\big)$$

with $M=\sum_{i=1}^{N} y_i$, and in which $F(x_i)$ is the risk determined for the transaction $x_i$, $y_i$ is equal to 1 if said transaction is in anomaly, 0 otherwise, $I()$ is the indicator function, $N$ is the number of transactions of the training set, and $r_i$ is the rank of the transaction $x_i$.

8. Method according to the preceding claim, in which said function is expressed by the equation

$$\widehat{AP} = \frac{1}{M}\sum_{i=1}^{N} y_i\,\frac{\sum_{j=1}^{N} y_j\,\sigma\big(F(x_j)-F(x_i)\big)}{\sum_{j=1}^{N} \sigma\big(F(x_j)-F(x_i)\big)}, \qquad \text{with: } \sigma(z)=\frac{1}{1+e^{-\alpha z}},$$

in which $\alpha$ is a smoothing parameter.

9. Computer program comprising instructions which, when executed by a processor of a computer system, result in the implementation of a compilation method according to one of claims 1 to 8.

10. Device for detecting anomalies comprising means for implementing the method according to one of claims 1 to 8.