EP3574462A1

EP3574462A1 - Automatic detection of frauds in a stream of payment transactions by neural networks integrating contextual information

Info

Publication number: EP3574462A1
Application number: EP17832295.4A
Authority: EP
Inventors: Mathieu GARCHERY; Olivier CAELEN; Liyun HE-GUELTON; Michael GRANITZER; Konstantin ZIEGLER; Stefan ZWICKLBAUER
Original assignee: Worldline SA
Current assignee: Worldline SA
Priority date: 2017-01-30
Filing date: 2017-12-22
Publication date: 2019-12-04
Also published as: FR3062504A1; WO2018138423A1; CN110226179A

Abstract

The invention relates to a method for detecting fraudulent transactions in a collection of payment transactions, consisting in submitting the transactions to a classification system trained on a training set and providing, for each new transaction of said collection, a probability of being a fraudulent transaction, characterized in that contextual information is associated with each transaction, and in that the classification system is a neural network.

Description

AUTOMATIC DETECTION OF FRAUD IN A STREAM OF PAYMENT TRANSACTIONS BY NEURAL NETWORKS INCLUDING CONTEXTUAL INFORMATION

FIELD OF THE INVENTION

The present invention relates to a mechanism for detecting anomalies in a bank transaction flow. It applies in particular to the detection of fraud.

BACKGROUND OF THE INVENTION

Fraud on banking transactions is a growing phenomenon, particularly because of the generalization of payment transactions via telecommunication networks.

When a payment transaction is authorized by a payment server, fraud detection can be put in place at two points: before authorization and/or after it.

In the first case, we are talking about fraud detection in real time.

In the second case, it is near-real-time fraud detection. The first case has the advantage of being able to block a fraudulent transaction before it takes place, but it is subject to a strong constraint on processing time, since it delays the finalization of the payment transaction and therefore degrades the user experience. The second case allows more time and thus makes it possible to put in place more complex and finer-grained processing.

In general, this problem is addressed with rule-based techniques. Solutions based on different classification mechanisms have also been proposed. However, it is noted in the state of the art that the detection of fraud in payment systems has specific features. Therefore, conventional classification techniques cannot be applied directly and effectively.

First, the consequences of fraud are extremely important and very sensitive. In addition, since data on bank accounts, cards and other payment instruments are confidential, very little information is publicly available on the tools put in place for the detection of fraud. It is therefore difficult to compare the solutions of the state of the art.

SUMMARY OF THE INVENTION

The object of the present invention is to provide a solution at least partially overcoming the aforementioned drawbacks.

More particularly, the invention aims to provide a solution for automatic detection of fraudulent transactions in a set of transactions using contextual information, that is to say, not contained in transactions subject to processing.

To this end, the present invention provides a method for detecting fraudulent transactions in a set of payment transactions, comprising subjecting the transactions to a classification system trained on a training set and providing, for each new transaction of said set, a probability of being a fraudulent transaction, characterized in that contextual information is associated with each transaction, and in that the classification system is a neural network. Typically, this training set is disjoint from the set of transactions on which the generalization, or prediction, is then carried out during the operating phase by the classifiers trained on the training set.

According to preferred embodiments, the invention comprises one or more of the following features which can be used separately or in partial combination with one another or in total combination with one another:

said classification system uses said contextual information by means of graph embeddings;

said contextual information includes data relating to the country associated with the transaction;

said contextual information includes data relating to days off;

said classification system is based on the Word2Vec algorithm.

Another object of the invention relates to a device comprising means for implementing the method as previously described.

Other features and advantages of the invention will appear on reading the following description of a preferred embodiment of the invention, given by way of example and with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically represents experimental results obtained according to one embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

The number of frauds represents only a very small percentage of the volume of banking transactions: the average fraud rate is considered to be of the order of 0.5%. The detection of fraud therefore corresponds to an anomaly detection problem, which is characterized by an unbalanced distribution between two populations (normal cases / anomalous cases). This type of problem is very poorly solved by machine learning mechanisms.

According to one embodiment of the invention, the set of transactions to be considered is modified by deleting the cases that can be considered a priori legitimate. This improves the balance between the two populations and makes it possible to increase the performance of the neural network.

Another specificity of payment transaction fraud (by credit card) lies in the complex nature of the problem: fraud is difficult to distinguish from legitimate transactions, and the classes resulting from the classification process may overlap. In addition, fraudsters can use a variety of fraud schemes, generating varied situations, and it is therefore difficult to detect fraud on the basis of "signatures" of typical fraud cases.
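The rebalancing step described above (deleting a priori legitimate cases) can be illustrated with a minimal Python sketch. The rule `is_a_priori_legitimate`, the field names and the amount threshold are hypothetical and chosen only for illustration; the description does not specify how such cases are identified.

```python
def is_a_priori_legitimate(tx):
    # Hypothetical rule (an assumption, not taken from the description):
    # very small amounts at a known merchant are treated as legitimate.
    return tx["amount"] < 10 and tx["known_merchant"]


def fraud_rate(transactions):
    # Share of transactions labelled as fraudulent.
    return sum(tx["fraud"] for tx in transactions) / len(transactions)


def rebalance(transactions):
    # Keep only the transactions that the rule cannot clear up front.
    return [tx for tx in transactions if not is_a_priori_legitimate(tx)]


# Toy data: 5 frauds among 1000 transactions, i.e. a 0.5 % fraud rate.
transactions = (
    [{"amount": 5, "known_merchant": True, "fraud": False}] * 795
    + [{"amount": 120, "known_merchant": False, "fraud": False}] * 200
    + [{"amount": 300, "known_merchant": False, "fraud": True}] * 5
)
reduced = rebalance(transactions)
print(fraud_rate(transactions), fraud_rate(reduced))
```

On this toy data the fraud rate rises from 5 in 1000 to 5 in 205, which is the kind of improved balance the embodiment aims for.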

The problem is to identify frauds among all payment transactions.

According to the invention, a classification system is set up, using machine learning techniques, in order to generate two classes: a class comprising legitimate transactions and a class comprising fraudulent transactions.

Typically, this type of mechanism is based on a learning phase and a prediction phase, the latter generalizing from the training set on which the learning phase was based.

According to the invention, the prediction of the class of a transaction takes into account various attributes associated with the transaction, among which contextual information. Taking this contextual information into account is an innovative idea in relation to the state of the art.

This contextual information can be, for example, the date (including the time) of the transaction, its geographical location, or calendar events (school holidays, public holidays, etc.).

Attributes may also more typically include the owner of the credit card (or other payment instrument), etc.

The use of contextual information makes it possible to distinguish fraudulent transactions from legitimate transactions with greater precision.

As with any classification mechanism, a classifier is first constructed from a training set during the learning phase. Then, this classifier is used during the prediction phase to classify new transactions.

Different types of classifiers are possible but, thanks to contextual information, they can draw on a larger volume of data for each transaction, which widens the options for determining a discrimination model that forms two well-identified classes.

The invention is therefore based on the injection of contextual information into the classification mechanism. More particularly, according to one embodiment of the invention, this contextual information is injected into a neural network.

Two sources of information can be considered to explain the mechanisms of the invention:

a relational database D, representing the data of the internal application;

a semantic graph G = (V, E) representing the contextual information.

Suppose furthermore that there exists an attribute j in D for which the set of values $A_j = \{d_j : d \in D\}$ can be identified with a subset of vertices $V^* \subseteq V$ of G. Such a semantic graph makes it possible to structure the contextual information.

A semantic graph (also called a semantic network or knowledge graph) is a multi-relational directed graph whose nodes are entities and whose links are the relations between them.

In the context of the invention, the integration of these graphs in the neural networks is carried out by graph embeddings, that is to say vector representations of the nodes of the semantic network, which capture the semantic properties of a particular node.

These embeddings are used to initialize an embedding layer of the neural network. During the learning phase, these embedding layers are adapted starting from the contextual information.

For example, attributes like "country" or "year" can be found in an external graph such as the DBpedia graph. DBpedia is an academic and community project for the automatic exploration and extraction of data derived from Wikipedia. Its principle is to offer a structured version of the encyclopedic content of each article, in the form of standardized data in the semantic web format.
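The initialization of an embedding layer from precomputed graph embeddings, followed by its adaptation during learning, can be sketched as follows. This is a minimal illustration in plain Python, not the inventors' implementation: the class `EmbeddingLayer`, the three-dimensional country vectors and the gradient values are all hypothetical.

```python
import random


class EmbeddingLayer:
    """Lookup table mapping a categorical value to a trainable vector."""

    def __init__(self, vocabulary, dim, pretrained=None, seed=0):
        rng = random.Random(seed)
        pretrained = pretrained or {}
        self.table = {}
        for value in vocabulary:
            if value in pretrained:
                # Start from the graph embedding when one is available.
                self.table[value] = list(pretrained[value])
            else:
                # Otherwise fall back to a small random initialization.
                self.table[value] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]

    def forward(self, value):
        # Return the current vector of the given attribute value.
        return self.table[value]

    def apply_gradient(self, value, grad, learning_rate=0.1):
        # One gradient-descent step on the row of this value only.
        row = self.table[value]
        self.table[value] = [w - learning_rate * g for w, g in zip(row, grad)]


# Hypothetical 3-dimensional graph embeddings for two countries.
pretrained = {"FR": [0.2, -0.1, 0.4], "DE": [0.25, -0.05, 0.35]}
layer = EmbeddingLayer(["FR", "DE", "BE"], dim=3, pretrained=pretrained)
before = layer.forward("FR")
layer.apply_gradient("FR", grad=[1.0, 0.0, -1.0])
after = layer.forward("FR")
print(before, after)
```

Values without a pretrained vector ("BE" here) simply start from a random row, so the layer can still be trained on attribute values that are missing from the external graph.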

It is thus possible to take advantage of existing models that structure contextual information. Without loss of generality, one can also assume that $j = 1$ and identify the values of the first attribute with the vertices in $V^*$.

Each tuple of D then has the form $d = (v, d_2, \ldots, d_k)$ with $v \in V^*$.

The problem of injecting semantic contextual information then becomes a problem of feature construction: finding a dimension $n > 0$ and vector representations $\vec{v} \in \mathbb{R}^n$ for all $v \in V^*$.

That is, $\vec{v}$ "captures" the semantics of $v$ and thus improves the performance of the machine-learning classifier on $D^* = \{(\vec{d_1}, d_2, \ldots, d_k) : d \in D\}$.
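The construction of $D^*$ from D can be sketched as below, assuming the embeddings are already available as a dictionary. The function name `inject_embeddings`, the toy attributes (country, amount, hour) and the two-dimensional vectors are hypothetical; flattening the vector into the feature row is one possible design choice among others.

```python
def inject_embeddings(rows, embeddings):
    """Replace the first attribute of each tuple by its vector representation."""
    enriched = []
    for first, *rest in rows:
        vector = embeddings[first]          # v in V* -> vector in R^n
        enriched.append((*vector, *rest))   # flatten into a single feature row
    return enriched


# Toy database D with tuples (country, amount, hour); values are made up.
D = [("FR", 42.0, 13), ("DE", 980.0, 3)]
embeddings = {"FR": (0.2, -0.1), "DE": (0.25, -0.05)}   # here n = 2
D_star = inject_embeddings(D, embeddings)
print(D_star)
```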

Embeddings are n-dimensional vectors associated with concepts.

These vectors inherit certain semantic properties of the concepts, so that, in particular, similar concepts are associated with nearby vectors. This proximity can easily be expressed by cosine similarity.
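As an illustration of this notion of proximity, the following sketch computes the cosine similarity between embedding vectors. The two-dimensional country vectors are invented for the example and do not come from any trained model.

```python
import math


def cosine_similarity(u, v):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: two neighbouring countries and a distant one.
france = [0.9, 0.1]
belgium = [0.8, 0.2]
japan = [0.1, 0.9]

print(cosine_similarity(france, belgium))  # close to 1: similar concepts
print(cosine_similarity(france, japan))    # much lower: dissimilar concepts
```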

Embeddings are a well-known research area in the field of natural language processing, where they are used to represent the semantics of the words in a corpus.

For example, "word embedding" or "lexical embedding" is a machine learning method, originating in deep learning, that focuses on learning a representation of words. This technique makes it possible to represent the words of a dictionary by vectors in order to facilitate their semantic and syntactic analysis. Each word is thus represented by a vector of real numbers, and words appearing in similar contexts have vectors that are closer to one another than those of words appearing in different contexts. This representation considerably reduces the dimensionality of the space, because one no longer stores an entire dictionary but only a space of continuous vectors.

The best-known algorithm is probably the Word2Vec algorithm. A Wikipedia page is devoted to this algorithm:

https://en.wikipedia.org/wiki/Word2vec

Word2Vec is a family of unsupervised learning algorithms for creating word embeddings from textual documents. In order to train its embeddings, Word2Vec uses a two-layer neural network that takes raw, unlabelled documents as input. The architecture of the neural network can be based on the "continuous bag of words" (CBOW) model or on a "skip-gram" architecture.

In the first case (CBOW), the input of the model can be $w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2}$, that is to say the words preceding and following a current word $w_i$. The output of the network is the probability that $w_i$ is the correct word. This task can be described as the prediction of a word given its context.

In the second case (skip-gram), the model works the other way round: the input of the network is a word $w_i$ and Word2Vec predicts the context around this word: $w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2}$.

Unlike other neural networks for natural language processing, Word2Vec is very fast and can be further accelerated using parallel learning techniques. Thus, training on the Wikipedia corpus can take around 90 minutes on a personal computer equipped with an Intel quad-core processor running at 4 x 3.4 GHz and 16 GB of memory.
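The difference between the two architectures described above lies in how training examples are formed from a sequence of words. The sketch below only builds these examples for a window of two words on each side; it does not train a network. The function names and the sample sentence are illustrative assumptions.

```python
def context_window(tokens, i, size=2):
    # Words located at most `size` positions before or after position i.
    left = tokens[max(0, i - size):i]
    right = tokens[i + 1:i + 1 + size]
    return left + right


def cbow_examples(tokens, size=2):
    # CBOW: the context words are the input, the current word is the target.
    return [(context_window(tokens, i, size), tokens[i]) for i in range(len(tokens))]


def skipgram_examples(tokens, size=2):
    # Skip-gram: the current word is the input, each context word is a target.
    return [
        (tokens[i], context)
        for i in range(len(tokens))
        for context in context_window(tokens, i, size)
    ]


tokens = ["the", "card", "was", "used", "abroad"]
print(cbow_examples(tokens)[2])
print(skipgram_examples(tokens)[:2])
```

For the middle word "was", CBOW yields one example with four context words as input, whereas skip-gram yields four separate (input, target) pairs.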

An important property of the Word2Vec algorithm is that it groups the similar word vectors together in the vector space. If learning is done on a sufficient learning set, Word2Vec produces good predictive results on the meaning of a word based on previous occurrences.

In order to obtain embeddings that preserve semantics, an embedding algorithm developed for entity disambiguation is used. Such an algorithm may be the one described in the following article:

Zwicklbauer, S., Seifert, C., Granitzer, M.: DoSeR - A Knowledge-Base-Agnostic Framework for Entity Disambiguation Using Semantic Embeddings. In: Sack, H., Blomqvist, E., d'Aquin, M., Ghidini, C., Ponzetto, S.P., Lange, C. (eds.) The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings. Lecture Notes in Computer Science, vol. 9678, pp. 182-198. Springer (2016), http://dx.doi.org/10.1007/978-3-319-34129-3_12

In an implementation based on this algorithm, Word2Vec obtains a vector representation for each word by learning to predict the sequences of words in which that word occurs.

Since a given RDF graph does not contain sequences of this type, a sequence of nodes $v_k \in V$ is created by performing a random walk starting from a node that is itself chosen at random. The RDF graph is treated as an undirected graph $G = (V, E)$ in which the nodes V are the resources of the knowledge base and the links E are its properties, with

$$\forall x, y \in V:\quad (x, y) \in E \iff \exists p : (x, p, y) \ \lor\ \exists p : (y, p, x) \ \text{is an RDF triple in the knowledge base.}$$
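The definition of E above amounts to ignoring both the direction and the label of each property. A minimal sketch of this construction, using a handful of invented triples rather than real DBpedia data, could look like this:

```python
from collections import defaultdict


def build_undirected_graph(triples):
    """Build adjacency sets: x and y are linked if some triple connects them."""
    adjacency = defaultdict(set)
    for subject, _predicate, obj in triples:
        # The predicate and the direction of the triple are both discarded.
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency


# Hypothetical triples (subject, predicate, object) for illustration only.
triples = [
    ("France", "capital", "Paris"),
    ("Paris", "country", "France"),
    ("France", "currency", "Euro"),
]
graph = build_undirected_graph(triples)
print({node: sorted(neighbours) for node, neighbours in graph.items()})
```

The two triples linking France and Paris in opposite directions collapse into a single undirected link, as the equivalence requires.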

The random walk is performed within this graph G. When the walk visits a node $x \in V$, the identifier of x is appended to the output.

The successor succ(x) of the node x is chosen uniformly at random among its adjacent nodes, that is to say with probability 1 / EdgesOf(x), where EdgesOf(x) is a function returning the number of links of the node x, i.e. the number of links incident to the current vertex $v_k$.

One can also introduce a random variable $X_x$ which determines the probability of jumping to a given node x if a random jump is made.

The probability of jumping to a node x is calculated by normalizing the inverse link frequency of that node, IEF(x). According to experimental studies carried out by the inventors, a value of $\alpha = 0.1$ is used for the random-jump parameter, but values between 0.05 and 0.25 seem suitable and provide a good Word2Vec model.
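The normalization of IEF values into a jump distribution can be sketched as follows. The description does not give the formula for IEF(x); the sketch assumes, by analogy with inverse document frequency, that IEF(x) = log(|E| / EdgesOf(x)). This formula, the function name and the toy edge list are assumptions made for illustration.

```python
import math
from collections import Counter


def jump_probabilities(edges):
    # EdgesOf(x): number of links incident to each node.
    degree = Counter()
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    total_edges = len(edges)
    # Assumed definition: IEF(x) = log(|E| / EdgesOf(x)).
    ief = {node: math.log(total_edges / count) for node, count in degree.items()}
    # Normalize so that the jump probabilities sum to 1.
    # (This sketch does not handle the degenerate case where every IEF is 0.)
    norm = sum(ief.values())
    return {node: value / norm for node, value in ief.items()}


# Hypothetical undirected links for illustration only.
edges = [
    ("France", "Paris"),
    ("France", "Euro"),
    ("France", "Europe"),
    ("Paris", "Europe"),
]
probabilities = jump_probabilities(edges)
print(probabilities)
```

Under this assumption, sparsely connected nodes such as "Euro" receive a higher jump probability than hubs such as "France", which compensates for the fact that the uniform walk reaches hubs more often.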

In addition, the parameter Θ indicates the number of random steps in the graph. It is possible to use, for example, Θ = 5 * |E|, which in the DBpedia example provides about 50 million random walks. Higher values of this parameter do not seem to improve the entity embeddings, but they increase the time required for the learning phase.

According to one embodiment of the invention, the corpus for RDF knowledge bases can be created according to the following algorithm:
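The algorithm listing is not reproduced in this text. The following Python sketch is one possible reading of the steps described above (uniform choice of the successor, random jump controlled by α, Θ steps in total) and should not be taken as the original listing. It assumes that α is the probability of a random jump at each step and that each jump starts a new sentence of the corpus; the function name, the optional `jump_weights` argument and the toy graph are hypothetical.

```python
import random


def create_corpus(adjacency, steps, alpha=0.1, jump_weights=None, seed=0):
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    # Optional non-uniform jump distribution, e.g. normalized IEF values.
    weights = [jump_weights[n] for n in nodes] if jump_weights else None
    current = rng.choice(nodes)            # start node chosen at random
    sentence, corpus = [], []
    for _ in range(steps):
        sentence.append(current)           # emit the identifier of the visited node
        neighbours = sorted(adjacency[current])
        if rng.random() < alpha or not neighbours:
            # Random jump: close the current sentence (assumption) and restart.
            corpus.append(sentence)
            sentence = []
            current = rng.choices(nodes, weights=weights)[0]
        else:
            # Uniform choice among adjacent nodes: probability 1 / EdgesOf(x).
            current = rng.choice(neighbours)
    if sentence:
        corpus.append(sentence)
    return corpus


# Hypothetical undirected graph given as adjacency sets.
adjacency = {
    "France": {"Paris", "Euro", "Europe"},
    "Paris": {"France", "Europe"},
    "Europe": {"France", "Paris"},
    "Euro": {"France"},
}
number_of_edges = sum(len(n) for n in adjacency.values()) // 2
theta = 5 * number_of_edges                # Θ = 5 * |E|, as in the description
corpus = create_corpus(adjacency, steps=theta, alpha=0.1)
print(corpus)
```

Each inner list plays the role of a sentence whose words are node identifiers, so the resulting corpus can be passed to a Word2Vec implementation in place of textual documents.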

This principle of using contextual information that conveys semantic content can be applied to learning-based classification mechanisms other than neural networks, for example genetic algorithms, Bayesian networks, hidden Markov models, and so on.

The curve of FIG. 1 illustrates an experimental result of implementations of the invention.

It plots the precision (y-axis) against the recall rate (x-axis), the recall rate being the proportion of fraudulent transactions that are properly classified.
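Each point of such a curve corresponds to one decision threshold applied to the fraud probabilities returned by the classifier. The sketch below computes precision and recall for a few thresholds; the scores and labels are invented and do not reproduce the experimental data of FIG. 1.

```python
def precision_recall(scores, labels, threshold):
    # A transaction is flagged as fraud when its score reaches the threshold.
    predicted = [score >= threshold for score in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    # Convention used in this sketch: precision is 1.0 when nothing is flagged.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical fraud probabilities and ground-truth labels (True = fraud).
scores = [0.95, 0.80, 0.60, 0.30, 0.10]
labels = [True, True, False, True, False]

for threshold in (0.9, 0.5, 0.2):
    print(threshold, precision_recall(scores, labels, threshold))
```

Lowering the threshold moves along the curve towards higher recall, generally at the cost of precision.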

These curves show four situations corresponding to different configurations of the embedding layers of the neural network:

- reference 1 - "no external data": no contextual information is taken into account;

- reference 2 - "tx_holiday": contextual information relating to holidays is taken into account;

- reference 3 - "country_embed": contextual information on countries is taken into account;

- reference 4 - "tx_holiday + country_embed": contextual information on both holidays and countries is taken into account.

It should be noted that the results are indeed better when contextual information is used, in particular information on countries. It can also be seen that combining several types of contextual information is a delicate problem: in some cases, certain combinations may even degrade the overall performance of the classifiers. The combination of semantic vector representations of countries and of holidays (public holidays, school holidays, etc.) seems experimentally to give good results, especially at low values of the recall rate, for which a high precision can be reached. Concretely, this means that a classifier according to this embodiment of the invention obtains good results for the transactions most likely to be fraudulent, which in practice represent the most common situations.

Of course, the present invention is not limited to the examples and to the embodiment described and shown, but it is capable of numerous variants accessible to those skilled in the art.

Claims

A method for detecting fraudulent transactions in a set of payment transactions, comprising subjecting the transactions to a classification system trained on a training set and providing, for each new transaction of said set, a probability of being a fraudulent transaction, characterized in that contextual information is associated with each transaction, and in that said classification system is a neural network.

Method according to the preceding claim, wherein said classification system uses said contextual information by means of graph embeddings.

Method according to one of the preceding claims, wherein said contextual information comprises data relating to the country associated with the transaction.

Method according to one of the preceding claims, wherein said contextual information comprises data relating to days off.

Method according to one of the preceding claims, wherein said classification system is based on the Word2Vec algorithm.

Device comprising means for implementing the method according to one of the preceding claims.