RU2723448C1 - Method of calculating client credit rating

Info

Publication number: RU2723448C1
Application number: RU2019116075A
Authority: RU
Inventors: Дмитрий Леонидович Бабаев; Дмитрий Евгеньевич Умеренков; Максим Сергеевич Савченко
Original assignee: Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Priority date: 2019-05-24
Filing date: 2019-05-24
Publication date: 2020-06-11
Also published as: EA038378B1; WO2020242340A1; EA201991836A1

Abstract

FIELD: data processing. SUBSTANCE: the invention relates to an automated method of assessing a client's credit rating from transactional activity data using a machine learning algorithm. A computer-implemented method of calculating a client's credit rating using a machine learning model is performed by at least one processor and comprises the steps of: receiving client transaction data containing information on at least the transaction amount within a given time interval, the transaction currency, and the type of transaction location; and processing the received data with a machine learning model based on a recurrent neural network (RNN), or an ensemble of RNNs, trained on vector representations of clients' transactional activity. During said processing, the transaction data of each client are divided into categorical and numerical variables; the variables are transformed, with categorical variables vectorized and numerical variables normalized; the transformed variables are concatenated and the vector corresponding to the last time interval of the client's transactional activity is identified; and said vector is classified to determine the client's credit score. EFFECT: the technical result is the automated calculation of a client's credit rating based on the client's transaction data. 6 cl, 5 tbl, 5 dwg

Description

FIELD OF TECHNOLOGY

[0001] The claimed technical solution relates to an automated method for assessing a client's credit rating based on transactional activity data using a machine learning algorithm.

BACKGROUND

[0002] Credit scoring is a very important area for the banking industry because of its huge financial consequences for banks. The banking industry has been developing credit scoring models since the middle of the 20th century and has refined them ever since, investing millions of dollars in the process. Traditional credit scoring models rely on data from the loan application form, the applicant's credit history, and various other aggregated financial information related to the client's application. These models use traditional machine learning methods, such as logistic regression, to estimate the client's credit score, which indicates whether the client will repay the loan.

[0003] A large body of research addresses credit scoring in banking, dating back to the first half of the 20th century [1]. A wide range of methods has been applied to the problem, including logistic regression [2], decision trees [3], boosting [4], support vector machines (SVM) [5], and neural networks (NN) [6].

[0004] Credit scoring methods have historically relied on application form data and the applicant's credit history. More recently, however, new data sources, namely telecommunication data [7] and transactional data [8]-[12], have been used to improve the quality of credit score estimation.

[0005] Most previous approaches to credit score estimation built aggregates over transactional data, either over all of the data [11] or over some time window, such as a month [8], [9] or a day [10], and most of them relied on classical machine learning methods. For example, the authors of [8] applied methods such as generalized decision trees for classification and regression to monthly transactional statistics. The authors of [9] applied discrete survival models to monthly transactional statistics. In addition, some solutions used neural networks for credit scoring on aggregated transactional data. For example, [10] applies a shallow convolutional neural network (CNN) to daily transactional statistics.

[0006] In addition, the work [12] discloses several credit scoring models built on non-aggregated transactional data, but using classical machine learning methods such as support vector machines and nearest-neighbor methods. Moreover, that approach focuses on a related credit risk assessment problem and uses only information about the parties to the transaction, without exploiting the full power of the transactional data.

[0007] In [13], the authors used a recurrent neural network (RNN) with long short-term memory (LSTM) [14], built on the individual features of each transaction, to detect fraudulent transactions. For a review of neural network methods for credit card fraud detection, see [15]. In [16], the authors applied an LSTM RNN to predict credit ratings for a P2P lending platform.

[0008] Despite their wide adoption and applicability, known credit scoring approaches have certain limitations. First, credit scoring requires labor-intensive feature engineering and deep domain knowledge in order to develop effective machine learning algorithms. Second, if a client has no meaningful credit history, it is difficult to make a reliable and correct decision about that client. Third, currently existing models do not make full use of all the client data available today.

SUMMARY OF THE INVENTION

[0009] The claimed technical solution proposes a new approach, the application of an RNN to transactions (E.T.-RNN), which calculates a client's credit score by learning from and processing the transaction history of the client's credit and debit cards. Unlike more traditional machine learning methods, the claimed approach is based on deep learning, and it is applicable only to clients who hold credit or debit cards of the bank. Since a significant percentage of applicants do hold credit or debit cards, the claimed method works for a large segment of clients.

[0010] In addition, the proposed method has the following advantages over current credit scoring methods. First, the proposed deep learning method outperforms the baseline algorithms, including the models currently in use, with a significant financial effect. Second, the proposed deep learning model works directly on a client's raw transactions and does not require labor-intensive feature engineering that demands deep domain knowledge (manually generating hundreds or thousands of hand-crafted aggregate features). Third, the claimed method works exclusively on transactional data and therefore requires no additional input from the client.

[0011] This means that loan decisions can be made very quickly, including in real time, because the entire credit scoring process is fully automated. Fourth, information in transactional data is very difficult to falsify. Consequently, there is no need to verify the data, unlike the loan application form and some other data sources used for scoring. Fifth, even if a client has no credit history, the client's creditworthiness can be assessed from the transaction history, which is the main source for credit risk assessment in this method.

[0012] Finally, the proposed method follows the principle of fair client assessment: because it does not use application form information about the person, it makes it possible to rule out discriminatory scoring based on various demographic factors.

[0013] Thus, the technical problem being solved is the automated calculation of a client's credit rating with a high degree of reliability of the predictions.

[0014] The technical result of implementing the claimed method is the automated calculation of a client's credit rating based on the client's transaction data.

[0015] In a preferred embodiment, a computer-implemented method for calculating a client's credit rating using a machine learning model is claimed, the method being performed by at least one processor and comprising the steps of:

receiving client transaction data containing information on at least the transaction amount within a given time interval, the transaction currency, and the type of transaction location;

processing the received data with a machine learning model based on a recurrent neural network (RNN), or an ensemble of RNNs, trained on vector representations of clients' transactional activity, wherein said processing comprises:

dividing the transaction data of each client into categorical and numerical variables;

transforming the variables, with categorical variables vectorized and numerical variables normalized;

concatenating the transformed variables and identifying the vector corresponding to the last time interval of the client's transactional activity;

classifying said vector to determine the client's credit score.

[0016] In one particular embodiment of the method, the client transaction information additionally includes the type of card used for the transactions and the date and time of the transactions.

[0017] In another particular embodiment of the method, the RNN includes a vectorization layer (embedding layer), a recurrent encoder, and a classifier.

[0018] In another particular embodiment of the method, the recurrent encoder is a single-layer RNN based on a gated recurrent unit.

[0019] In another particular embodiment of the method, the difference in days between the time of the current transaction and the time of the previous transaction, as well as the time in days elapsed from the card issue date to the transaction date, are determined from the client transaction data.

[0020] In another particular embodiment of the method, the RNN ensemble includes six neural networks, each trained on a different subsample of the source data.

BRIEF DESCRIPTION OF THE DRAWINGS

[0021] FIG. 1 illustrates the RNN architecture.

[0022] FIG. 2 illustrates a graph of regularization methods.

[0023] FIG. 3 illustrates a graph comparing ensembling quality.

[0024] FIG. 4 illustrates a graph of the learning curve of the machine learning models.

[0025] FIG. 5 illustrates a graph of the number of transactions per client.

DETAILED DESCRIPTION OF THE INVENTION

[0026] FIG. 1 shows the architecture of the proposed deep learning model, a recurrent neural network that processes clients' transaction data to calculate their credit rating. The claimed RNN architecture consists of three main parts: a vectorization layer (embedding layer), a recurrent encoder, and a classifier (a linear classifier).

[0027] Recurrent neural networks (RNNs) are typically used to process sequential information. In a sense, RNNs have a "memory" of previous computations and use information from previous time steps, in addition to the current input, to produce the next output. This approach suits many natural language processing (NLP) tasks, including text classification, machine translation, and language modeling.

[0028] The claimed method calculates a credit score for credit rating assessment using the transaction data of each client who has multiple credit or debit card transactions. Each transaction has several features, both categorical and numerical, and occurs at a specific time. The data can be described as data of different time series whose schema is given in Table 1. The merchant type field indicates, for example, an airline, a hotel, a restaurant, and so on.

[0029] The E.T.-RNN architecture (transactional vector representations with a recurrent neural network), shown in FIG. 1, is based on NLP methods in the context of deep learning: the credit scoring task is solved as a text classification task in which clients are treated as texts and transactions as individual words.

[0030] Consider the presented RNN architecture and how it calculates clients' credit ratings in more detail. The vectorization layer, or the layer that forms vector representations (embeddings), maps payment card transactions to vectors in a latent space before they are passed to the RNN encoder (latent-space vectors are vectors that cannot be obtained explicitly and are only derived through mathematical models). In particular, each categorical variable in each transaction is encoded into a low-dimensional vector by its own vectorization layer (embedding layer). The embedding layers are initialized randomly at the start of training and are trained together with the encoder. The timestamp of a transaction is processed as a set of categorical variables, each representing a part of the date (hour, day of the week, month). Each transaction is then represented as the concatenation of its numerical variables and the vector representations of its categorical variables.
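The mapping described in [0030] can be illustrated with a minimal sketch. The field names, vocabularies, embedding size, and example values below are assumptions chosen for illustration and are not taken from the source; in a real model the lookup tables would be trained together with the encoder rather than left at their random initial values.

```python
import random

random.seed(0)

# Hypothetical vocabularies and embedding size; the source does not specify them.
VOCABS = {
    "currency": ["RUB", "USD", "EUR"],
    "merchant_type": ["airline", "hotel", "restaurant"],
    "hour": list(range(24)),
    "weekday": list(range(7)),
    "month": list(range(1, 13)),
}
EMBED_DIM = 4


def make_embedding_table(vocab, dim):
    """Randomly initialised lookup table: one vector per category value."""
    return {value: [random.gauss(0.0, 0.1) for _ in range(dim)] for value in vocab}


TABLES = {name: make_embedding_table(vocab, EMBED_DIM) for name, vocab in VOCABS.items()}


def encode_transaction(categorical, numerical):
    """Concatenate numerical variables with the embeddings of categorical ones."""
    vector = list(numerical)
    for name in VOCABS:  # fixed field order keeps the vector layout stable
        vector.extend(TABLES[name][categorical[name]])
    return vector


tx_vector = encode_transaction(
    {"currency": "RUB", "merchant_type": "hotel", "hour": 14, "weekday": 2, "month": 5},
    [0.37, 1.5],  # e.g. a normalised amount and days since the previous transaction
)
print(len(tx_vector))  # 22 = 2 numerical values + 5 embeddings of size 4
```

The timestamp is split into hour, weekday, and month fields so that each part gets its own lookup table, mirroring the treatment of the time feature as a set of categorical variables.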

[0031] A single-layer RNN based on a gated recurrent unit (GRU) [17] is used as the encoder. The hidden vector (latent-space vector) from the last time step of the recurrent encoder is used as the representation of the client.
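As a rough illustration of the encoder in [0031], the sketch below runs a single GRU layer over a sequence of transaction vectors and returns the hidden vector from the last time step. It is not the patented implementation: the weights are random, biases are omitted, the dimensions are arbitrary, and the update rule follows one common GRU convention (some libraries swap the roles of `z` and `1 - z`).

```python
import math
import random

random.seed(1)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Single GRU layer with untrained random weights (biases omitted for brevity)."""

    def __init__(self, input_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(hidden_dim, input_dim) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [
            math.tanh(a + b)
            for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))
        ]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode_client(cell, transactions):
    """Run the GRU over all transactions and return the last hidden vector."""
    h = [0.0] * cell.hidden_dim
    for x in transactions:
        h = cell.step(x, h)
    return h


cell = GRUCell(input_dim=3, hidden_dim=5)
client_vector = encode_client(cell, [[0.1, 0.2, 0.3], [0.0, -0.4, 0.9]])
```

`client_vector` plays the role of the client representation that is then passed on to the classifier.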

[0032] Classifier. The hidden vector from the last time step is passed to a fully connected layer for classification. Experiments showed that a simple linear classifier outperformed several alternative approaches, which made it the most appropriate choice for the architecture.

[0033] The claimed method uses a standard measure of model quality, the area under the ROC curve (ROC AUC; ROC stands for receiver operating characteristic). Several loss functions can be used as alternatives for the task of maximizing ROC AUC, including the classical binary cross-entropy $L_{CE}(p, y) = -\sum_i y_i \log p_i$ and the margin ranking loss $L_R(p_1, p_2, y) = \max(0, -y(p_1 - p_2) + \text{margin})$, which optimizes ROC AUC directly. The best results were obtained with the margin ranking loss and a margin of 0.1.
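The two loss functions in [0033] can be written out as a short sketch. The cross-entropy is given here in its usual two-class form, and the way positive and negative examples are paired in `mean_pairwise_loss` (every positive against every negative) is an assumption for illustration; the source does not say how pairs are formed.

```python
import math


def binary_cross_entropy(p, y):
    """Binary cross-entropy for a probability p in (0, 1) and a label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def margin_ranking_loss(p1, p2, y, margin=0.1):
    """max(0, -y * (p1 - p2) + margin); y = 1 means p1 should rank above p2."""
    return max(0.0, -y * (p1 - p2) + margin)


def mean_pairwise_loss(pos_scores, neg_scores, margin=0.1):
    """Average ranking loss over all (positive, negative) score pairs."""
    losses = [margin_ranking_loss(p, n, 1, margin) for p in pos_scores for n in neg_scores]
    return sum(losses) / len(losses)
```

A pair contributes zero loss once the positive example is scored at least `margin` above the negative one, which is why this loss tracks the pairwise ordering that ROC AUC measures.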

[0034] Instead of a machine learning model based on a single RNN, ensembling can be applied, which improves model quality and stability at a slight cost in time and computing power compared with a single RNN. Since there is a sufficient number of negative-class examples (clients who had no default event on a consumer loan within a year of its issuance), different subsamples of negative-class examples can be used to train each model in the neural network ensemble.

[0035] The final version of the model averages the predictions of an ensemble of six separately trained models, as a practical balance between prediction quality and model training time. The quality gain from ensembling and other possible ensembling strategies are discussed further below in the application materials.
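The ensembling scheme of [0034]-[0035] can be sketched as follows: each model gets its own random subsample of negative examples, and the final score is the mean of the individual predictions. The stand-in models, sample sizes, and seed below are hypothetical and serve only to make the example runnable.

```python
import random


def draw_negative_subsamples(negatives, n_models, sample_size, seed=0):
    """Draw an independent random subsample of negative examples for each model."""
    rng = random.Random(seed)
    return [rng.sample(negatives, sample_size) for _ in range(n_models)]


def ensemble_score(models, client_features):
    """Average the scores produced by separately trained models."""
    scores = [model(client_features) for model in models]
    return sum(scores) / len(scores)


# Six stand-in "models": plain functions with hard-coded, hypothetical offsets.
models = [
    lambda x, b=b: min(1.0, max(0.0, x + b))
    for b in (-0.03, -0.01, 0.0, 0.01, 0.02, 0.04)
]

subsamples = draw_negative_subsamples(list(range(1000)), n_models=6, sample_size=50)
score = ensemble_score(models, 0.30)
```

Averaging is the simplest combination rule; it keeps inference cost linear in the number of models, which matches the trade-off between quality and running time mentioned in the text.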

[0036] The data used for the experiments were provided by the banking sector. The experiments used transactional data of clients who applied for retail loans. The final sample included only applicants who already had a debit or credit card product at the bank. If a client has several cards, the transactions from every card are taken into account.

[0037] The available transactional data fall into two levels: transaction level (for example, timestamp, country, amount, merchant type) and card level (for example, issuing branch, card type). Card-level data are duplicated verbatim for every transaction associated with the corresponding card. An example of three typical card transactions is given in Table 1. Two derived features calculated from the transaction data were also used:

- the difference in days between the time of the current transaction and the time of the client's previous transaction;

- the time in days elapsed from the card issue date to the transaction date.

Only transactions made before the loan application date are used for training and validation.
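The two derived features and the application-date cutoff can be sketched as below. The input layout (a list of `(timestamp, card_issue_date)` pairs per client), the use of fractional days, and the value 0.0 for a client's first transaction are assumptions; the source does not specify them.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400


def derived_features(transactions, application_date):
    """Compute (days since previous transaction, days since card issue).

    transactions: list of (timestamp, card_issue_date) pairs for one client,
    possibly spanning several cards. Transactions made on or after the
    application date are dropped.
    """
    kept = sorted(t for t in transactions if t[0] < application_date)
    features = []
    previous = None
    for timestamp, issue_date in kept:
        if previous is None:
            since_prev = 0.0  # assumption: no gap is defined for the first transaction
        else:
            since_prev = (timestamp - previous).total_seconds() / SECONDS_PER_DAY
        since_issue = (timestamp - issue_date).total_seconds() / SECONDS_PER_DAY
        features.append((since_prev, since_issue))
        previous = timestamp
    return features


example = derived_features(
    [
        (datetime(2019, 1, 10, 12), datetime(2018, 1, 10, 12)),
        (datetime(2019, 1, 12, 12), datetime(2018, 12, 12, 12)),
        (datetime(2019, 3, 1), datetime(2018, 1, 10, 12)),  # after the application date
    ],
    application_date=datetime(2019, 2, 1),
)
print(example)  # [(0.0, 365.0), (2.0, 31.0)]
```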

[0038] The training data set comprised more than 740 thousand clients with a total of 200 million transactions. The target variable was a default event on a consumer loan within a year of its issuance. The one-year period was chosen using the performance window attribute [18]. Because of the risk of non-stationarity in the data, an out-of-time validation strategy was used. Notably, results for out-of-period validation were consistently higher than results for out-of-time validation across a range of architectures and hyperparameters.

[0039] A subset of loan applications was used: a 16-month period for training and a four-month period for validation (the out-of-time validation approach). The training and validation sets were the same for each model considered and for the baseline models. Because of the large imbalance between positive and negative examples (owing to the bank's low default rate), the following undersampling strategy was adopted: before each experiment, all positive examples were selected together with 10 times as many randomly chosen examples of the negative class. In each training epoch, all positive examples and an equal number of negative examples drawn from this pool of negatives were used.
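The undersampling strategy above can be sketched as two steps: building a negative pool once per experiment and drawing a balanced set for each epoch. The function names and the representation of an example as an `(id, label)` tuple are hypothetical; only the 10:1 pool ratio and the 1:1 per-epoch balance come from the description.

```python
import random


def build_negative_pool(negatives, n_positives, ratio=10, rng=None):
    """Once per experiment: sample `ratio` times as many negatives as positives."""
    rng = rng or random.Random(0)
    k = min(len(negatives), ratio * n_positives)
    return rng.sample(negatives, k)


def epoch_examples(positives, negative_pool, rng):
    """Once per epoch: all positives plus an equal number of pooled negatives."""
    k = min(len(positives), len(negative_pool))
    examples = list(positives) + rng.sample(negative_pool, k)
    rng.shuffle(examples)  # mix classes before batching
    return examples
```

Because a fresh subset of the pool is drawn each epoch, the model sees all pooled negatives over time while every epoch stays balanced.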

[0040] All models were trained on the last 800 transactions of each client where that many were available; sequences were padded with zeros when a client had fewer transactions.
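A minimal sketch of this truncation and zero padding follows. The function name is hypothetical, and padding at the front of the sequence (so that the most recent transaction stays in the last position) is an assumption; the description does not say on which side the zeros are placed.

```python
def last_n_padded(transactions, max_len=800, n_features=3):
    """Keep the last `max_len` transactions and zero-pad shorter histories.

    Each transaction is a list of `n_features` numbers. Padding is placed
    at the front (an assumption) so the latest transaction is always last.
    """
    recent = transactions[-max_len:]
    padding = [[0.0] * n_features for _ in range(max_len - len(recent))]
    return padding + [list(t) for t in recent]
```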

[0041] To compare the proposed model with other approaches, a baseline based on logistic regression was implemented. An additional baseline based on gradient boosting (GBM) was also used. Both logistic regression and GBM require a large number of aggregate features, hand-crafted from the transaction data, as input to the classification model. An example of an aggregate feature is the average amount spent in a particular merchant category, such as hotels, over the entire transaction history.

[0042] The LightGBM gradient boosting algorithm was used, with about 7,000 hand-crafted aggregate features. Similarly, about 400 aggregate features were developed for logistic regression. Weight-based feature encoding and binning were used to transform categorical features.

[0043] Choice of recurrent encoder architecture. Experiments with different recurrent encoder architectures covered long short-term memory (LSTM), bidirectional recurrent cells and the gated recurrent unit (GRU). The results of this comparison are given in Table 2. On this basis a single-layer GRU was chosen: its difference from the best-performing bidirectional models was not statistically significant, while the bidirectional models increase model complexity and the single-layer GRU gives a noticeable saving in computational resources.

[0044] Loss function and learning rate. A batch size of 32 was used for training and a batch size of 768 for validation in all experiments. Using the ranking loss introduces a new hyperparameter, the margin of the loss function. As noted above, a margin of 0.1 gives the best results among all the loss function hyperparameter settings presented in Table 3.
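To make the role of the margin concrete, the sketch below shows one common pairwise form of a margin ranking loss, in which every defaulting (positive) client in a batch should be scored higher than every non-defaulting (negative) client by at least the margin. This particular formulation and the function name are assumptions; the exact loss used in the experiments is not reproduced in this passage.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin=0.1):
    """Mean hinge penalty over all (positive, negative) score pairs.

    A pair contributes zero once the positive score exceeds the negative
    score by at least `margin`; otherwise it contributes the shortfall.
    """
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    if not pairs:
        return 0.0
    return sum(max(0.0, margin - (p - n)) for p, n in pairs) / len(pairs)
```

A larger margin keeps penalizing pairs that are already ordered correctly but only narrowly separated, which is why it acts as a tunable hyperparameter.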

[0045] The learning rate used in gradient descent training and the learning rate decay schedule are among the most sensitive hyperparameters and can change model performance dramatically. The optimal learning rate schedule depends strongly on the loss function used, the batch size and the total number of parameters in the model. Several training regimes and several learning rate decay regimes were tried; for both the binary cross-entropy (BCE) loss and the ranking loss, the most effective strategy was an aggressive linear decay of the learning rate with the parameter gamma = 0.5, as shown in Table 4.

[0046] Regularization methods. Because of the small number of positive examples, all models tend to overfit. Several types of dropout were therefore tried for regularization (during training of the neural network, a layer is chosen from which a certain number of neurons are randomly dropped and excluded from further computation), such as:

- Transaction dropout, which randomly drops some of a client's transactions with a given probability.

- Transaction shuffling, which randomly permutes the order of a client's transactions.

- Dropout after the embedding layer, which randomly zeroes some components of the embedding layer's output.

However, none of the above regularization methods proved effective against overfitting, as shown in FIG. 2.
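The three regularization techniques listed above can be sketched as simple operations on a client's transaction sequence. All function names are hypothetical. Keeping at least one transaction after dropout and rescaling surviving embedding components by 1 / (1 - p) (the usual "inverted dropout" convention) are assumptions not stated in the description.

```python
import random


def transaction_dropout(transactions, p, rng):
    """Drop each transaction independently with probability p, keeping order."""
    kept = [t for t in transactions if rng.random() >= p]
    # Assumption: never return an empty history if the input was non-empty.
    return kept if kept else transactions[-1:]


def shuffle_transactions(transactions, rng):
    """Return a randomly permuted copy of the client's transactions."""
    shuffled = list(transactions)
    rng.shuffle(shuffled)
    return shuffled


def embedding_dropout(vector, p, rng):
    """Zero each embedding component with probability p (requires p < 1).

    Survivors are rescaled by 1 / (1 - p) so the expected value is unchanged
    (inverted dropout, an assumption).
    """
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in vector]
```

These transformations would be applied only during training; at inference the full, ordered sequence is used.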

[0047] Ensembling methods. Several types of ensembling were tested:

- Simple averaging of model outputs. Averaging the predictions of models trained on different negative examples both improves accuracy and reduces the variability of the results, as shown in FIG. 3.

- Stochastic weight averaging (SWA). Averaging the weights of the ensemble's models can significantly reduce inference time, since a single model with averaged weights is used instead of the whole ensemble. In this case, however, averaging the weights of different models leads to a noticeable drop in quality.

- Snapshot ensemble (copies of the model weights saved during training). Using snapshots of the same model (the model with different sets of weights at which local minima of the loss function are reached) in the final ensemble can significantly reduce training time, since only one model has to be trained. Unfortunately, this approach does not benefit from the use of different samples of the negative class.

- SWA + snapshot ensemble. Combining SWA with a snapshot ensemble, that is, training a single model, taking a snapshot after a given epoch and averaging the weights, was found to reduce variability somewhat, but the results were rather weak, so this ensembling method was not considered relevant for use.

[0048] An averaging ensemble of size six was used for the presented model architecture, giving a reasonable trade-off between model quality and training / inference time. As mentioned earlier, each model in the ensemble is trained on a different subsample of the negative class. Undersampling is used to reduce the number of negative examples. Negative examples are sampled independently for each model of the ensemble, so each model is trained on a slightly different subset of negative examples.
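The averaging ensemble described above can be sketched as follows. `train_fn` is a hypothetical callable that trains one model on the given positives and negatives and returns a scoring function; the per-member seeding scheme and the reuse of the 10:1 negative ratio are likewise assumptions for illustration.

```python
import random


def train_ensemble(train_fn, positives, negatives, n_models=6, ratio=10, seed=0):
    """Train `n_models` members, each on its own independent negative sample."""
    models = []
    k = min(len(negatives), ratio * len(positives))
    for i in range(n_models):
        rng = random.Random(seed + i)  # independent draw per ensemble member
        models.append(train_fn(positives, rng.sample(negatives, k)))
    return models


def ensemble_score(models, client_sequence):
    """Simple averaging: the mean of the members' predicted scores."""
    scores = [model(client_sequence) for model in models]
    return sum(scores) / len(scores)
```

Inference cost grows with the number of members because every model scores every client, which is the trade-off against quality that the choice of six members balances.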

[0049] The presented method, with the proposed credit scoring model architecture, was evaluated in the bank's production pipeline, which runs for every client holding a debit or credit card. Training the full ensemble of six models took about 4 hours on a Tesla P100 GPU. Scoring 1 million clients takes about 17 minutes on a Tesla P100 GPU. Inference time depends linearly on the number of clients. These scores were used to make decisions on loan applications for tens of thousands of applicants over one month.

[0050] Table 5 presents the main results of the experiments with the claimed method based on E.T.-RNN.

[0051] As shown in Table 5, the claimed E.T.-RNN architecture significantly outperformed the baselines on the data presented. Moreover, one of the most important properties of the proposed approach is that it requires no feature engineering, unlike classical methods, which depend heavily on hand-crafted features (for example, 400 features for logistic regression and 7,000 features for LGBM).

[0052] Note that the results in Table 5 were obtained on the full data set presented in Table 1. A series of experiments was also run to evaluate model performance on data sets of different sizes. As shown in FIG. 4, LGBM outperforms the E.T.-RNN-based method for small amounts of data, measured by the number of client loan applications (X axis). Given enough data, however, the E.T.-RNN method significantly outperforms the classical approaches. This observation is consistent with the well-known understanding that neural networks outperform classical methods on large data sets. Note also that E.T.-RNN has a steeper learning curve than gradient boosting, so the performance gap can be expected to widen further as more data become available.

[0053] Number of transactions. The performance of the E.T.-RNN model depends strongly on the number of transactions available per client. As shown in FIG. 5, scoring quality increases until about 350 transactions are available. Beyond that level, the gain from additional transactions is fairly small. The share of clients with more than 350 transactions is about 50 percent for this data set, which means that the proposed model performs well when scoring the bank's clients. On the other hand, the proposed method remains effective even for applicants with few transactions: for clients with more than 25 transactions (about 95 percent of all clients), a ROC-AUC of 82.5 was obtained.

[0054] The proposed method gives good credit scoring quality for the following reasons:

1) A sufficiently large number of clients in the training data set. Neural networks have many trainable parameters compared with classical approaches and therefore require more data than classical methods, as follows from the graph in FIG. 4.

2) Low-level, fine-grained data. The data used by the claimed method can be described as a sequence of events, each consisting of several variables.

3) High-frequency data. As discussed earlier, more than 80 percent of clients have at least 100 transactions.

[0056] Thus, the proposed new method for automated credit scoring with the E.T.-RNN model makes it possible to use detailed transaction data for credit scoring. Benchmark tests on historical data showed high performance.

[0057] A significant advantage of the claimed approach is that even complex multivariate time series data can be used directly for training, with no need for feature engineering. Because the neural network learns meaningful internal representations of the input data during training, there is less need to generate hundreds or even thousands of hand-crafted aggregate features, as is usually done in credit scoring.

[0058] Thus, the claimed method does not require any significant domain knowledge for feature engineering. In addition, the proposed E.T.-RNN model works exclusively on transaction data and so requires no additional data from the client, which means that credit decisions can be made very quickly, ideally almost in real time, because the whole credit scoring process is fully automated. Furthermore, information in transaction data is extremely hard to falsify, so there is no need for costly checks of its correctness, unlike data provided by the client or obtained from some other sources.

[0059] Another advantage of the claimed method is that even a client with no credit history can be reliably assessed for creditworthiness, since the client's transaction history serves as the source for assessing credit risk. The claimed method also provides a fair approach to credit decisions, since it does not rely on a person's demographic information and therefore cannot discriminate against applicants on the basis of demographic factors.

Sources of information:

1) David Durand. 1941. Credit-Rating Formulae. NBER, 83-91. http://www.nber.org/chapters/c12952;

2) John C. Wiginton. 1980. A Note on the Comparison of Logit and Discriminant Models of Consumer Credit Behavior. Journal of Financial and Quantitative Analysis 15, 03 (1980), 757-770. https://EconPapers.repec.org/RePEc:cup:jfinqa:v:15:y:1980:i:03:p:757-770_00;

3) Paul Makowski. 1985. Credit scoring branches out. Credit World 75, 1 (1985), 30-37;

4) João Bastos. 2008. Credit scoring with boosted decision trees;

5) Cheng-Lung Huang, Mu-Chen Chen, and Chieh-Jen Wang. 2007. Credit scoring with a data mining approach based on support vector machines. Expert Systems with Applications 33, 4 (2007), 847-856. https://doi.org/10.1016/j.eswa.2006.07.007;

6) David West. 2000. Neural network credit scoring models. Computers & Operations Research 27, 11-12 (2000), 1131-1152;

7) Daniel Björkegren and Darrell Grissen. 2017. Behavior revealed in mobile phone usage predicts loan repayment. arXiv preprint arXiv:1712.05840 (2017);

8) Amir E Khandani, Adlar J Kim, and Andrew W Lo. 2010. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance 34, 11 (2010), 2767-2787;

9) Tony Bellotti and Jonathan Crook. 2013. Forecasting and stress testing credit card default using dynamic models. International Journal of Forecasting 29, 4 (2013), 563-574;

10) Håvard Kvamme, Nikolai Sellereite, Kjersti Aas, and Steffen Sjursen. 2018. Predicting mortgage default using convolutional neural networks. Expert Systems with Applications 102 (2018), 207-217. https://doi.org/10.1016/j.eswa.2018.02.029;

11) Bo-Wen Chi and Chiun-Chieh Hsu. 2012. A hybrid approach to integrate genetic algorithm into dual scoring model in enhancing the performance of credit scoring model. Expert Systems with Applications 39, 3 (2012), 2650-2661;

12) Ellen Tobback and David Martens. 2017. Retail credit scoring using fine-grained payment data. Working Papers. University of Antwerp, Faculty of Business and Economics. https://EconPapers.repec.org/RePEc:ant:wpaper:2017011;

13) Bénard Wiese and Christian Omlin. 2009. Credit Card Transactions, Fraud Detection, and Machine Learning: Modelling Time with LSTM Recurrent Neural Networks. Springer Berlin Heidelberg, Berlin, Heidelberg, 231-268. https://doi.org/10.1007/978-3-642-04003-0_10;

14) Felix A Gers, Jürgen Schmidhuber, and Fred Cummins. 1999. Learning to forget: Continual prediction with LSTM. (1999);

15) Aisha Abdallah, Mohd Aizaini Maarof, and Anazida Zainal. 2016. Fraud detection system: A survey. Journal of Network and Computer Applications 68 (2016), 90-113;

16) Yishen Zhang, Dong Wang, Yuehui Chen, Huijie Shang, and Qi Tian. 2017. Credit Risk Assessment Based on Long Short-Term Memory Model. In International Conference on Intelligent Computing. Springer, 700-712;

17) Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. CoRR abs/1406.1078 (2014). arXiv:1406.1078 http://arxiv.org/abs/1406.1078;

18) Naeem Siddiqi. 2005. Credit Risk Scorecards: Developing And Implementing Intelligent Credit Scoring. (2005).

Claims

1. A computer-implemented method for calculating a client’s credit rating using a machine learning model, performed by at least one processor and comprising the steps of:

- receiving client transaction data containing information on at least the amount of transactions in a given time interval, the transaction currency and the type of transaction location;

- processing the received data using a machine learning model based on a recurrent neural network (RNN) or an ensemble of RNNs trained on vector representations of the transactional activity of clients, wherein said processing comprises:

dividing the transaction data of each client into categorical and numerical variables;

transforming the variables, wherein the categorical variables are vectorized and the numerical variables are normalized;

concatenating the transformed variables and identifying the vector corresponding to the last time period of the client’s transactional activity;

classifying said vector to determine the client’s credit score.

2. The method according to claim 1, characterized in that the client transaction data additionally include the type of card used to make the transaction and the date and time of the transaction.

3. The method according to claim 1, characterized in that the RNN includes a vectorization layer, an encoder and a classifier.

4. The method according to claim 3, characterized in that the encoder is a single-layer RNN based on a gated recurrent unit (GRU).

5. The method according to claim 2, characterized in that, based on the client transaction data, the difference in days between the time of the current transaction and the time of the previous transaction is determined, as well as the time in days elapsed from the card issue date to the transaction date.

6. The method according to claim 1, characterized in that the RNN ensemble includes six neural networks.