WO2021174966A1 - Risk identification model training method and apparatus - Google Patents

Risk identification model training method and apparatus

Info

Publication number
WO2021174966A1
Authority
WO
WIPO (PCT)
Prior art keywords
transaction
scene
features
model
feature
Prior art date
Application number
PCT/CN2020/138205
Other languages
French (fr)
Chinese (zh)
Inventor
平野
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021174966A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/382 Payment protocols; Details thereof insuring higher security of transaction

Definitions

  • Fig. 4 shows a schematic diagram of training a risk identification model according to an embodiment.
  • A thick solid line schematically shows how the risk identification model processes the first transaction event. For the first transaction event, the corresponding first scenario model is assumed to be scenario model 2.
  • The common feature part is input to the main body model, which processes it to obtain a first processing result; the first scenario feature part is input to scenario model 2, which processes it to obtain a second processing result.
  • The corresponding classifier 2 combines the first processing result and the second processing result and outputs the predicted risk for the first transaction event.
  • The feature acquisition module is configured to acquire, at every predetermined time period, the candidate features newly added during that period.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a method and apparatus for training a risk identification model. The risk identification model comprises a main body model (11), a plurality of scenario models (12-1, 12-2, ..., 12-N) corresponding to a plurality of transaction scenarios, and a plurality of classifiers (13-1, 13-2, ..., 13-N). The training method comprises: determining a first transaction scenario corresponding to a first transaction event in a training sample set, and extracting event features of the first transaction event (22); dividing the event features into a common feature portion and a first scenario feature portion according to a predetermined common feature set and a first scenario feature set (23); inputting the common feature portion into the main body model (11), inputting the first scenario feature portion into a first scenario model, among the plurality of scenario models (12-1, 12-2, ..., 12-N), that corresponds to the first transaction scenario, and obtaining a first predicted risk by means of a corresponding first classifier (24); obtaining a first predicted loss corresponding to the first transaction event according to the first predicted risk and a first risk label corresponding to the first transaction event (25); and training the risk identification model according to a combination of the predicted losses respectively corresponding to a plurality of sample transaction events (26).

Description

Method and Apparatus for Training a Risk Identification Model
Technical Field
One or more embodiments of this specification relate to the field of machine learning, and in particular to methods and apparatuses for training a risk identification model.
Background
With the development of computer technology, machine learning has been applied in a wide range of technical fields to analyze and predict business data. Now that electronic transactions and electronic payment are widely used, applying artificial intelligence to the analysis of electronic payments in order to identify their security risks has become an important goal.
The security risks of electronic payment mainly include misappropriation risk and fraud risk. Misappropriation risk covers stolen accounts, stolen cards, stolen payment codes and similar cases; fraud risk includes cash-out, money laundering and the like. An unsafe transaction event causes losses to users' funds and seriously threatens the security, stability and user experience of electronic transaction and payment platforms. Identifying security risks in electronic payment is therefore critical.
However, as electronic payment platforms offer more and more services, payment scenarios become increasingly varied and complex, and the types of unsafe transaction events in each scenario keep multiplying, which makes such events very difficult to identify. In addition, electronic transaction events are highly time-sensitive and adversarial in nature (an ongoing contest between attack and defense), which makes it even harder to identify the security risk of a transaction event accurately.
An improved solution is therefore desired that evaluates the security of transaction events more accurately and effectively and identifies unsafe transaction events.
Summary of the Invention
One or more embodiments of this specification describe a method and apparatus for training a risk identification model. Through multi-task learning, a risk identification model suitable for multiple scenarios is trained, so that the security of transaction events in various scenarios can be evaluated accurately and effectively and unsafe transaction events can be identified.
According to a first aspect, a method for training a risk identification model is provided. The risk identification model is used to identify the security risk of a transaction event and comprises a main body model, a plurality of scenario models and a corresponding plurality of classifiers, the plurality of scenario models corresponding to a plurality of transaction scenarios. The method comprises: acquiring a training sample set that includes a plurality of sample transaction events and their respective risk labels, the plurality of sample transaction events coming from different transaction scenarios; for any first transaction event among the plurality of sample transaction events, determining its corresponding first transaction scenario and extracting event features of the first transaction event; dividing the event features into a common feature part and a first scenario feature part according to a predetermined common feature set and a first scenario feature set corresponding to the first transaction scenario, the common feature set including features that all of the plurality of transaction scenarios have; inputting the common feature part into the main body model, inputting the first scenario feature part into the first scenario model that corresponds to the first transaction scenario among the plurality of scenario models, and obtaining a first predicted risk for the first transaction event through the corresponding first classifier; obtaining a first predicted loss corresponding to the first transaction event according to the first predicted risk and the first risk label corresponding to the first transaction event; and training the risk identification model according to a combination of the predicted losses respectively corresponding to the plurality of sample transaction events.
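The feature-division step of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the specification: the function name `split_event_features`, the feature names and the scenario keys are all hypothetical.

```python
def split_event_features(event_features, common_feature_set, scenario_feature_set):
    """Split one event's features into a common part and a scenario part."""
    common_part = {k: v for k, v in event_features.items() if k in common_feature_set}
    scenario_part = {k: v for k, v in event_features.items() if k in scenario_feature_set}
    return common_part, scenario_part


# Hypothetical predetermined feature sets (names are illustrative only).
COMMON_FEATURE_SET = {"payer_age", "amount", "device_os"}
SCENARIO_FEATURE_SETS = {
    "transfer_to_account": {"payee_age", "payer_payee_history"},
    "top_up": {"top_up_total_30d"},
}

# A sample transaction event whose scenario has already been determined.
event = {
    "scenario": "transfer_to_account",
    "payer_age": 31,
    "amount": 250.0,
    "device_os": "android",
    "payee_age": 45,
    "payer_payee_history": 3,
}
features = {k: v for k, v in event.items() if k != "scenario"}
common_part, scenario_part = split_event_features(
    features, COMMON_FEATURE_SET, SCENARIO_FEATURE_SETS[event["scenario"]]
)
```

The common part would then go to the main body model and the scenario part to the scenario model selected by the event's scenario.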
In different embodiments, the plurality of transaction scenarios include at least some of the following: transfer to account, transfer to card, credit card repayment, top-up, withdrawal, red envelope, external merchant invocation, intimate payment (亲密付), utility bill payment, and virtual goods transactions.
In different implementations, the common feature set may include one or more of the following: identity features, transaction behavior features, transaction environment features, device features, and relationship features.
According to one embodiment, the main body model comprises a number of main body decision trees, and the first scenario model comprises a number of first decision trees. In this case, the first predicted risk for the first transaction event is obtained as follows: obtaining the main body score corresponding to the main body leaf node into which the first transaction event falls in the main body decision trees, the main body leaf node being determined according to the common feature part; obtaining the first score corresponding to the first leaf node into which the first transaction event falls in the first decision trees, the first leaf node being determined according to the first scenario feature part; and combining the main body score and the first score through the first classifier to obtain a combined score, from which the first predicted risk is obtained.
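The tree-based embodiment can be illustrated with a small sketch. The specification does not say how the classifier combines the two scores, so the summation followed by a sigmoid used here is an assumption, as are the nested-dict tree layout, the function names and the feature names.

```python
import math


def leaf_score(tree, features):
    """Walk one decision tree (nested dicts) and return the score of the leaf reached."""
    node = tree
    while "score" not in node:
        branch = "left" if features[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["score"]


def predict_risk(main_trees, scenario_trees, common_part, scenario_part, bias=0.0):
    """Combine leaf scores from both tree groups (assumed: sum, then sigmoid)."""
    main_score = sum(leaf_score(t, common_part) for t in main_trees)
    scenario_score = sum(leaf_score(t, scenario_part) for t in scenario_trees)
    combined_score = main_score + scenario_score + bias
    return 1.0 / (1.0 + math.exp(-combined_score))


# Hypothetical trees: one main body tree on a common feature,
# one scenario tree on a scenario-specific feature.
main_trees = [
    {"feature": "amount", "threshold": 1000.0,
     "left": {"score": -0.5}, "right": {"score": 0.8}},
]
scenario_trees = [
    {"feature": "payer_payee_history", "threshold": 0,
     "left": {"score": 0.6}, "right": {"score": -0.4}},
]

risk = predict_risk(main_trees, scenario_trees,
                    {"amount": 5000.0}, {"payer_payee_history": 0})
```

In this toy case a large amount sent to a payee with no payment history lands in the higher-scoring leaves of both trees, so the combined score maps to a risk above 0.5.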
According to one embodiment, the risk identification model is implemented as a neural network, the main body model corresponding to a main body network and the first scenario model corresponding to a first network part. In this case, the first predicted risk for the first transaction event is obtained as follows: obtaining a first vector produced by the main body network processing the common feature part; obtaining a second vector produced by the first network part processing the first scenario feature part; and combining the first vector and the second vector through the first classifier to obtain a combined result, from which the first predicted risk is obtained.
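A forward pass for the neural-network embodiment might look like the sketch below. It is written in plain Python to stay self-contained; the layer sizes, the ReLU activation, the concatenation of the two vectors and the logistic output are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer with ReLU activation."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer mapping n_in inputs to n_out outputs."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
main_w, main_b = init_layer(3, 4, rng)   # main body network: 3 common features -> 4-dim vector
scen_w, scen_b = init_layer(2, 2, rng)   # scenario network part: 2 scenario features -> 2-dim vector
clf_w = [rng.uniform(-0.5, 0.5) for _ in range(6)]  # classifier over the 6 concatenated values
clf_b = 0.0


def predict_risk_nn(common_part, scenario_part):
    first_vector = dense(common_part, main_w, main_b)
    second_vector = dense(scenario_part, scen_w, scen_b)
    combined = first_vector + second_vector          # list concatenation
    logit = sum(w * x for w, x in zip(clf_w, combined)) + clf_b
    return 1.0 / (1.0 + math.exp(-logit))


risk_nn = predict_risk_nn([0.2, 1.0, 0.0], [0.7, 0.1])
```

Each scenario would own its own scenario network part and classifier, while `main_w` and `main_b` would be shared by every scenario.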
According to one implementation, the method further comprises: acquiring a plurality of newly added candidate features; screening the newly added candidate features to obtain a number of newly added features; using the newly added features to update the common feature set and/or the scenario feature sets respectively corresponding to the plurality of transaction scenarios; and retraining the risk identification model using the updated common feature set and scenario feature sets.
Further, in one embodiment of this implementation, the screening is performed as follows: a first screening is performed based on the information value (IV) of each of the newly added candidate features; a second screening is then performed based on the correlation coefficients between the newly added candidate features, yielding the newly added features.
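The two-stage screening can be sketched as below. The IV formula is the standard one over binned counts; the thresholds (`iv_min=0.02`, `corr_max=0.9`), the 0.5 smoothing, the use of the Pearson coefficient, the greedy keep-the-higher-IV rule and all feature names are assumptions, since the specification gives no concrete values.

```python
import math


def information_value(bins):
    """IV of a binned feature; bins is a list of (num_risky, num_safe) counts."""
    total_risky = sum(r for r, _ in bins)
    total_safe = sum(s for _, s in bins)
    iv = 0.0
    for risky, safe in bins:
        # 0.5 smoothing keeps the logarithm defined for empty bins (an assumption).
        p_risky = (risky + 0.5) / (total_risky + 0.5 * len(bins))
        p_safe = (safe + 0.5) / (total_safe + 0.5 * len(bins))
        iv += (p_risky - p_safe) * math.log(p_risky / p_safe)
    return iv


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def screen_features(candidates, iv_min=0.02, corr_max=0.9):
    """First screening by IV, second screening by pairwise correlation."""
    ivs = {name: information_value(c["bins"]) for name, c in candidates.items()}
    passed = sorted((n for n in candidates if ivs[n] >= iv_min),
                    key=lambda n: ivs[n], reverse=True)
    kept = []
    for name in passed:
        values = candidates[name]["values"]
        if all(abs(pearson(values, candidates[k]["values"])) < corr_max for k in kept):
            kept.append(name)
    return kept


# Hypothetical candidates: two highly correlated features and one uninformative feature.
candidates = {
    "night_tx_ratio": {"bins": [(40, 10), (10, 40)], "values": [1, 2, 3, 4, 5]},
    "night_tx_count": {"bins": [(35, 15), (15, 35)], "values": [2, 4, 6, 8, 10]},
    "app_theme": {"bins": [(25, 25), (25, 25)], "values": [5, 1, 4, 2, 3]},
}
selected = screen_features(candidates)
```

Here `app_theme` fails the IV screening because its bins do not separate risky from safe events, and `night_tx_count` is dropped in the second screening because it is perfectly correlated with the higher-IV `night_tx_ratio`.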
In one embodiment, at every predetermined time period, the candidate features newly added during that period are acquired as the plurality of newly added candidate features.
According to a second aspect, an apparatus for training a risk identification model is provided. The risk identification model is used to identify the security risk of a transaction event and comprises a main body model, a plurality of scenario models and a corresponding plurality of classifiers, the plurality of scenario models corresponding to a plurality of transaction scenarios. The apparatus comprises: a sample set acquisition unit, configured to acquire a training sample set that includes a plurality of sample transaction events and their respective risk labels, the plurality of sample transaction events coming from different transaction scenarios; a feature extraction unit, configured to determine, for any first transaction event among the plurality of sample transaction events, its corresponding first transaction scenario, and to extract event features of the first transaction event; a feature division unit, configured to divide the event features into a common feature part and a first scenario feature part according to a predetermined common feature set and a first scenario feature set corresponding to the first transaction scenario, the common feature set including features that all of the plurality of transaction scenarios have; a prediction unit, configured to input the common feature part into the main body model, input the first scenario feature part into the first scenario model that corresponds to the first transaction scenario among the plurality of scenario models, and obtain a first predicted risk for the first transaction event through the corresponding first classifier; a loss determination unit, configured to obtain a first predicted loss corresponding to the first transaction event according to the first predicted risk and the first risk label corresponding to the first transaction event; and a training unit, configured to train the risk identification model according to a combination of the predicted losses respectively corresponding to the plurality of sample transaction events.
According to a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method of the first aspect.
According to a fourth aspect, a computing device is provided, comprising a memory and a processor, wherein executable code is stored in the memory, and the processor implements the method of the first aspect when executing the executable code.
According to the embodiments of this specification, a multi-scenario, multi-task risk identification model is trained on sample transaction events from different scenarios. The risk identification model includes a main body model part shared by all scenarios and scenario model parts dedicated to individual scenarios. Because the scenarios share the main body model part, knowledge can be transferred between scenarios and the processing results for the shared features can be reused, so that good prediction performance is achieved for the multiple tasks of the multiple scenarios. Furthermore, the risk identification model can be updated and managed automatically to account for the adversarial nature of attack and defense.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a schematic diagram of the architecture of a risk identification model according to an embodiment;
Fig. 2 shows a method of training a risk identification model according to an embodiment;
Fig. 3 shows a schematic diagram of the relationship between features and scenarios according to an embodiment;
Fig. 4 shows a schematic diagram of training a risk identification model according to an embodiment;
Fig. 5 shows a method of updating a risk identification model in one embodiment;
Fig. 6 shows a schematic block diagram of an apparatus for training a risk identification model according to an embodiment.
Detailed Description
The solutions provided in this specification are described below with reference to the accompanying drawings.
As mentioned above, to safeguard users' payment security and the service stability of electronic payment platforms, security risks in electronic transactions need to be identified. However, to improve user experience, electronic payment platforms offer more and more services, and payment scenarios are becoming increasingly diverse. For example, Alipay provides many payment scenarios, such as transfer to account, transfer to card, credit card repayment, top-up, withdrawal, red envelope, utility bill payment, external merchant invocation, intimate payment (亲密付), and so on. Studies have found that each scenario has its own distinctive characteristics. If a separate model is built for each scenario to identify transaction risk, then, on the one hand, training and managing a large number of models is costly; on the other hand, some niche scenarios have few samples, so it is difficult to train a model with high prediction accuracy for such a scenario alone. Conversely, if a single general model is built for all scenarios, the characteristics unique to each scenario cannot be exploited, and the accuracy of the general model is unsatisfactory.
Based on these considerations, the inventor proposes to use multi-task learning, with an architecture consisting of a main body model plus a scenario model for each scenario, to train one risk identification model applicable to all scenarios. The risk identification model can be designed to identify a specific type of risk in transaction events, for example misappropriation risk.
Fig. 1 shows a schematic diagram of the architecture of a risk identification model according to an embodiment. As shown in Fig. 1, the risk identification model includes a main body model 11, N scenario models 12-1, 12-2, ..., 12-N, and N corresponding classifiers 13-1, 13-2, ..., 13-N, where the N scenario models correspond to N transaction scenarios respectively.
The main body model 11 processes the features shared by all transaction scenarios. A common feature set can be determined in advance by analyzing the transaction scenarios and screening features against the risk identification target (for example, identifying misappropriation risk or fraud risk). The common feature set contains multiple features that all transaction scenarios have and that carry information value for risk identification.
Specifically, in different embodiments, the common feature set may include features of one or more of the following kinds: identity features, transaction behavior features, transaction environment features, device features, relationship features, and so on.
More specifically, identity features may include basic attributes of the paying user, such as gender, age, occupation, income, length of registration, and education level. In one example, identity features may also include features of the paying user's financial assets, such as Yu'e Bao (余额宝) balance, number of recent purchases, and amount spent.
Transaction behavior features may include, for example, the transaction amount, the transaction duration, and the transaction behavior trajectory, such as the entry point through which the transaction interface was reached and the sequence of operations during the transaction. In one example, transaction behavior features may also include the type of the most recent operation before the target transaction, the page operated on, the time spent on it, and so on.
Transaction environment features may include features of the geographic environment and/or network environment in which the transaction takes place, for example geographic location information, IP address, and Wi-Fi identifier.
Device features may include hardware and software information of the device used for the transaction, for example hardware identification information such as the device MAC address, smartphone SIM card serial number, UMID, and APDID, and/or software information such as the operating system, system version, and app version.
Relationship features may include information about the paying user in a pre-established social relationship network, such as the number of friends, the frequency of communication with friends, and the types of communication. In one embodiment, the relationship network may be constructed as a relationship graph. In that case, relationship features may include graph features of the paying user in the relationship graph. These may be low-order graph features, such as the degree of a node, or high-order graph features obtained through graph embedding, for example features generated by aggregating neighbor nodes.
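The two kinds of graph features mentioned above can be illustrated with a toy relationship graph. Mean aggregation over neighbors is only one simple stand-in for the graph embedding processing the text refers to; the adjacency-list layout, function names and node attributes are hypothetical.

```python
def node_degree(graph, node):
    """Low-order graph feature: number of neighbors of a node."""
    return len(graph[node])


def aggregate_neighbors(graph, node_features, node):
    """Higher-order feature (assumed form): element-wise mean of neighbor feature vectors."""
    neighbors = graph[node]
    dim = len(node_features[node])
    if not neighbors:
        return [0.0] * dim
    return [sum(node_features[n][i] for n in neighbors) / len(neighbors)
            for i in range(dim)]


# Hypothetical relationship graph as an adjacency list, with 2-dim node attributes.
graph = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": ["alice"]}
node_features = {"alice": [1.0, 0.0], "bob": [0.0, 2.0], "carol": [2.0, 4.0]}

degree = node_degree(graph, "alice")
aggregated = aggregate_neighbors(graph, node_features, "alice")
```

Both values could then be appended to the paying user's relationship features before the common feature part is passed to the main body model.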
The common feature set may also include other features shared by all transaction scenarios, which are not enumerated one by one here.
The N scenario models 12-1, 12-2, ..., 12-N correspond to N transaction scenarios respectively and process the differentiated scenario features of each transaction scenario that are not included in the common feature set. Specifically, the N transaction scenarios may include several of the following: transfer to account, transfer to card, credit card repayment, top-up, withdrawal, red envelope, external merchant invocation, intimate payment (亲密付), virtual goods transactions, and so on. The differentiated scenario features differ from scenario to scenario.
Specifically, for the transfer-to-account scenario, the differentiated scenario features processed by the corresponding scenario model may include identity features of the user of the receiving account, such as the payee's gender, age, occupation, income, length of registration, education level and other basic attributes, as well as features of the relationship between the paying account and the receiving account, the payment records between the two, and so on.
For the credit card repayment scenario, the differentiated scenario features may include features of the paying user's credit history, such as Sesame Credit score (芝麻分), borrowing records, and repayment records.
For the top-up scenario, the differentiated scenario features may include the identifier of the top-up target, such as a mobile phone number, as well as top-up records, the total top-up amount in the most recent month, and so on.
Different scenarios have different differentiated scenario features, which are not enumerated one by one here.
As shown in Fig. 1, the N scenario models 12-1, 12-2, ..., 12-N correspond to the N classifiers 13-1, 13-2, ..., 13-N respectively. The i-th classifier obtains the processing result for the common features from the main body model and the processing result for the scenario features of the i-th scenario from the corresponding i-th scenario model, combines the two results, and performs risk identification on the transaction event in the i-th scenario, for example by outputting its risk level category.
In different embodiments, the main body model and the scenario models can be implemented with various specific models. In one example, the risk identification model as a whole is implemented as a tree model, such as a gradient boosting decision tree (GBDT) model; correspondingly, the main body model and each scenario model can each be implemented as a number of decision trees. In another example, the risk identification model as a whole is implemented as a neural network, such as a deep neural network (DNN); correspondingly, the main body model and each scenario model can each be implemented as a multi-layer perceptron composed of several layers of neurons. Depending on the number of features each one processes, the main body model and the scenario models may have the same or different network widths and/or depths.
In this model architecture, risk identification for transaction events in multiple scenarios can be regarded as multiple different tasks. These tasks are not independent of one another: when the corresponding classifiers classify for their respective tasks, they rely not only on the processing of the scenario model but also on the processing result of the main body model shared by all scenarios. The tasks therefore learn and train jointly through the shared main body model, so that knowledge can be transferred between the tasks of different scenarios and the processing results for the common features are shared, thereby achieving risk identification in every scenario.
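How a shared parameter ends up in the loss of every scenario can be shown with a toy joint loss. This is a sketch under assumptions: binary labels, binary cross-entropy as the per-event loss, a plain sum as the way the per-event losses are combined, and a one-weight "model" per part; none of these choices is fixed by the specification, and all names are hypothetical.

```python
import math


def binary_cross_entropy(predicted_risk, label):
    """Per-event predicted loss for a binary risk label (0 = no risk, 1 = risky)."""
    eps = 1e-12
    p = min(max(predicted_risk, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def total_loss(samples, predict):
    """Sum the per-event losses over events drawn from different scenarios."""
    return sum(binary_cross_entropy(predict(scenario, common, specific), label)
               for scenario, common, specific, label in samples)


# Toy parameters: one weight shared by all scenarios, one weight per scenario.
shared_weight = 0.5
scenario_weights = {"transfer": 1.0, "top_up": -1.0}


def toy_predict(scenario, common_value, scenario_value):
    logit = shared_weight * common_value + scenario_weights[scenario] * scenario_value
    return 1.0 / (1.0 + math.exp(-logit))


# (scenario, common feature value, scenario feature value, risk label)
samples = [("transfer", 2.0, 1.0, 1), ("top_up", 2.0, 1.0, 0)]
loss = total_loss(samples, toy_predict)
```

Because `shared_weight` appears in the prediction for both events, a gradient step on `loss` would update it using samples from both scenarios, which is the sense in which the scenarios train jointly, while each entry of `scenario_weights` is updated only by events from its own scenario.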
The training process of the above risk identification model is described below.
FIG. 2 shows a method of training a risk identification model according to one embodiment. It can be understood that the method can be executed by any apparatus, device, platform, or device cluster with computing and processing capabilities, where the risk identification model has the structure described above with reference to FIG. 1. As shown in FIG. 2, the method of training the risk identification model includes at least the following steps.
In step 21, a training sample set is obtained, which includes a plurality of sample transaction events and their respective risk labels, where the sample transaction events come from different transaction scenarios.
As mentioned above, the different transaction scenarios may include transfer to account, transfer to card, credit card repayment, top-up, withdrawal, red envelope, utility bill payment, external merchant invocation, Intimate Pay (亲密付), and so on. Depending on how each transaction scenario is used, the number of sample transaction events from each scenario is generally not the same. For scenarios that users use frequently, a large number of sample transaction events can be obtained; for niche scenarios that users use rarely, the number of sample transaction events may be very small. For example, suppose a sample set formed from a batch of training samples includes 1000 sample transaction events. Typically, some scenarios contribute several hundred sample transaction events, while others contribute only a few dozen or even fewer. This is one reason why building a separate model for each scenario cannot achieve good results.
In general, the risk label of a sample transaction event indicates the true risk status of that sample transaction event. In different embodiments, the risk label may be a binary label, for example 0 for no risk and 1 for risk, or a multi-valued label in which different label values indicate different risk levels.
Next, the risk identification model to be trained is used to make a prediction for each sample transaction event in the training sample set, one by one. For clarity and simplicity, any one sample transaction event in the training sample set is referred to as the first transaction event, and the description below is given with reference to it.
As shown in FIG. 2, in step 22, the first transaction scenario corresponding to an arbitrary first transaction event is determined, and the event features of the first transaction event are extracted.
It can be understood that when transaction events are collected as training samples, a scenario annotation can be added to each event according to the scenario from which it originated. Accordingly, the scenario corresponding to the first transaction event, referred to as the first transaction scenario, can be determined from its scenario annotation. In addition, the event features of the first transaction event can be extracted according to the feature items determined in advance through feature screening.
Then, in step 23, according to the predetermined common feature set and the first scenario feature set corresponding to the first transaction scenario, the event features are divided into a common feature part and a first scenario feature part. As mentioned above, the common feature set includes the features that all of the transaction scenarios have. The first scenario feature set includes the features that the first scenario has and that are not included in the common feature set.
FIG. 3 shows a schematic diagram of the relationship between features and scenarios according to one embodiment. As shown in FIG. 3, each row of the table shows the scenarios to which one feature applies, and each column shows the features that one scenario requires. A feature is included in the common feature set only when it applies to all scenarios, that is, when all entries in the row corresponding to that feature are selected (shown shaded). For each scenario, removing the features that belong to the common feature set from all the features the scenario requires yields the scenario feature set for that scenario. In general, the relationship diagram shown in FIG. 3 can be obtained by analysis before model training begins, and the common feature set and the scenario feature set of each scenario can be derived from it.
For the first transaction event, once its corresponding first scenario has been determined, the first scenario feature set can be obtained. According to the predetermined common feature set and the first scenario feature set, the event features of the first transaction event can be divided into the common feature part and the first scenario feature part.
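The derivation of the feature sets from the feature-scenario table, and the division of one event's features in step 23, can be sketched as follows. This is a minimal illustration only: the feature names, the scenario names, and the dictionary layout of the table are hypothetical and do not come from the specification.

```python
# Hypothetical feature-scenario table in the spirit of FIG. 3:
# feature name -> set of scenarios to which the feature applies.
FEATURE_SCENARIOS = {
    "account_age": {"transfer_to_card", "red_envelope", "withdrawal"},
    "device_risk": {"transfer_to_card", "red_envelope", "withdrawal"},
    "card_bin": {"transfer_to_card", "withdrawal"},
    "envelope_group_size": {"red_envelope"},
}
ALL_SCENARIOS = {"transfer_to_card", "red_envelope", "withdrawal"}


def build_feature_sets(feature_scenarios, all_scenarios):
    """Return (common feature set, scenario -> scenario feature set)."""
    # A feature is common only if it applies to every scenario.
    common = {f for f, used_in in feature_scenarios.items() if used_in >= all_scenarios}
    # A scenario feature set is everything the scenario needs minus the common set.
    per_scenario = {
        sc: {f for f, used_in in feature_scenarios.items() if sc in used_in} - common
        for sc in all_scenarios
    }
    return common, per_scenario


def split_event_features(event_features, scenario, common, per_scenario):
    """Divide one event's features into the common part and the scenario part."""
    common_part = {k: v for k, v in event_features.items() if k in common}
    scenario_part = {k: v for k, v in event_features.items() if k in per_scenario[scenario]}
    return common_part, scenario_part


COMMON, PER_SCENARIO = build_feature_sets(FEATURE_SCENARIOS, ALL_SCENARIOS)
```

Because the sets are computed once from the table, adding a row for a new feature and re-running `build_feature_sets` is enough to refresh both the common set and the affected scenario sets.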
Next, in step 24, the common feature part is input into the main model of the risk identification model, the first scenario feature part is input into the first scenario model, which is the one of the plurality of scenario models that corresponds to the first scenario, and the first predicted risk for the first transaction event is obtained through the corresponding first classifier.
FIG. 4 shows a schematic diagram of training the risk identification model according to one embodiment. In FIG. 4, the thick solid line schematically shows how the risk identification model processes the first transaction event. For the first transaction event, assume that the corresponding first scenario model is scenario model 2. In step 24, the common feature part is input into the main model, which processes it to obtain a first processing result; the first scenario feature part is input into scenario model 2, which processes these features to obtain a second processing result. The corresponding classifier 2 combines the first processing result and the second processing result and outputs the predicted risk for the first transaction event.
In one embodiment, the risk identification model is implemented as a tree model such as GBDT. In this case, the main model may include several main decision trees, and the first scenario model may include several first decision trees. Correspondingly, the main model's processing of the common feature part may include: traversing the main decision trees according to the feature value of each feature in the common feature part, determining the leaf node into which the first transaction event falls in each main decision tree, and obtaining the main score of the first transaction event from the scores associated with those main leaf nodes.
The first scenario model's processing of the first scenario feature part may include: traversing the first decision trees according to the feature value of each feature in the first scenario feature part, determining the leaf node into which the first transaction event falls in each first decision tree, and obtaining the first score of the first transaction event from the scores associated with those leaf nodes.
The first classifier, for example classifier 2 in FIG. 4, can then combine the main score output by the main model and the first score output by the first scenario model to obtain a combined score. In different embodiments, the first classifier may combine the main score and the first score in various ways, such as summation, weighted summation, or averaging. Finally, the first classifier can determine the predicted risk of the first transaction event from the combined score.
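The tree-based variant can be sketched as below. The nested-dictionary tree representation, the example trees, and the feature names are hypothetical. The weighted sum is one of the combination options named above; mapping the combined score to a probability with a sigmoid is an assumption made for illustration, since the text only states that the predicted risk is determined from the combined score.

```python
import math

# A tree is either a leaf {"leaf": score} or a split
# {"feature": name, "threshold": t, "left": subtree, "right": subtree}.
# This representation is hypothetical, chosen only for illustration.


def leaf_score(tree, features):
    """Traverse one decision tree and return the score of the leaf reached."""
    node = tree
    while "leaf" not in node:
        branch = "left" if features[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def forest_score(trees, features):
    """Sum the leaf scores over several trees."""
    return sum(leaf_score(tree, features) for tree in trees)


def predict_risk(main_trees, scenario_trees, common_part, scenario_part,
                 w_main=1.0, w_scenario=1.0):
    """Combine the main score and the scenario score by weighted summation,
    then squash with a sigmoid (an assumption, see the lead-in)."""
    combined = (w_main * forest_score(main_trees, common_part)
                + w_scenario * forest_score(scenario_trees, scenario_part))
    return 1.0 / (1.0 + math.exp(-combined))


# Hypothetical single-tree "forests" for the main model and one scenario model.
MAIN_TREES = [{"feature": "account_age", "threshold": 30,
               "left": {"leaf": 0.8}, "right": {"leaf": -0.5}}]
SCENARIO_TREES = [{"feature": "amount", "threshold": 1000,
                   "left": {"leaf": -0.3}, "right": {"leaf": 0.6}}]
```

With `w_main` and `w_scenario` both set to 1.0 the combination reduces to plain summation; setting both to 0.5 gives the average of the two scores.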
In another embodiment, the risk identification model is implemented as a neural network, for example a deep neural network (DNN). In this case, the main model corresponds to a main network, and the first scenario model corresponds to a first network portion. Correspondingly, the main model's processing of the common feature part may include: operating on the feature values of the common features through the neurons of each layer of the main network to obtain the first processing result. The first processing result may be a single value, but more typically the first processing result output by the main network takes the form of a vector, referred to as the first vector.
The first scenario model's processing of the first scenario feature part may include: operating on the feature values of the features in the first scenario feature part through the neurons of each layer of the first network portion to obtain the second processing result. The second processing result usually takes the form of a second vector.
In this case, the first classifier corresponding to the first scenario can also be implemented with neural network layers, for example as several fully connected layers. The fully connected layers receive the first processing result output by the main network and the second processing result output by the first network portion, and fuse them. When the first and second processing results are vectors, the fusion may include operations such as vector concatenation, addition, weighted summation, and element-wise multiplication, as well as combinations of these operations. The fully connected layers then determine and output the first predicted risk for the first transaction event from the fusion result, for example by applying a softmax function.
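The fusion step of the neural-network variant can be sketched as follows. Concatenation is only one of the fusion options listed above, and the single fully connected layer, its weights, and the two-class output (index 0 for no risk, index 1 for risk) are hypothetical choices made for this illustration.

```python
import math


def dense(x, weights, bias):
    """One fully connected layer: each output is a weighted sum of x plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(first_vector, second_vector, weights, bias):
    """Fuse the main-network and scenario-network outputs by concatenation,
    then apply one fully connected layer followed by softmax."""
    fused = list(first_vector) + list(second_vector)  # vector concatenation
    return softmax(dense(fused, weights, bias))


# Hypothetical classifier parameters: 3 fused inputs -> 2 classes.
WEIGHTS = [[0.0, 0.0, 0.0],
           [1.0, 0.0, 2.0]]
BIAS = [0.0, 0.0]
```

For example, `classify([1.0, 0.0], [0.5], WEIGHTS, BIAS)` produces logits of 0 and 2, so the second class receives the larger probability.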
When the risk identification model is implemented in some other specific model form, the first classifier similarly combines the first processing result output by the main model and the second processing result output by the first scenario model, and obtains the first predicted risk for the first transaction event from the combined result.
Then, in step 25, the first prediction loss corresponding to the first transaction event is obtained from the first predicted risk output by the first classifier and the first risk label corresponding to the first transaction event. The first prediction loss measures the difference between the risk identification model's prediction for the first transaction event and the event's true risk.
In this way, the risk identification model is used to predict the risk of the first transaction event, and the prediction loss is obtained. It can be understood that the first transaction event is an arbitrary sample transaction event in the training sample set. Predictions can be made in the same way for the other sample transaction events, and the corresponding prediction losses obtained.
For example, FIG. 4 also schematically shows the prediction process for another sample transaction event, referred to as the second transaction event, as indicated by the thick dashed line. The second transaction event corresponds to a second scenario, shown by way of example as scenario N in FIG. 4, which differs from the scenario of the first transaction event. Accordingly, for the second transaction event, the common feature part of its event features is input into the main model, the second scenario feature part corresponding to the second scenario is input into the second scenario model (scenario model N), and the corresponding second classifier (classifier N) outputs the predicted risk of the second transaction event, from which its prediction loss is obtained.
In this way, the prediction loss corresponding to each sample transaction event in the training sample set can be obtained. Combining the individual prediction losses yields the total prediction loss for the training sample set.
Then, in step 26, the risk identification model is trained according to the combination of the prediction losses of the sample transaction events in the training sample set, that is, the total prediction loss. Specifically, the parameters of the risk identification model can be adjusted in the direction that reduces the total prediction loss, so as to optimize and train the model.
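Steps 25 and 26 can be illustrated with a small loss computation. The specification does not name a particular loss function or a particular way of combining the per-event losses; binary cross-entropy and the mean over the sample set are assumptions chosen here because they are common for binary risk labels.

```python
import math


def prediction_loss(predicted_risk, label, eps=1e-12):
    """Binary cross-entropy between a predicted risk in (0, 1) and a 0/1 label.
    The choice of cross-entropy is an assumption, not taken from the text."""
    p = min(max(predicted_risk, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def total_loss(predicted_risks, labels):
    """Combine the per-event losses of a sample set by averaging them."""
    losses = [prediction_loss(p, y) for p, y in zip(predicted_risks, labels)]
    return sum(losses) / len(losses)
```

Every event contributes to the same total regardless of which scenario model and classifier produced its prediction, so reducing this total updates the shared main model using samples from all scenarios.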
In this way, sample transaction events from different scenarios are used to train a risk identification model that can make predictions for multiple tasks across multiple scenarios. Because the scenarios share the main model part, learning can be transferred between scenarios and the processing results for the shared features can be reused. For example, the sample transaction events from high-frequency scenarios with abundant samples allow the main model to be trained well. For scenarios with few samples, prediction can rely mainly on the processing result of the main model part and still achieve good results.
Furthermore, the inventor has also found that electronic transaction events are highly time-sensitive and adversarial. On the one hand, new types of unsafe events emerge continually, and the forms of security risk change very quickly. On the other hand, malicious users who intend to initiate unsafe events may infer some of the evaluation rules from the results of an existing security evaluation system, and then deliberately circumvent those rules to carry out new unsafe events. This time sensitivity and adversarial behavior often leave a security evaluation system unable to cope with new types of unsafe events, which degrades its identification performance.
For this reason, according to one embodiment of this specification, the risk identification model trained as described above is further updated. FIG. 5 shows a method of updating the risk identification model in one embodiment.
As shown in FIG. 5, first, in step 51, a plurality of new candidate features are obtained. In one example, the new candidate features may be features that are discovered and consolidated by analyzing newly emerging risk event types and then added to the feature pool. In another example, the new candidate features may be derived features obtained from existing features through a feature combination tool. In one embodiment, the candidate features added during each predetermined time period are obtained at the end of that period.
Then, in step 52, the new candidate features are screened to obtain several new features. Feature screening can be based on various indicators of feature usefulness, such as information value (IV), information gain ratio, correlation coefficient, Gini coefficient, and so on.
In a specific example, the candidate features can be screened using a combination of IV and correlation coefficient. Specifically, a first screening can be performed based on the IV of each new candidate feature. The first screening may include removing features whose IV is below a certain threshold and retaining features whose IV is above that threshold. Then, a second screening is performed based on the correlation coefficients between the new candidate features, yielding several new features. The second screening may include removing a feature if its correlation coefficient with any other feature is greater than a predetermined correlation threshold. Alternatively, the second screening may include removing the feature with the lower IV when the correlation coefficient between two features is greater than the predetermined correlation threshold. Feature screening can also be based on other principles to obtain the new features.
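The two-stage screening can be sketched as below, using the second variant (between two highly correlated features, drop the one with the lower IV). The IV values are assumed to have been computed beforehand and are passed in as a dictionary; the threshold defaults, the use of the Pearson coefficient, and the greedy visiting order are illustrative assumptions rather than details from the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def screen_features(columns, iv, iv_threshold=0.02, corr_threshold=0.9):
    """columns: feature name -> list of values; iv: feature name -> precomputed IV."""
    # First screening: keep only features whose IV exceeds the threshold.
    kept = [f for f in columns if iv[f] > iv_threshold]
    # Second screening: visit features from highest to lowest IV and keep a
    # feature only if it is not highly correlated with one already selected,
    # so the lower-IV member of each correlated pair is the one dropped.
    kept.sort(key=lambda f: iv[f], reverse=True)
    selected = []
    for f in kept:
        if all(abs(pearson(columns[f], columns[g])) <= corr_threshold for g in selected):
            selected.append(f)
    return selected
```

In the hypothetical call `screen_features({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1]}, {"a": 0.3, "b": 0.2})`, both features pass the IV filter, but `b` is nearly collinear with `a` and has the lower IV, so only `a` is selected.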
Next, in step 53, the new features are used to update the common feature set and/or the scenario feature sets corresponding to the respective transaction scenarios. In this step, the new features can be added to the feature-scenario relationship table shown in FIG. 3, which determines whether each new feature belongs to the common feature set or to a particular scenario feature set. In this way, the common feature set and/or the scenario feature sets on which the risk identification model is based are updated.
Then, in step 54, the risk identification model is retrained using the updated common feature set and scenario feature sets. The risk identification model is thus updated with the latest feature sets so that it can adapt to new types of risk events.
Furthermore, the performance of the updated risk identification model can be evaluated, the evaluation result output automatically, and the result compared with that of the model before the update. If performance has improved, the updated risk identification model is automatically put online; if there is no significant improvement, the original model is kept unchanged. This improves model performance and effectiveness while greatly reducing manual management costs.
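The automatic go-live decision can be expressed as a simple comparison. The evaluation metric and the minimum gain that counts as a significant improvement are not specified in the text; both are hypothetical parameters in this sketch.

```python
def choose_model(current_metric, candidate_metric, min_gain=0.005):
    """Return which model should serve traffic after an update.

    current_metric / candidate_metric: the same evaluation metric (higher is
    better) measured for the model before and after the update.
    min_gain: hypothetical margin that counts as a significant improvement.
    """
    if candidate_metric - current_metric >= min_gain:
        return "candidate"  # put the updated model online
    return "current"  # keep the original model unchanged
```

Requiring a margin rather than any positive difference helps avoid swapping models because of evaluation noise.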
To summarize the above process, in the embodiments of this specification, sample transaction events from different scenarios are used to train a multi-scenario, multi-task risk identification model, which includes a main model part common to all scenarios and scenario model parts specific to each scenario. Because the scenarios share the main model part, learning can be transferred between scenarios and the processing results for the shared features can be reused, so that good prediction results are achieved for the multiple tasks across the multiple scenarios. Furthermore, in view of the adversarial nature of the problem, the risk identification model can also be updated and managed automatically.
According to an embodiment of another aspect, an apparatus for training a risk identification model is provided, where the risk identification model is used to identify the security risk of a transaction event and includes a main model, a plurality of scenario models, and a corresponding plurality of classifiers, the plurality of scenario models corresponding to a plurality of transaction scenarios. The training apparatus can be deployed in any device, platform, or device cluster with computing and processing capabilities. FIG. 6 shows a schematic block diagram of an apparatus for training a risk identification model according to one embodiment. As shown in FIG. 6, the apparatus 600 includes the following units.
A sample set obtaining unit 61, configured to obtain a training sample set that includes a plurality of sample transaction events and their respective risk labels, the plurality of sample transaction events coming from different transaction scenarios;
A feature extraction unit 62, configured to, for an arbitrary first transaction event among the plurality of sample transaction events, determine its corresponding first transaction scenario and extract the event features of the first transaction event;
A feature division unit 63, configured to divide the event features into a common feature part and a first scenario feature part according to a predetermined common feature set and a first scenario feature set corresponding to the first transaction scenario, where the common feature set includes the features that all of the plurality of transaction scenarios have;
A prediction unit 64, configured to input the common feature part into the main model, input the first scenario feature part into the first scenario model that corresponds to the first scenario among the plurality of scenario models, and obtain a first predicted risk for the first transaction event through the corresponding first classifier;
A loss determination unit 65, configured to obtain a first prediction loss corresponding to the first transaction event according to the first predicted risk and the first risk label corresponding to the first transaction event;
A training unit 66, configured to train the risk identification model according to the combination of the prediction losses corresponding to the plurality of sample transaction events.
In different embodiments, the plurality of transaction scenarios include at least some of the following scenarios: transfer to account, transfer to card, credit card repayment, top-up, withdrawal, red envelope, external merchant invocation, Intimate Pay (亲密付), utility bill payment, and virtual goods transactions.
In various implementations, the common feature set may include one or more of the following: identity features, transaction behavior features, transaction environment features, device features, and relationship features.
According to one embodiment, the main model includes several main decision trees, and the first scenario model includes several first decision trees; in this case, the prediction unit 64 is specifically configured to:
Obtain the main score corresponding to the main leaf nodes into which the first transaction event falls in the main decision trees, the main leaf nodes being determined according to the common feature part;
Obtain the first score corresponding to the first leaf nodes into which the first transaction event falls in the first decision trees, the first leaf nodes being determined according to the first scenario feature part;
Combine the main score and the first score through the first classifier to obtain a combined score, and obtain the first predicted risk according to the combined score.
In another embodiment, the risk identification model is implemented as a neural network, the main model corresponds to a main network, and the first scenario model corresponds to a first network portion; in this case, the prediction unit 64 is specifically configured to:
Obtain the first vector produced by the main network processing the common feature part;
Obtain the second vector produced by the first network portion processing the first scenario feature part;
Combine the first vector and the second vector through the first classifier to obtain a combined result, and obtain the first predicted risk according to the combined result.
According to one implementation, the apparatus 600 further includes an update unit 67, which in turn includes (not shown): a feature acquisition module, configured to obtain a plurality of new candidate features; a feature screening module, configured to screen the plurality of new candidate features to obtain several new features; a feature update module, configured to use the new features to update the common feature set and/or the scenario feature sets corresponding to the respective transaction scenarios; and a model update module, configured to retrain the risk identification model using the updated common feature set and scenario feature sets.
In one embodiment, the feature screening module is configured to: perform a first screening based on the information value (IV) of each of the new candidate features; and perform a second screening based on the correlation coefficients between the new candidate features, to obtain the several new features.
In one embodiment, the feature acquisition module is configured to obtain, at every predetermined time period, the candidate features added during that period.
With the above apparatus, a risk identification model suitable for various transaction scenarios is trained by means of multi-task learning, and the model can be updated.
According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method described with reference to FIG. 2.
According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, where the memory stores executable code and the processor, when executing the executable code, implements the method described with reference to FIG. 2.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in this application can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific implementations described above further explain the purpose, technical solutions, and beneficial effects of this application in detail. It should be understood that the above are merely specific implementations of this application and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of this application shall fall within the scope of protection of this application.

Claims (18)

  1. A method for training a risk identification model, wherein the risk identification model is used to identify the security risk of a transaction event and comprises a main model, a plurality of scenario models, and a corresponding plurality of classifiers, the plurality of scenario models corresponding to a plurality of transaction scenarios; the method comprising:
    obtaining a training sample set comprising a plurality of sample transaction events and their respective risk labels, wherein the plurality of sample transaction events come from different transaction scenarios;
    for any first transaction event among the plurality of sample transaction events, determining its corresponding first transaction scenario, and extracting event features of the first transaction event;
    dividing the event features into a common feature part and a first scenario feature part according to a predetermined common feature set and a first scenario feature set corresponding to the first transaction scenario, wherein the common feature set comprises features shared by all of the plurality of transaction scenarios;
    inputting the common feature part into the main model, inputting the first scenario feature part into a first scenario model that corresponds to the first transaction scenario among the plurality of scenario models, and obtaining, through a corresponding first classifier, a first predicted risk for the first transaction event;
    obtaining a first predicted loss corresponding to the first transaction event according to the first predicted risk and a first risk label corresponding to the first transaction event; and
    training the risk identification model according to a combination of the predicted losses respectively corresponding to the plurality of sample transaction events.
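Illustrative sketch (not part of the claims): the feature-splitting and loss-combination steps of claim 1 could look roughly like the Python below. The feature names, the scenario keys, the use of binary cross-entropy as the per-event loss, and the use of a plain sum as the "combination" of losses are all assumptions made for illustration; the claim does not fix any of them. The `predict` callable stands in for the main model, scenario model, and classifier taken together.

```python
import math

# Hypothetical feature sets; the claim only requires that they be predetermined.
COMMON_FEATURES = {"identity_score", "device_age_days"}
SCENARIO_FEATURES = {
    "transfer_to_card": {"card_bound_days"},
    "red_envelope": {"envelope_count_1h"},
}


def split_features(event_features, scenario):
    """Divide one event's features into the common part and the scenario part."""
    scenario_set = SCENARIO_FEATURES[scenario]
    common_part = {k: v for k, v in event_features.items() if k in COMMON_FEATURES}
    scenario_part = {k: v for k, v in event_features.items() if k in scenario_set}
    return common_part, scenario_part


def log_loss(predicted_risk, label, eps=1e-12):
    """Binary cross-entropy for one event (assumed loss; label is 0 or 1)."""
    p = min(max(predicted_risk, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def total_loss(samples, predict):
    """Sum the per-event losses over samples of (features, scenario, label).

    `predict(common_part, scenario_part, scenario)` must return a risk in [0, 1].
    """
    loss = 0.0
    for features, scenario, label in samples:
        common_part, scenario_part = split_features(features, scenario)
        loss += log_loss(predict(common_part, scenario_part, scenario), label)
    return loss
```

A training procedure would then adjust the parameters behind `predict` to reduce `total_loss`; because every event passes through the main model, the main model receives gradient or split information from all scenarios, while each scenario model only sees its own events.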
  2. The method according to claim 1, wherein the plurality of transaction scenarios comprise at least some of the following: transfer to account, transfer to card, credit card repayment, top-up, withdrawal, red envelope, external merchant invocation, intimate payment (亲密付), utility bill payment, and virtual goods transactions.
  3. The method according to claim 1, wherein the common feature set comprises one or more of the following: identity features, transaction behavior features, transaction environment features, device features, and relationship features.
  4. The method according to claim 1, wherein the main model comprises a number of main decision trees, and the first scenario model comprises a number of first decision trees;
    wherein obtaining, through the corresponding first classifier, the first predicted risk for the first transaction event comprises:
    obtaining a main score corresponding to the main leaf nodes into which the first transaction event falls in the main decision trees, the main leaf nodes being determined according to the common feature part;
    obtaining a first score corresponding to the first leaf nodes into which the first transaction event falls in the first decision trees, the first leaf nodes being determined according to the first scenario feature part; and
    combining the main score and the first score through the first classifier to obtain a combined score, and obtaining the first predicted risk according to the combined score.
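Illustrative sketch (not part of the claims): one way to read the tree-based variant of claim 4 is that each tree routes the event to a leaf, the leaf scores are accumulated per model, and the classifier maps the combined score to a risk. The nested-dict tree layout, the additive combination, and the sigmoid mapping below are assumptions in the style of gradient-boosted trees; the claim itself only says that the scores are "combined".

```python
import math


def leaf_score(tree, features):
    """Walk one decision tree and return the score of the leaf that is reached.

    Internal nodes are dicts with "feature", "threshold", "left", "right";
    leaves are dicts with a single "score" key (hypothetical layout).
    """
    node = tree
    while "score" not in node:
        go_left = features[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["score"]


def predict_risk(main_trees, scenario_trees, common_part, scenario_part):
    """Combine main-model and scenario-model leaf scores into one risk value."""
    main_score = sum(leaf_score(t, common_part) for t in main_trees)
    first_score = sum(leaf_score(t, scenario_part) for t in scenario_trees)
    combined = main_score + first_score  # assumed combination rule
    return 1.0 / (1.0 + math.exp(-combined))  # assumed mapping to a probability
```

Note that the main trees only ever read the common feature part and the scenario trees only read the scenario feature part, which mirrors how the leaf nodes are determined in the claim.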
  5. The method according to claim 1, wherein the risk identification model is implemented by a neural network, the main model corresponds to a main network, and the first scenario model corresponds to a first network part;
    wherein obtaining, through the corresponding first classifier, the first predicted risk for the first transaction event comprises:
    obtaining a first vector produced by the main network processing the common feature part;
    obtaining a second vector produced by the first network part processing the first scenario feature part; and
    combining the first vector and the second vector through the first classifier to obtain a combined result, and obtaining the first predicted risk according to the combined result.
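Illustrative sketch (not part of the claims): the forward pass of the neural-network variant in claim 5 might be organized as a shared main network plus one branch and one classifier per scenario. The single ReLU layer per network, the concatenation of the two vectors, the logistic classifier, and the `params` layout are assumptions chosen to keep the example small; the claim leaves the architecture and the combination rule open.

```python
import math


def dense_relu(weights, bias, x):
    """One fully connected layer with ReLU; `weights` is a list of rows."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def forward(params, scenario, common_x, scenario_x):
    """Return the predicted risk for one event in the given scenario."""
    # First vector: shared main network applied to the common feature part.
    main = params["main"]
    first_vec = dense_relu(main["W"], main["b"], common_x)

    # Second vector: scenario-specific branch applied to the scenario features.
    branch = params["scenarios"][scenario]
    second_vec = dense_relu(branch["W"], branch["b"], scenario_x)

    # Assumed combination: concatenate, then a per-scenario logistic classifier.
    joined = first_vec + second_vec
    logit = sum(w * v for w, v in zip(branch["clf_w"], joined)) + branch["clf_b"]
    return 1.0 / (1.0 + math.exp(-logit))
```

With this layout, only the branch and classifier selected by `scenario` take part in a given event's prediction, while `params["main"]` is shared by every scenario.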
  6. The method according to claim 1, further comprising:
    obtaining a plurality of newly added candidate features;
    screening the plurality of newly added candidate features to obtain a number of newly added features;
    updating, with the newly added features, the common feature set and/or the scenario feature sets respectively corresponding to the plurality of transaction scenarios; and
    retraining the risk identification model using the updated common feature set and scenario feature sets.
  7. The method according to claim 6, wherein screening the plurality of newly added candidate features to obtain a number of newly added features comprises:
    performing a first screening based on the information value (IV) of each of the plurality of newly added candidate features; and
    performing a second screening based on the correlation coefficients between the newly added candidate features, to obtain the number of newly added features.
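Illustrative sketch (not part of the claims): the two-stage screening of claim 7 can be approximated by first dropping candidates whose information value (IV) is too low, then dropping candidates that are highly correlated with an already kept, higher-IV candidate. The thresholds `iv_min=0.02` and `corr_max=0.8`, the use of the Pearson coefficient, the greedy keep order, and the assumption that features are already binned into discrete values are all illustrative choices; the claim does not specify them. The IV helper also assumes the labels contain at least one positive and one negative sample.

```python
import math


def information_value(bins, labels, eps=1e-6):
    """IV of a binned feature: sum over bins of (p_bad - p_good) * ln(p_bad / p_good)."""
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    iv = 0.0
    for b in set(bins):
        bad = sum(1 for x, y in zip(bins, labels) if x == b and y == 1)
        good = sum(1 for x, y in zip(bins, labels) if x == b and y == 0)
        p_bad = max(bad / total_bad, eps)    # eps avoids log(0) for empty cells
        p_good = max(good / total_good, eps)
        iv += (p_bad - p_good) * math.log(p_bad / p_good)
    return iv


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def screen(candidates, labels, iv_min=0.02, corr_max=0.8):
    """First screen by IV, then greedily drop features correlated with kept ones."""
    ivs = {name: information_value(vals, labels) for name, vals in candidates.items()}
    ranked = sorted((n for n in candidates if ivs[n] >= iv_min), key=lambda n: -ivs[n])
    kept = []
    for name in ranked:
        if all(abs(pearson(candidates[name], candidates[k])) <= corr_max for k in kept):
            kept.append(name)
    return kept
```

The surviving names would then be added to the common feature set or to individual scenario feature sets before retraining, as described in claim 6.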
  8. The method according to claim 6, wherein obtaining a plurality of newly added candidate features comprises:
    at every predetermined time period, obtaining the candidate features newly added during that time period.
  9. An apparatus for training a risk identification model, wherein the risk identification model is used to identify the security risk of a transaction event and comprises a main model, a plurality of scenario models, and a corresponding plurality of classifiers, the plurality of scenario models corresponding to a plurality of transaction scenarios; the apparatus comprising:
    a sample set obtaining unit, configured to obtain a training sample set comprising a plurality of sample transaction events and their respective risk labels, wherein the plurality of sample transaction events come from different transaction scenarios;
    a feature extraction unit, configured to, for any first transaction event among the plurality of sample transaction events, determine its corresponding first transaction scenario and extract event features of the first transaction event;
    a feature dividing unit, configured to divide the event features into a common feature part and a first scenario feature part according to a predetermined common feature set and a first scenario feature set corresponding to the first transaction scenario, wherein the common feature set comprises features shared by all of the plurality of transaction scenarios;
    a prediction unit, configured to input the common feature part into the main model, input the first scenario feature part into a first scenario model that corresponds to the first transaction scenario among the plurality of scenario models, and obtain, through a corresponding first classifier, a first predicted risk for the first transaction event;
    a loss determining unit, configured to obtain a first predicted loss corresponding to the first transaction event according to the first predicted risk and a first risk label corresponding to the first transaction event; and
    a training unit, configured to train the risk identification model according to a combination of the predicted losses respectively corresponding to the plurality of sample transaction events.
  10. The apparatus according to claim 9, wherein the plurality of transaction scenarios comprise at least some of the following: transfer to account, transfer to card, credit card repayment, top-up, withdrawal, red envelope, external merchant invocation, intimate payment (亲密付), utility bill payment, and virtual goods transactions.
  11. The apparatus according to claim 9, wherein the common feature set comprises one or more of the following: identity features, transaction behavior features, transaction environment features, device features, and relationship features.
  12. The apparatus according to claim 9, wherein the main model comprises a number of main decision trees, and the first scenario model comprises a number of first decision trees;
    the prediction unit being specifically configured to:
    obtain a main score corresponding to the main leaf nodes into which the first transaction event falls in the main decision trees, the main leaf nodes being determined according to the common feature part;
    obtain a first score corresponding to the first leaf nodes into which the first transaction event falls in the first decision trees, the first leaf nodes being determined according to the first scenario feature part; and
    combine the main score and the first score through the first classifier to obtain a combined score, and obtain the first predicted risk according to the combined score.
  13. The apparatus according to claim 9, wherein the risk identification model is implemented by a neural network, the main model corresponds to a main network, and the first scenario model corresponds to a first network part;
    the prediction unit being specifically configured to:
    obtain a first vector produced by the main network processing the common feature part;
    obtain a second vector produced by the first network part processing the first scenario feature part; and
    combine the first vector and the second vector through the first classifier to obtain a combined result, and obtain the first predicted risk according to the combined result.
  14. The apparatus according to claim 9, further comprising an update unit, the update unit comprising:
    a feature obtaining module, configured to obtain a plurality of newly added candidate features;
    a feature screening module, configured to screen the plurality of newly added candidate features to obtain a number of newly added features;
    a feature update module, configured to update, with the newly added features, the common feature set and/or the scenario feature sets respectively corresponding to the plurality of transaction scenarios; and
    a model update module, configured to retrain the risk identification model using the updated common feature set and scenario feature sets.
  15. The apparatus according to claim 14, wherein the feature screening module is configured to:
    perform a first screening based on the information value (IV) of each of the plurality of newly added candidate features; and
    perform a second screening based on the correlation coefficients between the newly added candidate features, to obtain the number of newly added features.
  16. The method according to claim 14, wherein the feature obtaining module is configured to:
    at every predetermined time period, obtain the candidate features newly added during that time period.
  17. A computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to perform the method of any one of claims 1-8.
  18. A computing device comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method of any one of claims 1-8.
PCT/CN2020/138205 2020-03-05 2020-12-22 Risk identification model training method and apparatus WO2021174966A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010147121.4 2020-03-05
CN202010147121.4A CN111291900A (en) 2020-03-05 2020-03-05 Method and device for training risk recognition model

Publications (1)

Publication Number Publication Date
WO2021174966A1 (en) 2021-09-10

Family

ID=71028603

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/138205 WO2021174966A1 (en) 2020-03-05 2020-12-22 Risk identification model training method and apparatus

Country Status (2)

Country Link
CN (1) CN111291900A (en)
WO (1) WO2021174966A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092097A (en) * 2021-11-23 2022-02-25 支付宝(杭州)信息技术有限公司 Training method of risk recognition model, and transaction risk determination method and device
CN114492214A (en) * 2022-04-18 2022-05-13 支付宝(杭州)信息技术有限公司 Method and device for determining selection operator and optimizing strategy combination by using machine learning

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458393B (en) * 2019-07-05 2023-07-18 创新先进技术有限公司 Method and device for determining risk identification scheme and electronic equipment
CN111291900A (en) * 2020-03-05 2020-06-16 支付宝(杭州)信息技术有限公司 Method and device for training risk recognition model
CN112330035A (en) * 2020-11-10 2021-02-05 支付宝(杭州)信息技术有限公司 Training method and device of risk prediction model
CN112418520B (en) * 2020-11-22 2022-09-20 同济大学 Credit card transaction risk prediction method based on federal learning
CN112396513B (en) * 2020-11-27 2024-02-20 中国银联股份有限公司 Data processing method and device
CN112785157B (en) * 2021-01-22 2022-07-22 支付宝(杭州)信息技术有限公司 Risk identification system updating method and device and risk identification method and device
CN113487208A (en) * 2021-07-16 2021-10-08 支付宝(杭州)信息技术有限公司 Risk assessment method and device
CN113672807B (en) * 2021-08-05 2024-03-05 杭州网易云音乐科技有限公司 Recommendation method, recommendation device, recommendation medium, recommendation device and computing equipment
CN115099586A (en) * 2022-06-10 2022-09-23 上海异工同智信息科技有限公司 Method and device for identifying operation risk
CN115701866B (en) * 2022-12-22 2023-10-27 荣耀终端有限公司 E-commerce platform risk identification model training method and device
CN117036037B (en) * 2023-10-09 2023-12-29 中国建设银行股份有限公司 Suspicious transaction risk analysis method and suspicious transaction risk analysis device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103839183A (en) * 2014-03-19 2014-06-04 江苏苏大大数据科技有限公司 Intelligent credit extension method and intelligent credit extension device
US20150039512A1 (en) * 2014-08-08 2015-02-05 Brighterion, Inc. Real-time cross-channel fraud protection
US20150127415A1 (en) * 2013-11-01 2015-05-07 Digital Risk Analytics, LLC Systems, methods and computer readable media for generating a multi-dimensional risk assessment system including a manufacturing defect risk model
CN108399509A (en) * 2018-04-12 2018-08-14 阿里巴巴集团控股有限公司 Determine the method and device of the risk probability of service request event
CN108629413A (en) * 2017-03-15 2018-10-09 阿里巴巴集团控股有限公司 Neural network model training, trading activity Risk Identification Method and device
CN109409896A (en) * 2018-10-17 2019-03-01 北京芯盾时代科技有限公司 Identification model training method, bank's fraud recognition methods and device are cheated by bank
CN110544100A (en) * 2019-09-10 2019-12-06 北京三快在线科技有限公司 Business identification method, device and medium based on machine learning
CN111291900A (en) * 2020-03-05 2020-06-16 支付宝(杭州)信息技术有限公司 Method and device for training risk recognition model

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016416B (en) * 2017-04-12 2021-02-12 中国科学院重庆绿色智能技术研究院 Data classification prediction method based on neighborhood rough set and PCA fusion
CN110019812B (en) * 2018-02-27 2021-08-20 中国科学院计算技术研究所 User self-production content detection method and system
CN110309840B (en) * 2018-03-27 2023-08-11 创新先进技术有限公司 Risk transaction identification method, risk transaction identification device, server and storage medium
CN109472610A (en) * 2018-11-09 2019-03-15 福建省农村信用社联合社 A kind of bank transaction is counter to cheat method and system, equipment and storage medium
CN110008991B (en) * 2019-02-26 2023-05-02 创新先进技术有限公司 Risk event identification method, risk identification model generation method, risk event identification device, risk identification equipment and risk identification medium
CN110310206B (en) * 2019-07-01 2023-09-29 创新先进技术有限公司 Method and system for updating risk control model
CN110443618B (en) * 2019-07-10 2023-12-01 创新先进技术有限公司 Method and device for generating wind control strategy
CN110555461A (en) * 2019-07-31 2019-12-10 中国地质大学(武汉) scene classification method and system based on multi-structure convolutional neural network feature fusion

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150127415A1 (en) * 2013-11-01 2015-05-07 Digital Risk Analytics, LLC Systems, methods and computer readable media for generating a multi-dimensional risk assessment system including a manufacturing defect risk model
CN103839183A (en) * 2014-03-19 2014-06-04 江苏苏大大数据科技有限公司 Intelligent credit extension method and intelligent credit extension device
US20150039512A1 (en) * 2014-08-08 2015-02-05 Brighterion, Inc. Real-time cross-channel fraud protection
CN108629413A (en) * 2017-03-15 2018-10-09 阿里巴巴集团控股有限公司 Neural network model training, trading activity Risk Identification Method and device
CN108399509A (en) * 2018-04-12 2018-08-14 阿里巴巴集团控股有限公司 Determine the method and device of the risk probability of service request event
CN109409896A (en) * 2018-10-17 2019-03-01 北京芯盾时代科技有限公司 Identification model training method, bank's fraud recognition methods and device are cheated by bank
CN110544100A (en) * 2019-09-10 2019-12-06 北京三快在线科技有限公司 Business identification method, device and medium based on machine learning
CN111291900A (en) * 2020-03-05 2020-06-16 支付宝(杭州)信息技术有限公司 Method and device for training risk recognition model

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092097A (en) * 2021-11-23 2022-02-25 支付宝(杭州)信息技术有限公司 Training method of risk recognition model, and transaction risk determination method and device
CN114092097B (en) * 2021-11-23 2024-05-24 支付宝(杭州)信息技术有限公司 Training method of risk identification model, transaction risk determining method and device
CN114492214A (en) * 2022-04-18 2022-05-13 支付宝(杭州)信息技术有限公司 Method and device for determining selection operator and optimizing strategy combination by using machine learning

Also Published As

Publication number Publication date
CN111291900A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
WO2021174966A1 (en) Risk identification model training method and apparatus
CN109102393B (en) Method and device for training and using relational network embedded model
US20230316076A1 (en) Unsupervised Machine Learning System to Automate Functions On a Graph Structure
CN111291816B (en) Method and device for carrying out feature processing aiming at user classification model
CN110188198B (en) Anti-fraud method and device based on knowledge graph
CN106875078B (en) Transaction risk detection method, device and equipment
CN112580952A (en) User behavior risk prediction method and device, electronic equipment and storage medium
CN109214914A (en) A kind of loan information checking method and device based on communication open platform
US20150262184A1 (en) Two stage risk model building and evaluation
CN113011889B (en) Account anomaly identification method, system, device, equipment and medium
CN111325248A (en) Method and system for reducing pre-loan business risk
CN111428217A (en) Method and device for identifying cheat group, electronic equipment and computer readable storage medium
CN113657896A (en) Block chain transaction topological graph analysis method and device based on graph neural network
CN113537960A (en) Method, device and equipment for determining abnormal resource transfer link
CN114187112A (en) Training method of account risk model and determination method of risk user group
CN110472050A (en) A kind of clique's clustering method and device
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN108038692A (en) Role recognition method, device and server
CN115859187A (en) Object identification method and device, electronic equipment and storage medium
CN110555007B (en) Method and device for discriminating theft behavior, computing equipment and storage medium
CN115935265B (en) Method for training risk identification model, risk identification method and corresponding device
CN116823428A (en) Anti-fraud detection method, device, equipment and storage medium
CN116361488A (en) Method and device for mining risk object based on knowledge graph
CN110880117A (en) False service identification method, device, equipment and storage medium
CN115438747A (en) Abnormal account recognition model training method, device, equipment and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20922625

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20922625

Country of ref document: EP

Kind code of ref document: A1