WO2020140678A1 - Abnormal application detection method and apparatus, and computer device and storage medium - Google Patents

Info

Publication number
WO2020140678A1
Authority
WO
WIPO (PCT)
Prior art keywords
application
feature set
feature
preset
sub
Prior art date
Application number
PCT/CN2019/123190
Other languages
French (fr)
Chinese (zh)
Inventor
马新俊
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2020140678A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/192 Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194 References adjustable by an adaptive method, e.g. learning

Definitions

  • This application relates to an abnormal application detection method, device, computer equipment, and storage medium.
  • An abnormal application detection method, device, computer equipment, and storage medium are provided.
  • An abnormal application detection method, including:
  • the preset isolation forest model is constructed from an unlabeled training feature set; it detects abnormal features from the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, and scores the application feature set based on that path length;
  • An abnormal application detection device including:
  • the data acquisition module is used to obtain the credit application data of the applicant;
  • the feature extraction module is used to obtain, according to the credit application data, the applicant information and the application feature set corresponding to the applicant information, where the application features in the application feature set include application time, application frequency, application trend, and institution preference;
  • the score calculation module is used to input the application feature set into a preset isolation forest model, perform anomaly detection on the features in the application feature set through the preset isolation forest model, and obtain an application score corresponding to the application feature set;
  • the preset isolation forest model is constructed from an unlabeled training feature set; it detects abnormal features from the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, and scores the application feature set based on that path length;
  • the abnormal application determination module is used to obtain the absolute value of the difference between the application score and the preset reference value and, when that absolute value is lower than the preset threshold, to determine that the application corresponding to the credit application data is an abnormal application.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • When the computer-readable instructions are executed by the processor, the one or more processors perform the following steps:
  • the preset isolation forest model is constructed from an unlabeled training feature set; it detects abnormal features from the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, and scores the application feature set based on that path length;
  • One or more non-volatile storage media storing computer readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • the preset isolation forest model is constructed from an unlabeled training feature set; it detects abnormal features from the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, and scores the application feature set based on that path length;
  • FIG. 1 is an application scenario diagram of an abnormal application detection method according to one or more embodiments.
  • FIG. 2 is a schematic flowchart of an abnormal application detection method according to one or more embodiments.
  • FIG. 3 is a schematic flowchart of sub-steps of step S400 in FIG. 2 according to one or more embodiments.
  • FIG. 4 is a schematic flowchart of an abnormal application detection method in another embodiment.
  • FIG. 5 is a schematic flowchart of sub-steps of step S600 in FIG. 2 according to one or more embodiments.
  • FIG. 6 is a block diagram of an abnormal application detection device according to one or more embodiments.
  • FIG. 7 is a block diagram of a computer device according to one or more embodiments.
  • The abnormal application detection method provided by this application can be applied to the application environment shown in FIG. 1.
  • The terminal 102 and the server 104 communicate via the network.
  • The server 104 obtains, through the network, the applicant's credit application data provided by the application monitoring staff. After receiving the credit application data, the server first obtains the applicant information and the corresponding application feature set from the credit application data, then inputs the application feature set into the preset isolation forest model, uses the model to perform anomaly detection on the features in the application feature set and obtain the corresponding application score, determines from the application score whether the application is an abnormal application, and feeds the detection result back to the terminal 102.
  • The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet, or portable wearable device.
  • The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
  • An abnormal application detection method is provided.
  • Taking the application of the method to the server in FIG. 1 as an example, the method includes the following steps:
  • The applicant specifically refers to a person who is applying for credit.
  • The application specifically refers to a credit application.
  • The credit application data specifically includes the data that the user provides for the credit application, which may include the user's personal information and the credit data that the user can provide.
  • Applicant information refers to the applicant's personal information. Data related to the applicant can be looked up from the applicant information.
  • The application feature set specifically refers to the feature data used to analyze the applicant's credit situation.
  • The application feature set may specifically include dimensions such as application time, application frequency, application trend, and institution preference.
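The text names four feature dimensions but does not define how each is computed. The following is a minimal sketch under stated assumptions: the function name `extract_application_features`, the 30-day window, and the concrete definition of each feature (hour of the latest application, count in the latest window, change in count between two consecutive windows, share of the most-used institution) are all hypothetical choices made for illustration, not definitions taken from the source.

```python
from collections import Counter
from datetime import datetime, timedelta


def extract_application_features(history, now, window_days=30):
    """Turn a non-empty application history into a 4-value feature tuple.

    `history` is a list of (timestamp, institution_name) pairs.
    All four feature definitions below are illustrative assumptions.
    """
    window = timedelta(days=window_days)
    # Applications in the most recent window and in the window before it.
    recent = [t for t, _ in history if now - window <= t <= now]
    previous = [t for t, _ in history if now - 2 * window <= t < now - window]
    latest = max(t for t, _ in history)
    institutions = Counter(name for _, name in history)
    top_share = institutions.most_common(1)[0][1] / len(history)
    return (
        float(latest.hour),                  # application time (hour of day)
        float(len(recent)),                  # application frequency
        float(len(recent) - len(previous)),  # application trend
        top_share,                           # institution preference
    )


now = datetime(2019, 1, 4, 12, 0)
history = [
    (datetime(2018, 11, 20, 9, 30), "bank_a"),
    (datetime(2018, 12, 28, 2, 15), "lender_b"),
    (datetime(2019, 1, 2, 3, 5), "lender_b"),
    (datetime(2019, 1, 3, 23, 40), "lender_c"),
]
features = extract_application_features(history, now)
```

Each applicant then becomes one numeric tuple, which is the form an isolation tree needs in order to pick a feature and a split value.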
  • The isolation forest model is a model based on the isolation forest algorithm.
  • The isolation forest algorithm is generally used for anomaly mining, also called outlier mining: finding, within a large amount of data, the data that does not follow the pattern of the rest. The anomalous data found is then either removed directly, as in noise removal during data cleaning, or analyzed in depth, as in analyzing the behavioral characteristics of attacks and fraud.
  • The preset isolation forest model is a model, built from the unlabeled training feature set, for detecting whether an application is abnormal. It examines the applicant's previous application features to obtain the application score corresponding to the application data.
  • The scoring process includes obtaining the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, detecting abnormal features based on the path length, and scoring the application feature set based on the path length to obtain the score for the current application.
  • The application score is the score obtained from the application data.
  • The application feature set is input into the preset isolation forest model to obtain the corresponding application score.
  • Each application feature in the application feature set is checked for anomalies by the isolation trees in the preset isolation forest model.
  • An application feature can be examined by multiple isolation trees to obtain its anomaly degree, and the anomaly degrees of all application features are then combined to obtain the application score corresponding to the application feature set.
  • The abnormal application specifically includes a fraudulent application, that is, a credit application that involves fraud.
  • In the isolation forest, anomalies are defined as points that are "more likely to be separated", which can be understood as points that are sparsely distributed and far from high-density groups.
  • A sparsely distributed region indicates that events rarely occur there, so data falling in such regions can be considered abnormal.
  • the application corresponding to the credit application data is determined to be an abnormal application.
  • The preset reference value is 1; the closer the application score is to 1, the higher the probability of fraud.
  • An abnormality threshold may be set, and the difference between the application score and 1 is then obtained.
  • When the absolute value of the difference between the application score and 1 is lower than or equal to the abnormality threshold, the application is determined to be an abnormal application. When it is higher than the abnormality threshold, the application is determined not to be an abnormal application.
  • The above abnormal application detection method first obtains the applicant's credit application data; obtains the applicant information and the corresponding application feature set from the credit application data; inputs the application feature set into the preset isolation forest model obtained by unsupervised training, which performs anomaly detection on the features in the application feature set and yields the corresponding application score; and determines from the application score whether the application corresponding to the credit application data is an abnormal application.
  • This solution scores credit applications with a preset isolation forest model obtained by unsupervised learning, so no labeled training data is required. This greatly improves the practicality of the abnormal application scoring system and allows it to identify variants of known fraud as well as fraud that has never been seen before.
  • Outputting the result as a score reflects the degree of abnormality, which is easy for customers to understand and easy to turn into a product.
  • S400 specifically includes:
  • Before S600, the method includes:
  • The unlabeled training feature set is the feature set used to train the isolation forest model.
  • Its composition is similar to that of the application feature set and includes dimensions such as application time, application frequency, application trend, and institution preference.
  • The isolation forest model is composed of multiple isolation trees. Isolation trees are first built from the unlabeled training feature set, and the isolation forest model that scores applications from their application feature sets is then assembled from these trees.
  • The unlabeled training feature set includes features such as application time, application frequency, application trend, and institution preference.
  • An isolation tree can be built from this data, and training on a large amount of unlabeled data then generates multiple isolation trees that form the isolation forest.
  • S520 specifically includes: sampling a sample feature set containing ψ samples from the unlabeled training feature set as the training sample set of an isolation tree; randomly selecting a single feature of the samples in the sample feature set; binary-dividing the sample feature set according to that feature to obtain two sub-feature sets; determining whether each sub-feature set can be divided again and, when it can, taking the sub-feature set as a new sample feature set and returning to the step of binary-dividing the sample feature set according to the feature to obtain two sub-feature sets; and, when the sub-feature set cannot be divided again, stopping its binary division.
  • The process of constructing an isolation tree from the unlabeled training feature set is as follows. First, a sample feature set containing ψ samples is drawn from the training feature set as the training sample set of the isolation tree, and an isolation tree is trained on it.
  • The training feature set may also be divided equally according to the number of isolation trees, with each isolation tree trained on its own share.
  • After the sample feature set is established, a feature contained in its samples is selected at random and the range of values that this feature takes is obtained. A value is then chosen at random within that range and the sample feature set is binary-divided: samples whose feature value is less than the chosen value go to the left of the node, and samples whose value is greater than or equal to it go to the right. This yields one split condition and a sub-feature set on each side. Each sub-feature set is then checked to see whether it can be divided again. When it can, a value is chosen at random from the range of the feature within that sub-feature set, and the sub-feature set is divided in two.
  • When the sub-feature set cannot be divided again, its binary division ends.
  • This division applies to every sub-feature set that can still be divided.
  • Dividing the feature set in this way builds the isolation tree effectively, and abnormal applications can then be detected effectively with the tree.
  • After the sample feature set is binary-divided according to the feature and the two sub-feature sets are obtained, the method further includes:
  • The number of returns needs to be recorded.
  • When the number of returns is greater than log2(ψ)-2, the binary division of the sub-feature set needs to be stopped.
  • The height of the isolation tree is limited. The limit is determined by the number of samples ψ in the sample feature set used for training: division stops when the height of the isolation tree reaches log2(ψ). Both the height of the tree and whether the samples can be divided again decide whether a sub-feature set is divided further. If either stopping condition is met, the binary division is terminated.
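The tree construction described above (random feature, random split value, binary division, and a height cap tied to the subsample size) can be sketched as follows. This is an illustrative sketch, not the applicant's implementation: the name `build_isolation_tree`, the dictionary node layout, and the cap of ceil(log2(n)) are assumptions, and the source's alternative "number of returns" criterion is not reproduced here.

```python
import math
import random


def build_isolation_tree(samples, height=0, height_limit=None, rng=random):
    """Recursively split `samples` (a list of equal-length numeric tuples)."""
    if height_limit is None:
        # Assumed height cap derived from the subsample size.
        height_limit = math.ceil(math.log2(max(len(samples), 2)))
    # Stop when the subset cannot be divided again or the cap is reached.
    if len(samples) <= 1 or height >= height_limit:
        return {"size": len(samples)}
    # Only features that still take more than one value can be split.
    splittable = [
        f for f in range(len(samples[0]))
        if min(s[f] for s in samples) < max(s[f] for s in samples)
    ]
    if not splittable:
        return {"size": len(samples)}
    feature = rng.choice(splittable)
    values = [s[feature] for s in samples]
    split = rng.uniform(min(values), max(values))
    # Values below the split go left; values at or above it go right.
    left = [s for s in samples if s[feature] < split]
    right = [s for s in samples if s[feature] >= split]
    if not left or not right:
        return {"size": len(samples)}
    return {
        "feature": feature,
        "split": split,
        "left": build_isolation_tree(left, height + 1, height_limit, rng),
        "right": build_isolation_tree(right, height + 1, height_limit, rng),
    }


def tree_depth(node):
    """Number of edges on the longest root-to-leaf path."""
    if "size" in node:
        return 0
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))


def leaf_total(node):
    """Total number of training samples held in the leaves."""
    if "size" in node:
        return node["size"]
    return leaf_total(node["left"]) + leaf_total(node["right"])


rng = random.Random(0)
subsample = [(rng.random(), rng.random()) for _ in range(16)]
tree = build_isolation_tree(subsample, rng=rng)
```

A forest would repeat this on many independently drawn subsamples. The height cap keeps every tree shallow, which is enough because anomalies tend to be isolated near the root.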
  • step S600 specifically includes:
  • S660 Determine the application score of the application feature set according to the path length corresponding to all the application features in the application feature set.
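The source states that the score is derived from path lengths but gives no formula. The sketch below assumes the standard isolation forest score, s = 2^(-E[h(x)] / c(n)), where h(x) is the path length of x, n is the subsample size, and c(n) is the average path length of an unsuccessful binary-search-tree lookup. The function names, the dictionary node layout, and the hand-built example tree are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def path_length(node, x, depth=0):
    """Edges traversed by x, plus c(size) for leaves that were not fully split."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    go_left = x[node["feature"]] < node["split"]
    child = node["left"] if go_left else node["right"]
    return path_length(child, x, depth + 1)


def anomaly_score(trees, x, subsample_size):
    """Score in (0, 1); values near 1 mean x is isolated after few splits."""
    mean_length = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_length / average_path_length(subsample_size))


# Hand-built tree over 8 training samples, for illustration only.
tree = {
    "feature": 0,
    "split": 10.0,
    "left": {
        "feature": 1,
        "split": 0.5,
        "left": {"size": 3},
        "right": {"size": 4},
    },
    "right": {"size": 1},
}
trees = [tree]
outlier_score = anomaly_score(trees, (50.0, 0.2), 8)
normal_score = anomaly_score(trees, (3.0, 0.2), 8)
```

The point that is separated at the first split receives the higher score, which matches the source's statement that scores closer to 1 indicate a higher probability of fraud.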
  • The abnormal application detection method of the present application specifically includes the following steps: obtaining the applicant's credit application data; obtaining the applicant information from the credit application data; looking up the applicant's historical application data based on the applicant information; obtaining the application feature set from the historical application data, where the application features in the application feature set include application time, application frequency, application trend, and institution preference; sampling a sample feature set containing ψ samples from the unlabeled training feature set as the training sample set of an isolation tree; randomly selecting a single feature of the samples in the sample feature set; binary-dividing the sample feature set according to the feature to obtain two sub-feature sets; determining whether each sub-feature set can be divided again and, when it can, taking it as a new sample feature set, returning to the binary-division step, and recording the number of returns; when the sub-feature set cannot be divided again or when the number of returns is greater than log2(ψ)-2, the binary
  • Although the steps in the flowcharts of FIGS. 2-5 are displayed in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-5 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
  • An abnormal application detection device is provided, including a data acquisition module 200, a feature extraction module 400, a score calculation module 600, and an abnormal application determination module 800, wherein:
  • the data acquisition module 200 is used to obtain the credit application data of the applicant;
  • the feature extraction module 400 is used to obtain the applicant information and the application feature set corresponding to the applicant information according to the credit application data;
  • the score calculation module 600 is used to input the application feature set into the preset isolation forest model, perform anomaly detection on the features in the application feature set through the preset isolation forest model, and obtain the application score corresponding to the application feature set, where the preset isolation forest model is constructed from an unlabeled training feature set, detects abnormal features from the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, and scores the application feature set based on that path length;
  • the abnormal application determination module 800 is used to obtain the absolute value of the difference between the application score and the preset reference value and, when that absolute value is lower than the preset threshold, to determine that the application corresponding to the credit application data is an abnormal application.
  • the feature extraction module 400 is specifically configured to: obtain applicant information based on the credit application data; obtain the applicant's historical application data based on the applicant information; and obtain the application feature set based on the historical application data.
  • the score calculation module 600 is used to: input the application feature set into the preset isolation forest model; perform anomaly detection on each application feature in the application feature set through the isolation trees in the preset isolation forest model to obtain the anomaly degree of each application feature; and obtain the application score corresponding to the application feature set from the anomaly degrees of the application features.
  • the model training module includes:
  • the isolation tree building unit is used to construct isolation trees from the preset unlabeled training feature set;
  • the model building unit is used to construct the preset isolation forest model from the isolation trees.
  • the isolation tree building unit is specifically used for: sampling a sample feature set containing ψ samples from the unlabeled training feature set as the training sample set of an isolation tree; randomly selecting a single feature of the samples in the sample feature set; binary-dividing the sample feature set according to the feature to obtain two sub-feature sets; determining whether each sub-feature set can be divided again and, when it can, taking the sub-feature set as a new sample feature set and returning to the step of binary-dividing the sample feature set according to the feature to obtain two sub-feature sets; and, when the number of samples in the sub-feature set is equal to 1, stopping the binary division of the sub-feature set.
  • the isolation tree building unit is also used to: return the sub-feature set as a new sample feature set.
  • the score calculation module 600 is specifically used to: obtain the application features in the application feature set; run each application feature through the isolation trees in the preset isolation forest model and record the length of the path that the application feature traverses; and determine the application score of the application feature set according to the path lengths corresponding to all the application features in the application feature set.
  • the abnormal application determination module 800 is specifically configured to: obtain the difference between the application score and 1 and, when the difference is lower than a preset threshold, determine that the application is an abnormal application.
  • Each module in the above abnormal application detection device may be implemented in whole or in part by software, hardware, or a combination thereof.
  • The above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or may be stored in the memory of the computer device in software form so that the processor can call them and perform the operations corresponding to each module.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 7.
  • The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer device is used to store abnormal application data.
  • the network interface of the computer device is used to communicate with external terminals through a network connection. When the computer program is executed by the processor, an abnormal application detection method is realized.
  • FIG. 7 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied.
  • A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • When the computer-readable instructions are executed by the processor, the steps of the abnormal application detection method provided in any embodiment of the present application are implemented.
  • One or more non-volatile storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the abnormal application detection method provided in any embodiment of the present application.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM random access memory
  • DRAM dynamic RAM
  • SDRAM synchronous DRAM
  • DDRSDRAM double data rate SDRAM
  • ESDRAM enhanced SDRAM
  • SLDRAM synchronous link (Synchlink) DRAM
  • RDRAM Rambus direct RAM
  • DRDRAM direct Rambus dynamic RAM
  • RDRAM Rambus dynamic RAM

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed is an abnormal application detection method, comprising: acquiring credit application data of an applicant; acquiring, according to the credit application data, applicant information and an application feature set corresponding to the applicant information; inputting the application feature set into a pre-set isolation forest model, and performing abnormality detection on features in the application feature set by means of the pre-set isolation forest model in order to acquire an application score corresponding to the application feature set, wherein the pre-set isolation forest model is constructed based on an unmarked training feature set; and acquiring an absolute value of a difference value of the application score and a pre-set reference value, and when the absolute value of the difference value is less than a pre-set threshold value, determining that an application corresponding to the credit application data is an abnormal application.

Description

Abnormal application detection method, device, computer equipment and storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 2019100070454, filed with the China Patent Office on January 4, 2019 and entitled "Abnormal application detection method, device, computer equipment and storage medium", the entire content of which is incorporated herein by reference.
Technical field
This application relates to an abnormal application detection method, device, computer equipment, and storage medium.
Background
With the rapid development of the Internet credit industry in recent years, many competing players and products have emerged. However, as the industry has prospered, the fraud black-market industry chain has also steadily penetrated this field, and novel fraud schemes keep emerging, casting a shadow over the healthy development of the Internet credit industry. According to incomplete statistics, the annual losses caused by fraud are huge, and fraud risk has become the foremost risk for the Internet credit industry.
However, the inventor realized that the industry currently relies mainly on supervised learning algorithms to detect fraud in users' loan applications, yet in most cases the data used to determine whether user behavior is fraudulent is unlabeled, manual labeling is extremely costly, and fraud techniques keep changing.
Summary of the invention
According to various embodiments disclosed in the present application, an abnormal application detection method, device, computer equipment, and storage medium are provided.
An abnormal application detection method, including:
obtaining the credit application data of an applicant;
obtaining, according to the credit application data, applicant information and an application feature set corresponding to the applicant information;
inputting the application feature set into a preset isolation forest model, performing anomaly detection on the features in the application feature set through the preset isolation forest model, and obtaining an application score corresponding to the application feature set, wherein the preset isolation forest model is constructed from an unlabeled training feature set, detects abnormal features from the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, and scores the application feature set based on that path length; and
obtaining the absolute value of the difference between the application score and a preset reference value and, when the absolute value of the difference is lower than a preset threshold, determining that the application corresponding to the credit application data is an abnormal application.
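The final determination step can be illustrated with a short sketch. The function name `is_abnormal_application` and the threshold value 0.3 are hypothetical. The default reference value of 1 follows the embodiment in which scores closer to 1 indicate a higher probability of fraud, and the strict "lower than" comparison follows the wording of the step above.

```python
def is_abnormal_application(score, reference=1.0, threshold=0.3):
    """Flag an application whose score lies within `threshold` of `reference`.

    The threshold value is an illustrative assumption; in practice it would
    be tuned for the deployment.
    """
    return abs(score - reference) < threshold


# A score near 1 is flagged; a score near 0.5 is not.
flagged = is_abnormal_application(0.81)
not_flagged = is_abnormal_application(0.51)
```

Comparing against a reference value rather than testing the score directly keeps the rule unchanged if a deployment adopts a different score scale.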
An abnormal application detection device, including:
a data acquisition module, used to obtain the credit application data of an applicant;
a feature extraction module, used to obtain, according to the credit application data, applicant information and an application feature set corresponding to the applicant information, wherein the application features in the application feature set include application time, application frequency, application trend, and institution preference;
a score calculation module, used to input the application feature set into a preset isolation forest model, perform anomaly detection on the features in the application feature set through the preset isolation forest model, and obtain an application score corresponding to the application feature set, wherein the preset isolation forest model is constructed from an unlabeled training feature set, detects abnormal features from the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, and scores the application feature set based on that path length; and
an abnormal application determination module, used to obtain the absolute value of the difference between the application score and a preset reference value and, when the absolute value of the difference is lower than a preset threshold, determine that the application corresponding to the credit application data is an abnormal application.
A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
obtaining credit application data of an applicant;
obtaining, according to the credit application data, applicant information and an application feature set corresponding to the applicant information;
inputting the application feature set into a preset isolation forest model, performing anomaly detection on the features in the application feature set by means of the preset isolation forest model, and obtaining an application score corresponding to the application feature set, wherein the preset isolation forest model is constructed from an unlabeled training feature set, detects abnormal features by the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, and scores the application feature set based on the path lengths; and
obtaining the absolute value of the difference between the application score and a preset reference value, and, when the absolute value of the difference is lower than a preset threshold, determining that the application corresponding to the credit application data is an abnormal application.
One or more non-volatile storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining credit application data of an applicant;
obtaining, according to the credit application data, applicant information and an application feature set corresponding to the applicant information;
inputting the application feature set into a preset isolation forest model, performing anomaly detection on the features in the application feature set by means of the preset isolation forest model, and obtaining an application score corresponding to the application feature set, wherein the preset isolation forest model is constructed from an unlabeled training feature set, detects abnormal features by the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, and scores the application feature set based on the path lengths; and
obtaining the absolute value of the difference between the application score and a preset reference value, and, when the absolute value of the difference is lower than a preset threshold, determining that the application corresponding to the credit application data is an abnormal application.
Details of one or more embodiments of the present application are set forth in the drawings and description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
To explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is an application scenario diagram of an abnormal application detection method according to one or more embodiments.
FIG. 2 is a schematic flowchart of an abnormal application detection method according to one or more embodiments.
FIG. 3 is a schematic flowchart of the sub-steps of step S400 in FIG. 2 according to one or more embodiments.
FIG. 4 is a schematic flowchart of an abnormal application detection method in another embodiment.
FIG. 5 is a schematic flowchart of the sub-steps of step S600 in FIG. 2 according to one or more embodiments.
FIG. 6 is a block diagram of an abnormal application detection apparatus according to one or more embodiments.
FIG. 7 is a block diagram of a computer device according to one or more embodiments.
DETAILED DESCRIPTION
To make the technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and do not limit it.
The abnormal application detection method provided by the present application can be applied in the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 over a network. The server 104 obtains, over the network, the applicant's credit application data provided by application monitoring staff. After receiving the credit application data, the server first obtains the applicant information and the application feature set corresponding to the applicant information according to the credit application data. It then inputs the application feature set into the preset isolation forest model, performs anomaly detection on the features in the application feature set by means of the preset isolation forest model, and obtains the application score corresponding to the application feature set. Finally, it determines from the application score whether the application is abnormal and feeds the detection result back to the terminal 102. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet, or portable wearable device. The server 104 may be implemented as a standalone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, an abnormal application detection method is provided. The method is described as applied to the server in FIG. 1 and includes the following steps:
S200: Obtain the credit application data of the applicant.
The applicant is the person currently making a credit application, and the application is a credit application. The credit application data comprises the data the user provides for the credit application, which may include the user's personal information and the credit data the user is able to provide.
When an applicant submits a credit application and it must be determined whether the application is abnormal, the credit application data that the applicant provided for the application is obtained first.
S400: Obtain the applicant information and the application feature set corresponding to the applicant information according to the credit application data.
The applicant information is the personal information corresponding to the applicant, from which data related to the applicant can be retrieved. The application feature set is the feature data used to analyze the applicant's credit situation, and may include dimensions such as application time point, application frequency, application trend, and institution preference.
The applicant is identified from the credit application data submitted by the user, and the applicant information is obtained together with the applicant's application feature set for credit applications.
S600: Input the application feature set into the preset isolation forest model, perform anomaly detection on the features in the application feature set by means of the preset isolation forest model, and obtain the application score corresponding to the application feature set. The model detects abnormal features by the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, and scores the application feature set based on the path lengths.
The isolation forest model is a model built on the isolation forest algorithm. The isolation forest algorithm is generally used to mine abnormal data, also called outlier mining: finding, within a large body of data, the data that does not fit the pattern of the rest. Once found, abnormal data is either removed directly, as when noise is removed during data cleaning, or analyzed in depth, for example to study the behavioral characteristics of attacks or fraud. The preset isolation forest model is constructed from an unlabeled training feature set and is used to detect whether an application is abnormal. The model examines the applicant's past application features to obtain the application score corresponding to the current application data. Obtaining the application score includes obtaining the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, performing abnormal feature detection based on the path lengths, and scoring the application feature set based on the path lengths to obtain the score for the current application. The application score is the score derived from the application data.
After the application feature set containing the application features is obtained, it is input into the preset isolation forest model to obtain the application score corresponding to the application feature set. In one embodiment, the isolation trees in the preset isolation forest model perform anomaly detection on each application feature in the application feature set. A single application feature can be tested by multiple isolation trees, yielding an anomaly degree for each application feature. The anomaly degrees of the individual application features are then combined to obtain the application score corresponding to the application feature set.
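The text does not say how the per-feature anomaly degrees are combined. The following minimal sketch assumes a plain average, first over the trees that tested a feature and then over the features; the function name `application_score` and the feature keys are hypothetical.

```python
from statistics import mean


def application_score(per_tree_degrees):
    """Combine anomaly degrees into one application score.

    per_tree_degrees maps a feature name to the anomaly degrees reported by
    the isolation trees that tested that feature. Averaging is an assumption;
    the source only says the degrees are combined.
    """
    per_feature = {name: mean(degrees) for name, degrees in per_tree_degrees.items()}
    return mean(per_feature.values()), per_feature


score, per_feature = application_score({
    "application_frequency": [0.8, 0.9],
    "institution_preference": [0.5],
})
print(score, per_feature)
```

Returning the per-feature values alongside the overall score keeps it visible which feature drove an abnormal result.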
S800: Obtain the absolute value of the difference between the application score and the preset reference value. When the absolute value of the difference is lower than the preset threshold, determine that the application corresponding to the credit application data is an abnormal application.
Abnormal applications include fraudulent applications, that is, credit applications in which the applicant commits fraud. Academia currently defines anomalies in many ways. In an isolation forest, an anomaly is defined as an outlier that is "more likely to be separated", which can be understood as a point that is sparsely distributed and far from high-density groups. In the feature space, a sparsely populated region indicates that events occur in that region with low probability, so data falling in such regions can be regarded as abnormal.
Once the application score has been obtained from the preset isolation forest model, it can be used to detect whether the application is abnormal: the absolute value of the difference between the application score and the preset reference value is obtained, and when that absolute value is lower than the preset threshold, the application corresponding to the credit application data is determined to be abnormal. In general, the preset reference value is 1, and the closer the application score is to 1, the more likely the application is fraudulent. In one embodiment, an abnormal threshold can be set and the difference between the application score and 1 obtained. When the absolute value of the difference between the application score and 1 is lower than or equal to the abnormal threshold, the application is determined to be abnormal. When the absolute value of the difference is higher than the abnormal threshold, the application is determined not to be abnormal.
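The decision rule of S800 can be sketched in a few lines. The function name `is_abnormal` and the example threshold are illustrative assumptions; the comparison uses "lower than or equal to", as in the embodiment above, whereas the general statement of S800 uses "lower than".

```python
def is_abnormal(score: float, threshold: float, baseline: float = 1.0) -> bool:
    """Return True when the score lies within `threshold` of the reference value."""
    return abs(score - baseline) <= threshold


# Hypothetical threshold of 0.2: scores in [0.8, 1.2] are flagged.
print(is_abnormal(0.9, threshold=0.2))   # score close to 1 -> flagged
print(is_abnormal(0.5, threshold=0.2))   # score far from 1 -> not flagged
```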
The abnormal application detection method described above first obtains the applicant's credit application data; obtains the applicant information and the corresponding application feature set according to the credit application data; inputs the application feature set into the preset isolation forest model obtained through unsupervised training, performs anomaly detection on the features in the application feature set by means of that model, and obtains the application score corresponding to the application feature set; and determines from the application score whether the application corresponding to the credit application data is abnormal. Because the preset isolation forest model is obtained through unsupervised learning, no labels are needed for training, which makes the abnormal application scoring system considerably more practical. The model is able to identify variants of known fraud as well as fraud that has never been seen before, and its output takes the form of a score that reflects the degree of abnormality, which is easy for customers to understand and to build into products.
As shown in FIG. 3, in one embodiment, S400 specifically includes:
S420: Obtain the applicant information according to the credit application data.
S440: Obtain the applicant's historical application data according to the applicant information.
S460: Obtain the application feature set according to the historical application data.
The applicant's information is first determined from the application the applicant submitted. The credit database is then searched using that information to find the applicant's historical credit application records, and the historical application data is retrieved. The applicant's application features are summarized from these historical records to obtain the application feature set. Summarizing the historical application records captures the applicant's application features effectively, and those features are then used to detect whether the applicant's application is abnormal.
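The source names the four feature dimensions but does not define how each is computed. The sketch below shows one possible reading of S460; the `Application` record, the 30-day window, and all four formulas are assumptions made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Application:
    """Hypothetical historical application record."""
    applied_at: datetime
    institution: str


def build_feature_set(history, now, window_days=30):
    """Summarize historical applications into four illustrative features."""
    window = timedelta(days=window_days)
    recent = [a for a in history if now - window <= a.applied_at <= now]
    previous = [a for a in history
                if now - 2 * window <= a.applied_at < now - window]
    counts = Counter(a.institution for a in history)
    return {
        # application time point: mean hour of day (assumed definition)
        "application_hour": (sum(a.applied_at.hour for a in history) / len(history)
                             if history else 0.0),
        # application frequency: applications in the latest window
        "application_frequency": float(len(recent)),
        # application trend: latest window minus the window before it
        "application_trend": float(len(recent) - len(previous)),
        # institution preference: share going to the most used institution
        "institution_preference": (max(counts.values()) / len(history)
                                   if history else 0.0),
    }
```

An empty history yields zeros for every feature rather than raising an error, which is a design choice of this sketch and not something the source specifies.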
As shown in FIG. 4, in one embodiment, the following steps precede S600:
S520: Construct isolation trees from the preset unlabeled training feature set.
S540: Construct the preset isolation forest model from the isolation trees.
The unlabeled training feature set is the feature set used to train the isolation forest model. Its composition is similar to that of the application feature set: both include dimensions such as application time point, application frequency, application trend, and institution preference. The isolation forest model consists of multiple isolation trees. Isolation trees are first built from the unlabeled training feature set, and multiple isolation trees together form the isolation forest model that scores applications according to the application feature set. A feature can be selected at random from the unlabeled training feature set to build an isolation tree; for example, an isolation tree can be built from the application frequency data of the applicants in the training feature set. Multiple isolation trees are then generated by training on a large amount of unlabeled data, and together they form the isolation forest.
In one embodiment, S520 specifically includes: sampling a sample feature set containing ψ samples from the unlabeled training feature set as the training sample set of an isolation tree; randomly selecting a single feature of the samples in the sample feature set; performing a binary split of the sample feature set according to the feature to obtain two sub-feature sets; determining whether a sub-feature set can be split again and, when it can, treating the sub-feature set as a new sample feature set and returning to the step of performing a binary split according to the feature to obtain two sub-feature sets; and, when a sub-feature set cannot be split again, stopping the binary splitting of that sub-feature set.
The process of constructing an isolation tree from the unlabeled training feature set is as follows. A sample feature set containing ψ samples is first drawn from the training feature set as the training sample set of the isolation tree, and one isolation tree is trained on it. In one embodiment, the training feature set may be divided evenly among the isolation trees according to the number of trees, and each tree is trained on its share. Once the sample feature set is established, one feature of its samples is selected at random and the values that feature can take are obtained. A value is then chosen at random within the full range of that feature, and the sample feature set is split in two: samples whose value is smaller than the chosen value go to the left of the node, and samples whose value is greater than or equal to it go to the right. This yields a split condition and a sub-feature set on each side. Each sub-feature set is then checked to see whether it can be split again. If it can, a value is again chosen at random within the range of that feature in the sub-feature set, and the sub-feature set is split in two. If it cannot, that is, when the sub-feature set contains only one datum, the binary splitting of that sub-feature set ends.
The splitting described here applies to every sub-feature set that can still be split. Splitting the feature set in this way builds the isolation tree effectively, and the resulting tree can be used to detect abnormal applications effectively.
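The tree construction described above can be sketched as follows, assuming numeric feature values and one randomly chosen feature per tree. The dictionary-based node layout and the function names are hypothetical, and treating a subset of identical values as a leaf is an added assumption (the text only mentions the single-sample case).

```python
import random


def split_values(values, rng):
    """Recursively split one feature's values until no subset can be split again."""
    if len(values) <= 1 or min(values) == max(values):
        return {"size": len(values)}  # leaf: one sample, or identical values
    low, high = min(values), max(values)
    split = rng.uniform(low, high)
    while split <= low:  # redraw so the left subset is never empty
        split = rng.uniform(low, high)
    left = [v for v in values if v < split]    # smaller than the split value
    right = [v for v in values if v >= split]  # greater than or equal to it
    return {"split": split,
            "left": split_values(left, rng),
            "right": split_values(right, rng)}


def grow_isolation_tree(training_rows, psi, rng):
    """Sample psi rows, pick one feature at random, and build a tree on it."""
    sample = rng.sample(training_rows, psi)
    feature = rng.randrange(len(sample[0]))
    return {"feature": feature,
            "root": split_values([row[feature] for row in sample], rng)}


# Usage: rows are [hour, frequency, trend, preference] with made-up values.
rng = random.Random(42)
rows = [[float(i % 24), float(i % 5), float(i % 3 - 1), i / 200.0] for i in range(200)]
forest = [grow_isolation_tree(rows, psi=32, rng=rng) for _ in range(10)]
```

A forest is then simply a list of such trees, each remembering which feature it was built on.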
In one embodiment, after the sample feature set is split in two according to the feature to obtain two sub-feature sets, the method includes:
treating the sub-feature set as a new sample feature set and returning to the step of performing a binary split of the sample feature set according to the feature to obtain two sub-feature sets, recording the number of returns, and stopping the binary splitting of the sub-feature set when the number of returns is greater than log2(ψ)-2.
When a sub-feature set is split as a new sample feature set, the number of returns must be recorded, and the binary splitting of the sub-feature set must stop once the number of returns is greater than log2(ψ)-2. The height of an isolation tree is limited: it is determined by the number of samples ψ in the sample feature set used for training, and is at most log2(ψ). Both the height of the isolation tree and whether the samples can be split again decide whether a sub-feature set is split further; as soon as either condition is met, binary splitting ends.
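The two stopping conditions can be combined in a single predicate. This sketch uses the log2(ψ)-2 bound on the number of returns exactly as stated above; the function name `should_stop` and its parameters are hypothetical.

```python
import math


def should_stop(subset_size: int, returns: int, psi: int) -> bool:
    """Stop splitting when the subset cannot be split or the return count is too high.

    subset_size: number of samples in the current sub-feature set
    returns:     how many times the split step has been re-entered so far
    psi:         number of samples drawn for this isolation tree
    """
    cannot_split = subset_size <= 1
    too_deep = returns > math.log2(psi) - 2
    return cannot_split or too_deep


# With psi = 256, log2(psi) - 2 = 6, so splitting stops on the 7th return.
print(should_stop(subset_size=10, returns=6, psi=256))
print(should_stop(subset_size=10, returns=7, psi=256))
```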
As shown in FIG. 5, in one embodiment, step S600 specifically includes:
S620: Obtain the application features in the application feature set.
S640: Run each application feature through the corresponding isolation tree in the preset isolation forest model, and record the length of the path the application feature travels during the run.
S660: Determine the application score of the application feature set according to the path lengths of all application features in the set.
After the applicant's application feature set is obtained, the application features in it are extracted and each is run through the corresponding isolation tree. The length of the path that a single application feature travels in the isolation tree before reaching a node that cannot be split further is recorded, and the application score of the application feature set is determined from the path lengths of all features. The path lengths that the features in the application feature set travel in the isolation forest model make it possible to determine whether a given feature of the application is abnormal, and the application can be scored comprehensively based on the path length of each feature.
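The source does not give the formula that turns path lengths into a score. The sketch below uses the standard isolation forest score from the original algorithm, s = 2^(-E[h(x)] / c(ψ)), as an assumption; it is consistent with the statement that scores closer to 1 are more anomalous. The dictionary node layout (`split`, `left`, `right`, `size`) and the hand-built tree are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n: int) -> float:
    """c(n): expected path length of an unsuccessful search among n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def path_length(node, value, depth=0):
    """Walk the tree until a leaf, adding c(size) for leaves with several samples."""
    if "split" not in node:
        return depth + average_path_length(node["size"])
    branch = "left" if value < node["split"] else "right"
    return path_length(node[branch], value, depth + 1)


def anomaly_score(trees, value, psi):
    """Standard isolation forest score: short average paths give scores near 1."""
    mean_depth = sum(path_length(t, value) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(psi))


# Hand-built tree over 5 samples of one feature (illustrative values only).
tree = {"split": 50.0,
        "left": {"split": 10.0, "left": {"size": 1}, "right": {"size": 3}},
        "right": {"size": 1}}
print(anomaly_score([tree], 99.0, psi=5))  # isolated after one split
print(anomaly_score([tree], 20.0, psi=5))  # sits in the dense leaf
```

In this example the value 99.0 is isolated after a single split, so it receives the higher score of the two.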
In one embodiment, the abnormal application detection method of the present application specifically includes the following steps: obtaining the applicant's credit application data; obtaining the applicant information according to the credit application data; looking up, according to the applicant information, the applicant's historical application data at various credit institutions and retrieving that data; obtaining the application feature set from the historical application data, the application features in the set including application time point, application frequency, application trend, and institution preference; sampling a sample feature set containing ψ samples from the unlabeled training feature set as the training sample set of an isolation tree; randomly selecting a single feature of the samples in the sample feature set; performing a binary split of the sample feature set according to the feature to obtain two sub-feature sets; determining whether a sub-feature set can be split again and, when it can, treating it as a new sample feature set, returning to the step of performing a binary split according to the feature to obtain two sub-feature sets, and recording the number of returns; and stopping the binary splitting of a sub-feature set when it cannot be split again or when the number of returns is greater than log2(ψ)-2. The preset isolation forest model is constructed from the isolation trees. The application features in the application feature set are obtained; each application feature is run through the corresponding isolation tree in the preset isolation forest model, and the length of the path the feature travels during the run is recorded; the application score of the application feature set is determined according to the path lengths of all application features in the set. The difference between the application score and 1 is obtained, and when the difference is lower than the preset threshold, the application corresponding to the credit application data is determined to be an abnormal application.
It should be understood that although the steps in the flowcharts of FIGS. 2-5 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, there is no strict ordering constraint on these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-5 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be performed at different times, and they are not necessarily performed one after another but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 6, an abnormal application detection apparatus is provided, including a data acquisition module 200, a feature extraction module 400, a score calculation module 600, and an abnormal application determination module 800, wherein:
the data acquisition module 200 is configured to obtain the credit application data of the applicant;
the feature extraction module 400 is configured to obtain, according to the credit application data, the applicant information and the application feature set corresponding to the applicant information;
the score calculation module 600 is configured to input the application feature set into the preset isolation forest model, perform anomaly detection on the features in the application feature set by means of the preset isolation forest model, and obtain the application score corresponding to the application feature set, wherein the preset isolation forest model is constructed from an unlabeled training feature set, detects abnormal features by the length of the path that each feature in the application feature set follows in the isolation trees of the isolation forest model, and scores the application feature set based on the path lengths; and
the abnormal application determination module 800 is configured to obtain the absolute value of the difference between the application score and the preset reference value, and, when the absolute value of the difference is lower than the preset threshold, determine that the application corresponding to the credit application data is an abnormal application.
In one embodiment, the feature extraction module 400 is specifically configured to: obtain applicant information according to the credit application data; obtain the applicant's historical application data according to the applicant information; and obtain the application feature set according to the historical application data.
In one embodiment, the score calculation module 600 is configured to: input the application feature set into the preset isolation forest model; perform anomaly detection on each application feature in the application feature set by means of the isolation trees in the preset isolation forest model, obtaining an anomaly degree for each application feature; and obtain the application score corresponding to the application feature set according to the anomaly degrees of the application features.
In one embodiment, the apparatus further includes a model training module, which includes:
an isolation tree building unit, configured to build isolation trees from a preset unlabeled training feature set; and
a model building unit, configured to build the preset isolation forest model from the isolation trees.
In one embodiment, the isolation tree building unit is specifically configured to: sample, from the unlabeled training feature set, a sample feature set containing ψ samples as the training sample set of an isolation tree; randomly select a single feature of the samples in the sample feature set; perform a binary partition of the sample feature set according to that feature to obtain two sub-feature sets; determine whether each sub-feature set can be partitioned again and, when it can, take the sub-feature set as the new sample feature set and return to the step of performing a binary partition of the sample feature set according to the feature to obtain two sub-feature sets; and stop the binary partition of a sub-feature set when the number of samples in it equals 1.
In one embodiment, the isolation tree building unit is further configured to: take the sub-feature set as the new sample feature set, return to the step of performing a binary partition of the sample feature set according to the feature to obtain two sub-feature sets, and record the number of returns; when the number of returns exceeds log2(ψ)-2, the binary partition of the sub-feature set is stopped.
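The tree-building procedure of the two preceding embodiments can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names, the dictionary node layout, and the rule of drawing the split value uniformly between the minimum and maximum of the chosen feature are all hypothetical (the text only says the set is partitioned in two according to a randomly selected feature). As a simplification, a subset whose chosen feature has a single value is treated as not divisible.

```python
import math
import random


def build_isolation_tree(samples, psi, returns=0, rng=random):
    """Recursively split `samples`, a list of equal-length numeric tuples."""
    # Stop when the subset holds at most one sample, or when the number of
    # returns to the split step exceeds log2(psi) - 2.
    if len(samples) <= 1 or returns > math.log2(psi) - 2:
        return {"size": len(samples)}
    feature = rng.randrange(len(samples[0]))      # randomly selected single feature
    values = [s[feature] for s in samples]
    low, high = min(values), max(values)
    if low == high:                               # treated as "cannot be divided again"
        return {"size": len(samples)}
    split = rng.uniform(low, high)                # assumed split-value rule
    left = [s for s in samples if s[feature] < split]
    right = [s for s in samples if s[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_isolation_tree(left, psi, returns + 1, rng),
        "right": build_isolation_tree(right, psi, returns + 1, rng),
    }


def build_isolation_forest(training_set, n_trees, psi, seed=0):
    """Grow `n_trees` trees, each on psi samples drawn without replacement."""
    rng = random.Random(seed)
    return [
        build_isolation_tree(rng.sample(training_set, psi), psi, 0, rng)
        for _ in range(n_trees)
    ]
```

Here `psi` must not exceed the size of the training set, and `n_trees` is a free parameter that the text does not specify.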
In one embodiment, the score calculation module 600 is specifically configured to: obtain the application features in the application feature set; run each application feature through the isolation trees in the preset isolation forest model and record the length of the path it traverses; and determine the application score of the application feature set according to the path lengths of all application features in the set.
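A short sketch may help make path-length scoring concrete. The passage above does not give a scoring formula, so the code below assumes the standard isolation forest score s = 2^(-E[h(x)] / c(ψ)), where h(x) is the path length and c(n) is the average path length of an unsuccessful binary-search-tree lookup over n samples. The node layout (a leaf is `{"size": n}`, an internal node has `feature`, `split`, `left`, `right`) and all function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def path_length(node, x, depth=0):
    """Edges traversed by x, plus an adjustment for unsplit leaves."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(child, x, depth + 1)


def application_score(forest, x, psi):
    """Assumed standard score: short average paths give scores near 1."""
    mean_length = sum(path_length(tree, x) for tree in forest) / len(forest)
    return 2.0 ** (-mean_length / c(psi))   # requires psi >= 2


# Hand-built example tree over one feature, grown from psi = 4 samples.
tree = {
    "feature": 0,
    "split": 50.0,
    "left": {"feature": 0, "split": 10.0,
             "left": {"size": 1}, "right": {"size": 2}},
    "right": {"size": 1},
}
print(application_score([tree], (100.0,), psi=4))
print(application_score([tree], (20.0,), psi=4))
```

In this example the point `(100.0,)` is isolated after one split and therefore receives the higher score of the two, which is the behavior the path-length criterion relies on.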
In one embodiment, the abnormal application determination module 800 is specifically configured to: obtain the difference between the application score and 1, and determine that the application is an abnormal application when the difference is below a preset threshold.
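The determination rule can be written in a few lines. The sketch below uses the absolute value of the difference, as in the module definition, with the reference value defaulting to 1 as in this embodiment; the function name and the numeric values in the usage lines are hypothetical, since the text does not state a threshold.

```python
def is_abnormal_application(score, threshold, reference=1.0):
    """Flag the application when |score - reference| is below the threshold."""
    return abs(score - reference) < threshold


# Hypothetical values: a score close to 1 is flagged, a mid-range score is not.
print(is_abnormal_application(0.92, threshold=0.15))
print(is_abnormal_application(0.45, threshold=0.15))
```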
For the specific limitations of the abnormal application detection apparatus, refer to the limitations of the abnormal application detection method above, which are not repeated here. Each module of the apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores abnormal application data. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer program implements an abnormal application detection method.
Those skilled in the art will understand that the structure shown in FIG. 7 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, implement the steps of the abnormal application detection method provided in any embodiment of the present application.
One or more non-volatile storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the abnormal application detection method provided in any embodiment of the present application.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be carried out by computer-readable instructions directing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments present only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art may make a number of modifications and improvements without departing from the concept of the present application, all of which fall within its scope of protection. The scope of protection of this patent application shall therefore be determined by the appended claims.

Claims (20)

  1. An abnormal application detection method, comprising:
    obtaining credit application data of an applicant;
    obtaining, according to the credit application data, applicant information and an application feature set corresponding to the applicant information;
    inputting the application feature set into a preset isolation forest model, performing anomaly detection on the features in the application feature set by means of the preset isolation forest model, and obtaining an application score corresponding to the application feature set, wherein the preset isolation forest model is built from an unlabeled training feature set and is used to detect anomalous features from the lengths of the paths that the features in the application feature set traverse in the isolation trees of the isolation forest model, and to score the application feature set based on the path lengths; and
    obtaining the absolute value of the difference between the application score and a preset reference value, and determining, when the absolute value of the difference is below a preset threshold, that the application corresponding to the credit application data is an abnormal application.
  2. The method according to claim 1, wherein obtaining, according to the credit application data, applicant information and an application feature set corresponding to the applicant information comprises:
    obtaining applicant information according to the credit application data;
    obtaining historical application data of the applicant according to the applicant information; and
    obtaining the application feature set according to the historical application data.
  3. The method according to claim 1, wherein inputting the application feature set into the preset isolation forest model, performing anomaly detection on the features in the application feature set by means of the preset isolation forest model, and obtaining the application score corresponding to the application feature set comprises:
    inputting the application feature set into the preset isolation forest model;
    performing anomaly detection on each application feature in the application feature set by means of the isolation trees in the preset isolation forest model, and obtaining an anomaly degree for each application feature; and
    obtaining the application score corresponding to the application feature set according to the anomaly degrees of the application features.
  4. The method according to claim 1, wherein, before inputting the application feature set into the preset isolation forest model and obtaining the application score corresponding to the application feature set, the method comprises:
    building isolation trees from a preset unlabeled training feature set; and
    building the preset isolation forest model from the isolation trees.
  5. The method according to claim 4, wherein building isolation trees from the preset unlabeled training feature set comprises:
    sampling, from the unlabeled training feature set, a sample feature set containing ψ samples as the training sample set of an isolation tree;
    randomly selecting a single feature of the samples in the sample feature set;
    performing a binary partition of the sample feature set according to the feature to obtain two sub-feature sets;
    determining whether the sub-feature set can be partitioned again and, when the sub-feature set can be partitioned again, taking the sub-feature set as the new sample feature set and returning to the step of performing a binary partition of the sample feature set according to the feature to obtain two sub-feature sets; and
    stopping the binary partition of the sub-feature set when the number of samples in the sub-feature set equals 1.
  6. The method according to claim 5, wherein, after performing the binary partition of the sample feature set according to the feature to obtain two sub-feature sets, the method further comprises:
    taking the sub-feature set as the new sample feature set, returning to the step of performing a binary partition of the sample feature set according to the feature to obtain two sub-feature sets, recording the number of returns, and stopping the binary partition of the sub-feature set when the number of returns exceeds log2(ψ)-2.
  7. The method according to claim 6, wherein inputting the application feature set into the preset isolation forest model and obtaining the application score corresponding to the application feature set comprises:
    obtaining the application features in the application feature set;
    running each application feature through the isolation trees in the preset isolation forest model, and recording the length of the path that the application feature traverses; and
    determining the application score of the application feature set according to the path lengths of all application features in the application feature set.
  8. An abnormal application detection apparatus, comprising:
    a data acquisition module, configured to obtain credit application data of an applicant;
    a feature extraction module, configured to obtain, according to the credit application data, applicant information and an application feature set corresponding to the applicant information;
    a score calculation module, configured to input the application feature set into a preset isolation forest model, perform anomaly detection on the features in the application feature set by means of the preset isolation forest model, and obtain an application score corresponding to the application feature set, wherein the preset isolation forest model is built from an unlabeled training feature set and is used to detect anomalous features from the lengths of the paths that the features in the application feature set traverse in the isolation trees of the isolation forest model, and to score the application feature set based on the path lengths; and
    an abnormal application determination module, configured to obtain the absolute value of the difference between the application score and a preset reference value, and to determine, when the absolute value of the difference is below a preset threshold, that the application corresponding to the credit application data is an abnormal application.
  9. The apparatus according to claim 8, wherein the feature extraction module is specifically configured to:
    obtain applicant information according to the credit application data;
    obtain historical application data of the applicant according to the applicant information; and
    obtain the application feature set according to the historical application data.
  10. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining credit application data of an applicant;
    obtaining, according to the credit application data, applicant information and an application feature set corresponding to the applicant information;
    inputting the application feature set into a preset isolation forest model, performing anomaly detection on the features in the application feature set by means of the preset isolation forest model, and obtaining an application score corresponding to the application feature set, wherein the preset isolation forest model is built from an unlabeled training feature set and is used to detect anomalous features from the lengths of the paths that the features in the application feature set traverse in the isolation trees of the isolation forest model, and to score the application feature set based on the path lengths; and
    obtaining the absolute value of the difference between the application score and a preset reference value, and determining, when the absolute value of the difference is below a preset threshold, that the application corresponding to the credit application data is an abnormal application.
  11. The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining applicant information according to the credit application data;
    obtaining historical application data of the applicant according to the applicant information; and
    obtaining the application feature set according to the historical application data.
  12. The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    inputting the application feature set into the preset isolation forest model;
    performing anomaly detection on each application feature in the application feature set by means of the isolation trees in the preset isolation forest model, and obtaining an anomaly degree for each application feature; and
    obtaining the application score corresponding to the application feature set according to the anomaly degrees of the application features.
  13. The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    building isolation trees from a preset unlabeled training feature set; and
    building the preset isolation forest model from the isolation trees.
  14. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    sampling, from the unlabeled training feature set, a sample feature set containing ψ samples as the training sample set of an isolation tree;
    randomly selecting a single feature of the samples in the sample feature set;
    performing a binary partition of the sample feature set according to the feature to obtain two sub-feature sets;
    determining whether the sub-feature set can be partitioned again and, when the sub-feature set can be partitioned again, taking the sub-feature set as the new sample feature set and returning to the step of performing a binary partition of the sample feature set according to the feature to obtain two sub-feature sets; and
    stopping the binary partition of the sub-feature set when the number of samples in the sub-feature set equals 1.
  15. The computer device according to claim 14, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    taking the sub-feature set as the new sample feature set, returning to the step of performing a binary partition of the sample feature set according to the feature to obtain two sub-feature sets, recording the number of returns, and stopping the binary partition of the sub-feature set when the number of returns exceeds log2(ψ)-2.
  16. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining credit application data of an applicant;
    obtaining, according to the credit application data, applicant information and an application feature set corresponding to the applicant information;
    inputting the application feature set into a preset isolation forest model, performing anomaly detection on the features in the application feature set by means of the preset isolation forest model, and obtaining an application score corresponding to the application feature set, wherein the preset isolation forest model is built from an unlabeled training feature set and is used to detect anomalous features from the lengths of the paths that the features in the application feature set traverse in the isolation trees of the isolation forest model, and to score the application feature set based on the path lengths; and
    obtaining the absolute value of the difference between the application score and a preset reference value, and determining, when the absolute value of the difference is below a preset threshold, that the application corresponding to the credit application data is an abnormal application.
  17. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    obtaining applicant information according to the credit application data;
    obtaining historical application data of the applicant according to the applicant information; and
    obtaining the application feature set according to the historical application data.
  18. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    inputting the application feature set into the preset isolation forest model;
    performing anomaly detection on each application feature in the application feature set by means of the isolation trees in the preset isolation forest model, and obtaining an anomaly degree for each application feature; and
    obtaining the application score corresponding to the application feature set according to the anomaly degrees of the application features.
  19. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    building isolation trees from a preset unlabeled training feature set; and
    building the preset isolation forest model from the isolation trees.
  20. The storage medium according to claim 19, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    sampling, from the unlabeled training feature set, a sample feature set containing ψ samples as the training sample set of an isolation tree;
    randomly selecting a single feature of the samples in the sample feature set;
    performing a binary partition of the sample feature set according to the feature to obtain two sub-feature sets;
    determining whether the sub-feature set can be partitioned again and, when the sub-feature set can be partitioned again, taking the sub-feature set as the new sample feature set and returning to the step of performing a binary partition of the sample feature set according to the feature to obtain two sub-feature sets; and
    stopping the binary partition of the sub-feature set when the number of samples in the sub-feature set equals 1.
PCT/CN2019/123190 2019-01-04 2019-12-05 Abnormal application detection method and apparatus, and computer device and storage medium WO2020140678A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910007045.4 2019-01-04
CN201910007045.4A CN109859029A (en) 2019-01-04 2019-01-04 Abnormal application detection method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020140678A1 true WO2020140678A1 (en) 2020-07-09

Family

ID=66893857

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/123190 WO2020140678A1 (en) 2019-01-04 2019-12-05 Abnormal application detection method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN109859029A (en)
WO (1) WO2020140678A1 (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950853A (en) * 2020-07-14 2020-11-17 东南大学 Power running state white list generation method based on information physical bilateral data
CN112733897A (en) * 2020-12-30 2021-04-30 胜斗士(上海)科技技术发展有限公司 Method and equipment for determining abnormal reason of multi-dimensional sample data
CN112971762A (en) * 2021-02-07 2021-06-18 中国人民解放军总医院 Respiratory signal quality evaluation method
CN112971795A (en) * 2021-02-07 2021-06-18 中国人民解放军总医院 Electrocardiosignal quality evaluation method
CN113076350A (en) * 2021-03-02 2021-07-06 无锡先导智能装备股份有限公司 Welding abnormity detection method and device, computer equipment and storage medium
CN113283901A (en) * 2021-04-19 2021-08-20 河南大学 Byte code-based fraud contract detection method for block chain platform
CN113537642A (en) * 2021-08-20 2021-10-22 日月光半导体制造股份有限公司 Product quality prediction method, device, electronic equipment and storage medium
CN114240059A (en) * 2021-11-22 2022-03-25 中国建设银行股份有限公司 Resource online application processing method and device, computer equipment and storage medium
CN114611616A (en) * 2022-03-16 2022-06-10 吕少岚 Unmanned aerial vehicle intelligent fault detection method and system based on integrated isolated forest
CN114979369A (en) * 2022-04-14 2022-08-30 马上消费金融股份有限公司 Abnormal call detection method and device, electronic equipment and storage medium
CN116659826A (en) * 2022-08-23 2023-08-29 国家电投集团科学技术研究院有限公司 Method and device for detecting state of wind power bolt
CN117538491A (en) * 2024-01-09 2024-02-09 武汉怡特环保科技有限公司 Station room air quality intelligent monitoring method and system
CN111950853B (en) * 2020-07-14 2024-05-31 东南大学 Electric power running state white list generation method based on information physical bilateral data

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109859029A (en) * 2019-01-04 2019-06-07 深圳壹账通智能科技有限公司 Abnormal application detection method, device, computer equipment and storage medium
CN110378386A (en) * 2019-06-20 2019-10-25 平安科技(深圳)有限公司 Based on unmarked abnormality recognition method, device and the storage medium for having supervision
CN110322368A (en) * 2019-07-03 2019-10-11 厦门理工学院 A kind of harmonic data method for detecting abnormality, terminal device and storage medium
CN110288362A (en) * 2019-07-03 2019-09-27 北京工业大学 Brush single prediction technique, device and electronic equipment
CN110398375B (en) * 2019-07-16 2021-10-19 广州亚美信息科技有限公司 Method, device, equipment and medium for monitoring working state of vehicle cooling system
CN110806546B (en) * 2019-10-28 2022-03-08 腾讯科技(深圳)有限公司 Battery health assessment method and device, storage medium and electronic equipment
CN110991552B (en) * 2019-12-12 2021-03-12 支付宝(杭州)信息技术有限公司 Isolated forest model construction and prediction method and device based on federal learning
CN112990246B (en) * 2019-12-17 2022-09-09 杭州海康威视数字技术股份有限公司 Method and device for establishing isolated tree model
CN111340063B (en) * 2020-02-10 2023-08-29 国能信控互联技术有限公司 Data anomaly detection method for coal mill
CN111275547B (en) * 2020-03-19 2023-07-18 重庆富民银行股份有限公司 Wind control system and method based on isolated forest
CN111612040B (en) * 2020-04-24 2024-04-30 平安直通咨询有限公司上海分公司 Financial data anomaly detection method and related device based on isolated forest algorithm
CN111833172A (en) * 2020-05-25 2020-10-27 百维金科(上海)信息科技有限公司 Consumption credit fraud detection method and system based on isolated forest
CN113159923A (en) * 2021-04-29 2021-07-23 中国工商银行股份有限公司 Risk screening method and device
CN113961434A (en) * 2021-09-29 2022-01-21 西安交通大学 Method and system for monitoring abnormal behaviors of distributed block chain system users
CN114580580B (en) * 2022-05-07 2022-08-16 深圳索信达数据技术有限公司 Intelligent operation and maintenance abnormity detection method and device
CN117786543B (en) * 2024-02-28 2024-05-10 沂水友邦养殖服务有限公司 Digital broiler raising information storage management method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016200971A (en) * 2015-04-09 2016-12-01 キヤノン株式会社 Learning apparatus, identification apparatus, learning method, identification method and program
CN107357790A (en) * 2016-05-09 2017-11-17 阿里巴巴集团控股有限公司 Unexpected message detection method, apparatus and system
CN107391569A (en) * 2017-06-16 2017-11-24 阿里巴巴集团控股有限公司 Data type identification, model training and risk identification methods, devices and equipment
CN108549973A (en) * 2018-03-22 2018-09-18 中国平安人寿保险股份有限公司 Method, apparatus, storage medium and terminal for identification model construction and evaluation
CN109859029A (en) * 2019-01-04 2019-06-07 深圳壹账通智能科技有限公司 Abnormal application detection method, device, computer equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060218079A1 (en) * 2005-02-08 2006-09-28 Goldblatt Joel N Web-based consumer loan database with automated controls for preventing predatory lending practices
US8484132B1 (en) * 2012-06-08 2013-07-09 Lexisnexis Risk Solutions Fl Inc. Systems and methods for segmented risk scoring of identity fraud
CN106960358A (en) * 2017-01-13 2017-07-18 重庆小富农康农业科技服务有限公司 Quantitative detection system for financial fraud behavior based on deep learning over rural e-commerce big data
CN107886425A (en) * 2017-10-25 2018-04-06 上海壹账通金融科技有限公司 Credit evaluation method, apparatus, equipment and computer-readable recording medium
CN108038700A (en) * 2017-12-22 2018-05-15 上海前隆信息科技有限公司 Anti-fraud data analysis method and system
CN108364224A (en) * 2018-01-12 2018-08-03 深圳壹账通智能科技有限公司 Credit risk joint control method, apparatus, equipment and readable storage medium
CN108985632A (en) * 2018-07-16 2018-12-11 国网上海市电力公司 Electricity consumption data abnormality detection model based on isolated forest algorithm

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950853A (en) * 2020-07-14 2020-11-17 东南大学 Power running state white list generation method based on information physical bilateral data
CN111950853B (en) * 2020-07-14 2024-05-31 东南大学 Electric power running state white list generation method based on information physical bilateral data
CN112733897A (en) * 2020-12-30 2021-04-30 胜斗士(上海)科技技术发展有限公司 Method and equipment for determining abnormal reason of multi-dimensional sample data
CN112971762B (en) * 2021-02-07 2023-04-18 中国人民解放军总医院 Respiratory signal quality evaluation method
CN112971762A (en) * 2021-02-07 2021-06-18 中国人民解放军总医院 Respiratory signal quality evaluation method
CN112971795A (en) * 2021-02-07 2021-06-18 中国人民解放军总医院 Electrocardiosignal quality evaluation method
CN112971795B (en) * 2021-02-07 2023-04-18 中国人民解放军总医院 Electrocardiosignal quality evaluation method
CN113076350A (en) * 2021-03-02 2021-07-06 无锡先导智能装备股份有限公司 Welding abnormity detection method and device, computer equipment and storage medium
CN113076350B (en) * 2021-03-02 2024-05-07 无锡先导智能装备股份有限公司 Welding abnormality detection method, welding abnormality detection device, computer device, and storage medium
CN113283901A (en) * 2021-04-19 2021-08-20 河南大学 Byte code-based fraud contract detection method for block chain platform
CN113537642A (en) * 2021-08-20 2021-10-22 日月光半导体制造股份有限公司 Product quality prediction method, device, electronic equipment and storage medium
CN114240059A (en) * 2021-11-22 2022-03-25 中国建设银行股份有限公司 Resource online application processing method and device, computer equipment and storage medium
CN114611616A (en) * 2022-03-16 2022-06-10 吕少岚 Unmanned aerial vehicle intelligent fault detection method and system based on integrated isolated forest
CN114611616B (en) * 2022-03-16 2023-02-07 吕少岚 Unmanned aerial vehicle intelligent fault detection method and system based on integrated isolated forest
CN114979369A (en) * 2022-04-14 2022-08-30 马上消费金融股份有限公司 Abnormal call detection method and device, electronic equipment and storage medium
CN116659826A (en) * 2022-08-23 2023-08-29 国家电投集团科学技术研究院有限公司 Method and device for detecting state of wind power bolt
CN116659826B (en) * 2022-08-23 2024-02-06 国家电投集团科学技术研究院有限公司 Method and device for detecting state of wind power bolt
CN117538491A (en) * 2024-01-09 2024-02-09 武汉怡特环保科技有限公司 Station room air quality intelligent monitoring method and system
CN117538491B (en) * 2024-01-09 2024-04-05 武汉怡特环保科技有限公司 Station room air quality intelligent monitoring method and system

Also Published As

Publication number Publication date
CN109859029A (en) 2019-06-07

Similar Documents

Publication Publication Date Title
WO2020140678A1 (en) Abnormal application detection method and apparatus, and computer device and storage medium
CN110489520B (en) Knowledge graph-based event processing method, device, equipment and storage medium
US9965531B2 (en) Data storage extract, transform and load operations for entity and time-based record generation
US20180253657A1 (en) Real-time credit risk management system
WO2019218699A1 (en) Fraud transaction determining method and apparatus, computer device, and storage medium
WO2020211299A1 (en) Data cleansing method
US11631032B2 (en) Failure feedback system for enhancing machine learning accuracy by synthetic data generation
US20210092160A1 (en) Data set creation with crowd-based reinforcement
WO2017133615A1 (en) Service parameter acquisition method and apparatus
Banerjee Population growth and endogenous technological change: Australian economic growth in the long run
CN111784392A (en) Abnormal user group detection method, device and equipment based on isolated forest
CN112035611B (en) Target user recommendation method, device, computer equipment and storage medium
CN114579584B (en) Data table processing method and device, computer equipment and storage medium
CN112418978A (en) Product recommendation method, device, equipment and medium
CN106844588A (en) Web crawler-based user behavior data analysis method and system
CN115630221A (en) Terminal application interface display data processing method and device and computer equipment
CN114693409A (en) Product matching method, device, computer equipment, storage medium and program product
CN114495137B (en) Bill abnormity detection model generation method and bill abnormity detection method
CN117313015A (en) Time sequence abnormality detection method and system based on time sequence and multiple variables
CN111291795A (en) Crowd characteristic analysis method and device, storage medium and computer equipment
CN110163722B (en) Big data analysis system and analysis method for accurate sale of agricultural products
Lee et al. Detecting anomaly teletraffic using stochastic self-similarity based on Hadoop
CN116155597A (en) Access request processing method and device and computer equipment
WO2019062013A1 (en) Electronic apparatus, user grouping method and system, and computer-readable storage medium
US20140324524A1 (en) Evolving a capped customer linkage model using genetic models

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19907781

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20/10/2021)

122 EP: PCT application non-entry in European phase

Ref document number: 19907781

Country of ref document: EP

Kind code of ref document: A1