CN114282988A - Abnormal application identification method and equipment - Google Patents

Publication number
CN114282988A
Authority
CN
China
Prior art keywords
borrower
address
loan
group
administrative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111609530.2A
Other languages
Chinese (zh)
Inventor
蔡远航
郑少杰
范增虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202111609530.2A
Publication of CN114282988A
Priority to PCT/CN2022/100697 (WO2023123929A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Technology Law (AREA)
  • Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The application provides an abnormal application identification method and equipment. The method comprises the following steps: acquiring application address information of at least one borrower, the application address information comprising a lender address and a borrower address that correspond to different administrative regions; clustering the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group; obtaining the remote loan probability corresponding to each borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, wherein the preset transfer matrix indicates the remote loan probability between different administrative regions; and determining a target borrower group with abnormal applications from the at least one borrower group according to the remote loan probability of the borrower group. On the basis of address clustering, the method can identify the target borrower group of abnormal loans by combining the remote loan probability between different administrative regions. Neither data from different channels nor manual processing is needed, so the identification cost is reduced and the identification accuracy is improved.

Description

Abnormal application identification method and equipment
Technical Field
The embodiments of the application relate to the technical field of financial technology (Fintech), and in particular to an abnormal application identification method and equipment.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). Technology for identifying abnormal loan applications is no exception, but the security and real-time requirements of the financial industry place higher demands on it. Applying for a loan is a common scenario in the Fintech field. After a loan is granted, it must be repaid on time; if it is not, the lender suffers an economic loss. To avoid such losses, the lender needs to identify which loan applications are abnormal.
In the prior art, there are two ways to identify an abnormal application. The first is clustering followed by manual analysis. Fig. 1 is a schematic diagram of an abnormal application identification process provided in the prior art. Referring to Fig. 1, a plurality of pieces of loan application information are first input into a clustering algorithm to obtain one or more borrower groups; an analyst then analyzes the borrower groups using empirical knowledge to determine which of them contain abnormal applications. The loan application information may include: basic information, login information, association information and loan information. The basic information may include: the age of the borrower and the income of the borrower. The login information may include: the model of the device used for login and the network (IP) address of that device. The association information may include: contact information and family member information. The loan information may include: loan amount and consumption address.
The second way of identifying an abnormal application in the prior art is an association graph. Specifically, basic information and loan transaction information of a large number of borrowers are first acquired from various channels; information such as the identity, income and preferences of each borrower is then analyzed and the borrower is labeled; a correspondence between the many labels and the loan transaction information is then established to form an association graph; finally, abnormal applications are identified according to the association graph.
However, the first way requires an analyst to perform the analysis, and the weight of each dimension in the clustering process must also be set manually, which makes identifying abnormal applications costly. The second way requires a large amount of data from different channels as support. In practice, data from multiple channels is difficult to obtain, so the resulting association graph is not comprehensive enough, which reduces the identification accuracy of abnormal applications.
Disclosure of Invention
The application provides an abnormal application identification method and equipment, so that the identification cost of abnormal loans is reduced, and the identification accuracy is improved.
In a first aspect, the present application provides an abnormal application identification method, including:
obtaining application address information of at least one borrower through a deep learning model, wherein the application address information comprises: a lender address and a borrower address, wherein the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address;
clustering the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group;
for each borrower group, acquiring the remote loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, wherein the preset transfer matrix is used for indicating the remote loan probability between different administrative regions;
and determining a target borrower group with abnormal application from the at least one borrower group according to the remote loan probability of the borrower group.
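The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed implementation: clustering is reduced to exact grouping on the (borrower region, lender region) pair, the transfer matrix is a plain dictionary, and the rule "flag groups whose remote loan probability is below a threshold" is only one possible decision rule. The names `TRANSFER_MATRIX`, `group_borrowers`, `flag_groups` and the region labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical preset transfer matrix:
# (borrower_region, lender_region) -> remote loan probability.
TRANSFER_MATRIX = {
    ("Region A", "Region B"): 0.30,
    ("Region A", "Region C"): 0.02,
}


def group_borrowers(applications):
    """Group borrowers that share the same (borrower region, lender region) pair.

    Assumption: exact-match grouping stands in for the clustering step.
    Applications whose two regions coincide are not remote loans and are skipped.
    """
    groups = defaultdict(list)
    for app in applications:
        if app["borrower_region"] != app["lender_region"]:
            key = (app["borrower_region"], app["lender_region"])
            groups[key].append(app["borrower_id"])
    return dict(groups)


def flag_groups(groups, matrix, threshold=0.05):
    """Return the group keys whose remote loan probability is below the threshold.

    Assumption: a region pair missing from the matrix is treated as probability 0.0.
    """
    return [key for key in groups if matrix.get(key, 0.0) < threshold]
```

Under these assumptions, a group that borrows between two regions where remote loans are rare is the one singled out for review.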
Optionally, the determining, according to the remote loan probability of the borrower group, a target borrower group with an abnormal application from at least one borrower group includes:
according to the remote loan probability of the borrower group and abnormal loan data which appears in the borrower group, determining a target borrower group with abnormal application from at least one borrower group, wherein the abnormal loan data comprises at least one of the following items: the total amount of the overdue loan of each borrower in the borrower group and the total number of the overdue loan days of each borrower in the borrower group.
Optionally, the determining, according to the remote loan probability of the borrower group and the abnormal loan data that has occurred in the borrower group, a target borrower group with an abnormal application from at least one borrower group includes:
determining the ratio of the abnormal loan data that has occurred in the borrower group to the remote loan probability of the borrower group;
and determining a target borrower group with abnormal application from at least one borrower group according to the ratio.
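A small sketch of the ratio step follows. It assumes the abnormal loan data is a single number per group (for example the total overdue amount) and that groups whose ratio exceeds a threshold are selected; the threshold, the `eps` guard against division by zero, and the function names are illustrative assumptions rather than details given in the application.

```python
def abnormality_ratio(abnormal_value, remote_loan_probability, eps=1e-9):
    """Ratio of a group's abnormal loan data to its remote loan probability.

    A large overdue total combined with a rare region pair yields a large ratio.
    eps (an assumption) avoids division by zero for pairs with probability 0.
    """
    return abnormal_value / max(remote_loan_probability, eps)


def select_target_groups(group_stats, ratio_threshold):
    """Return group ids whose ratio exceeds the threshold, highest ratio first.

    group_stats maps a group id to a dict with the hypothetical keys
    'abnormal_value' and 'probability'.
    """
    ratios = {
        gid: abnormality_ratio(s["abnormal_value"], s["probability"])
        for gid, s in group_stats.items()
    }
    selected = [gid for gid, r in ratios.items() if r > ratio_threshold]
    return sorted(selected, key=lambda gid: ratios[gid], reverse=True)
```

Dividing by the probability means that the same overdue total counts for more when it occurs between regions where remote loans are unusual.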
Optionally, the remote loan probability between the different administrative regions is generated by:
carrying out a weighting operation on at least one associated attribute between a first administrative region corresponding to the borrower address and a second administrative region corresponding to the lender address to obtain the remote loan probability, wherein the at least one associated attribute comprises at least one of the following items: the distance between the first administrative region and the second administrative region, the ratio of the gross domestic product (GDP) of the second administrative region to that of the first administrative region, and the proportion of non-overdue loans among the remote loans whose borrower address belongs to the first administrative region and whose lender address belongs to the second administrative region.
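The weighting operation can be illustrated as a weighted sum of normalized attributes. The application does not specify how the attributes are normalized or which weights are used, so everything below is an assumption: distance is mapped to a closeness score by exponential decay with a hypothetical 500 km scale, the GDP ratio is squashed into (0, 1), and the weights are chosen to sum to 1 so that the result stays in [0, 1].

```python
import math


def remote_loan_probability(distance_km, gdp_ratio, non_overdue_share,
                            weights=(0.4, 0.3, 0.3), distance_scale_km=500.0):
    """Weighted combination of three associated attributes of a region pair.

    distance_km:       distance between the two administrative regions
    gdp_ratio:         GDP of the lender's region divided by GDP of the borrower's region
    non_overdue_share: share of non-overdue loans among remote loans for this pair, in [0, 1]
    weights, distance_scale_km: illustrative assumptions, not values from the application
    """
    closeness = math.exp(-distance_km / distance_scale_km)  # 1.0 at distance 0, decays with distance
    gdp_score = gdp_ratio / (1.0 + gdp_ratio)                # maps (0, inf) into (0, 1)
    w_distance, w_gdp, w_history = weights
    return w_distance * closeness + w_gdp * gdp_score + w_history * non_overdue_share
```

Evaluating this function for every ordered pair of regions would fill the preset transfer matrix; any other monotone normalization would serve equally well for the sketch.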
Optionally, the borrower address includes an administrative region of at least one level, and the obtaining of the application address information of at least one borrower through the deep learning model includes:
inputting the borrower address text of the borrower into a deep learning model to obtain the administrative region of the at least one level, wherein the deep learning model is obtained by training a preset training sample, and the training sample comprises at least one of the following items: sample address text, each character in the sample address text corresponding to a sample type of the character, the sample type being one of: a start character of an administrative region of one level, an end character of an administrative region of one level, and the remaining characters.
Optionally, the deep learning model comprises: an input layer, a bidirectional LSTM layer and a CRF layer, wherein, in the training process, the input layer is used for receiving the sample address text, the bidirectional LSTM layer is used for processing the sample address text to obtain a vector, and the CRF layer is used for predicting the prediction type of each character in the sample address text according to the vector; the sample type and the prediction type are used for determining a loss value; when the loss value meets a convergence condition, the training is finished, and when the loss value does not meet the convergence condition, the parameters of the deep learning model are adjusted according to the loss value to perform the next round of training.
Optionally, after the step of inputting the borrower address text of the borrower into the deep learning model and obtaining the administrative region of the at least one hierarchy, the method further includes:
if the administrative regions of some levels are missing from the administrative regions of the at least one level, determining the missing administrative regions according to a preset administrative region tree, wherein the preset administrative region tree is used for representing the hierarchical relationship among the administrative regions; and/or,
and if the administrative region of a part of the hierarchy is missing in the administrative region of at least one hierarchy, inputting the borrower address text into a third-party interface to obtain the missing administrative region.
In a second aspect, the present application provides an abnormal application identification apparatus, including:
the application address information acquisition module is used for acquiring application address information of at least one borrower through a deep learning model, wherein the application address information comprises: a lender address and a borrower address, wherein the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address;
the address clustering module is used for clustering the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group;
the remote loan probability acquisition module is used for acquiring, for each borrower group, the remote loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, wherein the preset transfer matrix is used for indicating the remote loan probability between different administrative regions;
and the anomaly identification module is used for determining a target borrower group with abnormal application from the at least one borrower group according to the remote loan probability of the borrower group.
Optionally, the anomaly identification module is further configured to:
according to the remote loan probability of the borrower group and abnormal loan data which appears in the borrower group, determining a target borrower group with abnormal application from at least one borrower group, wherein the abnormal loan data comprises at least one of the following items: the total amount of the overdue loan of each borrower in the borrower group and the total number of the overdue loan days of each borrower in the borrower group.
Optionally, the anomaly identification module is further configured to:
when a target borrower group with abnormal application is determined from at least one borrower group according to the remote loan probability of the borrower group and the abnormal loan data appeared in the borrower group, determining the ratio of the abnormal loan data appeared in the borrower group to the remote loan probability of the borrower group;
and determining a target borrower group with abnormal application from at least one borrower group according to the ratio.
Optionally, the remote loan probability between the different administrative regions is generated by the following module:
the remote loan probability generation module is used for performing a weighting operation on at least one associated attribute between a first administrative region corresponding to the borrower address and a second administrative region corresponding to the lender address to obtain the remote loan probability, wherein the at least one associated attribute comprises at least one of the following: the distance between the first administrative region and the second administrative region, the ratio of the gross domestic product (GDP) of the second administrative region to that of the first administrative region, and the proportion of non-overdue loans among the remote loans whose borrower address belongs to the first administrative region and whose lender address belongs to the second administrative region.
Optionally, the borrower address includes at least one hierarchical administrative region, and the application address information obtaining module is further configured to:
inputting the borrower address text of the borrower into a deep learning model to obtain the administrative region of the at least one level, wherein the deep learning model is obtained by training a preset training sample, and the training sample comprises at least one of the following items: sample address text, each character in the sample address text corresponding to a sample type of the character, the sample type being one of: a start character of an administrative region of one level, an end character of an administrative region of one level, and the remaining characters.
Optionally, the deep learning model comprises: an input layer, a bidirectional LSTM layer and a CRF layer, wherein, in the training process, the input layer is used for receiving the sample address text, the bidirectional LSTM layer is used for processing the sample address text to obtain a vector, and the CRF layer is used for predicting the prediction type of each character in the sample address text according to the vector; the sample type and the prediction type are used for determining a loss value; when the loss value meets a convergence condition, the training is finished, and when the loss value does not meet the convergence condition, the parameters of the deep learning model are adjusted according to the loss value to perform the next round of training.
Optionally, the apparatus further comprises:
the administrative region completion module is used for, after the borrower address text of the borrower is input into the deep learning model to obtain the administrative regions of the at least one level, determining the missing administrative regions according to a preset administrative region tree if the administrative regions of some levels are missing from the administrative regions of the at least one level, wherein the preset administrative region tree is used for representing the hierarchical relationship among the administrative regions; and/or,
and if the administrative region of a part of the hierarchy is missing in the administrative region of at least one hierarchy, inputting the borrower address text into a third-party interface to obtain the missing administrative region.
In a third aspect, the present application provides an electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored by the memory such that the electronic device implements the method of the first aspect as described above.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, cause a computing device to implement the method of the first aspect as described above.
In a fifth aspect, the present application provides a computer program for implementing the method of the first aspect as described above.
The application provides an abnormal application identification method and equipment, and the method comprises the following steps: obtaining application address information of at least one borrower through a deep learning model, wherein the application address information comprises a lender address and a borrower address, and the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address; clustering the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group; for each borrower group, acquiring the remote loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, wherein the preset transfer matrix is used for indicating the remote loan probability between different administrative regions; and determining a target borrower group with abnormal application from the at least one borrower group according to the remote loan probability of the borrower group. According to the method and the device, the borrowers can be clustered according to the lender addresses and the borrower addresses to obtain at least one borrower group, and the target borrower group of abnormal loans is then identified by combining the remote loan probability between different administrative regions in the transfer matrix. The whole process needs no manual processing, so the identification cost is reduced. In addition, the embodiments of the application need only the lender address and the borrower address and no data from other channels, which avoids the low identification accuracy caused by being unable to obtain data from more channels.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a schematic diagram of an abnormal application identification process provided by the prior art;
FIG. 2 is a flowchart illustrating specific steps of an abnormal application identification method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a deep learning model provided in an embodiment of the present application;
FIG. 4 is a block diagram of an abnormal application identification apparatus provided in an embodiment of the present application;
FIG. 5 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiments of the application can be applied to loan scenarios. The borrower may submit a loan application to the lender, and the loan application may specify the information required for review. The lender reviews the loan application, and the loan is granted once the review is passed.
In order to avoid economic loss caused by the loan, the abnormal loan needs to be accurately identified. This identification process may be performed during an audit process, and when a loan is identified as an abnormal loan, the result of the audit may be determined to be an audit failure. The identification process may also be performed after the loan to inform the borrower to make the payment as soon as possible.
Fig. 2 is a flowchart illustrating specific steps of an abnormal application identification method according to an embodiment of the present application. Referring to fig. 2, the method may include:
S101: obtaining application address information of at least one borrower through a deep learning model, wherein the application address information comprises a lender address and a borrower address, and the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address.
The borrower address may include at least one of: the registered household address of the borrower, the residential address of the borrower and the work address of the borrower.
The lender address may include the administrative region where the lender is located. This administrative region may be determined according to the IP (Internet Protocol) address of the lender's electronic device used when the loan application is submitted. The correspondence between administrative regions and IP addresses is preset, and one administrative region may correspond to one or more IP addresses. For example, the borrower may need to go to an offline service point of the lender and submit the loan application using electronic equipment provided by the lender. The loan application may be entered by the borrower or by the lender's staff. In this case, the IP address of the lender's electronic device can be obtained and then converted to the corresponding administrative region. The prior art provides tools for converting IP addresses to administrative regions, and such a tool can be called to perform the conversion.
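As a sketch of the preset correspondence between IP addresses and administrative regions, the lookup below uses Python's standard `ipaddress` module. The table `REGION_BY_NETWORK`, the region labels and the function name are hypothetical, and the networks are reserved documentation ranges rather than real allocations; a production system would more likely call one of the existing IP-to-region tools mentioned above.

```python
import ipaddress

# Hypothetical preset table: one administrative region may own several networks.
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "Region A",
    ipaddress.ip_network("198.51.100.0/24"): "Region B",
    ipaddress.ip_network("203.0.113.0/25"): "Region B",
}


def lender_region_from_ip(ip_text):
    """Return the administrative region for the lender device's IP, or None if unknown."""
    address = ipaddress.ip_address(ip_text)
    for network, region in REGION_BY_NETWORK.items():
        if address in network:
            return region
    return None
```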
The borrower address is generally entered by the borrower when applying for a loan and generally comprises an administrative region and a detailed address below that region. The administrative region in the borrower address can be obtained through a deep learning model. The address text entered by the borrower includes the province, city, county (or district), street and residential community. For example, the borrower address text may be "XXXX province XXXX city XXXXXX street XXXX residential community". The administrative region can then be identified from this address text.
It can be seen that the above administrative regions are hierarchical. For example, a province is a first hierarchy, a city is a second hierarchy, a county (or district) is a third hierarchy, a province may include one or more cities, and a city may include one or more counties.
The embodiment of the application can identify at least one level of administrative region from the borrower address text through a deep learning model.
Specifically, the borrower address text of the borrower can be input into the deep learning model, and at least one level of administrative region is obtained. The deep learning model is obtained by training a large number of preset training samples, wherein the training samples comprise at least one of the following items: sample address text, each character in the sample address text corresponds to a sample type of the character, and the sample type is one of the following items: a start character of an administrative region of one level, an end character of an administrative region of one level, and the remaining characters.
The sample address text is similar in content to the borrower address text, except that each character in the sample address text is labeled with a sample type. For example, when the levels of administrative regions include province, city and county, the start character and the end character of the province are labeled "B-PROV" and "E-PROV", respectively, the start character and the end character of the city are labeled "B-CITY" and "E-CITY", respectively, the start character and the end character of the county are labeled "B-COUNTY" and "E-COUNTY", respectively, and the remaining characters are labeled "O". Thus, one sample address text, with each character followed by its sample type, can be "X\B-PROV X\O province\E-PROV X\B-CITY X\O city\E-CITY X\B-COUNTY X\O county\E-COUNTY X\O X\O street\O X\O X\O X\O X\O X\O X\O X\O X\O".
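The labeling scheme can be sketched as a small helper that turns a pre-segmented address into per-character sample types. The function name and the input format (a list of text and level pairs) are assumptions made for illustration; following the example above, only the first and last character of a region receive "B-" and "E-" labels and every other character, including interior characters of a region name, is labeled "O".

```python
def tag_address(segments):
    """Produce (characters, sample types) from segments of the form (text, level).

    level is 'PROV', 'CITY', 'COUNTY', or None for text that is not an
    administrative region. Assumption: region names have at least two characters.
    """
    chars, tags = [], []
    for text, level in segments:
        last = len(text) - 1
        for i, ch in enumerate(text):
            chars.append(ch)
            if level is None:
                tags.append("O")
            elif i == 0:
                tags.append(f"B-{level}")   # start character of the region
            elif i == last:
                tags.append(f"E-{level}")   # end character of the region
            else:
                tags.append("O")            # interior characters are "remaining characters"
    return chars, tags
```

For instance, the segments ("广东省", "PROV"), ("深圳市", "CITY"), ("南山区", "COUNTY") and ("科技园", None) yield the label sequence B-PROV, O, E-PROV, B-CITY, O, E-CITY, B-COUNTY, O, E-COUNTY, O, O, O.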
The deep learning model may be any existing deep learning model, and the embodiment of the present application does not limit the deep learning model. A deep learning model with high recognition accuracy is obtained through multiple tests of the embodiment of the application. Fig. 3 is a schematic structural diagram of a deep learning model provided in an embodiment of the present application. Referring to fig. 3, the deep learning model may include: an input layer, a bidirectional LSTM (long short term memory) layer and a CRF layer.
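In the training process, the input layer is used for receiving the sample address text, the bidirectional LSTM layer is used for processing the sample address text to obtain a vector, and the CRF layer is used for predicting the prediction type of each character in the sample address text according to the vector.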
In application, the input layer is used for receiving the borrower address text, the bidirectional LSTM layer is used for processing the borrower address text to obtain a vector, and the CRF (conditional random field) layer is used for predicting the type of each character in the borrower address text according to the vector, so that the administrative regions of the various levels can be extracted from the borrower address text according to these types.
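The extraction step described above, going from per-character types back to administrative regions, can be sketched as follows. The function name and the returned dictionary format are assumptions; the sketch simply takes the span from each "B-" label to the next matching "E-" label and ignores spans that are never closed.

```python
def extract_regions(chars, tags):
    """Collect administrative regions from per-character types.

    Returns a dict such as {'PROV': ..., 'CITY': ..., 'COUNTY': ...} containing
    only the levels for which both a start and a matching end label were found.
    """
    regions = {}
    start, level = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            start, level = i, tag[2:]            # open a new span
        elif tag.startswith("E-") and start is not None and tag[2:] == level:
            regions[level] = "".join(chars[start:i + 1])
            start, level = None, None            # close the span
    return regions
```

A level whose labels were not predicted is simply absent from the result, which is the situation the completion step is meant to handle.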
Both the prediction type produced during training and the type produced in application take one of the sample types used to label the sample address text. For the same character, however, the prediction type and the sample type may be the same or different.
The process of training the deep learning model through the training samples may include multiple iterations. In each iteration, a group of training samples can be input into the deep learning model to obtain the prediction type of each character in each training sample, and then the prediction type of each character and the sample type of each character in the group of training samples are input into the loss function to obtain a loss value; and finally, determining whether the loss value meets a convergence condition, finishing the training when the loss value meets the convergence condition, and adjusting the parameters of the deep learning model according to the loss value to perform the next round of training when the loss value does not meet the convergence condition.
The above loss function may be a loss function commonly used in the prior art, such as a cross entropy loss function, an absolute value loss function, and a sum of squares loss function.
The above-mentioned loss value satisfying the convergence condition may include, but is not limited to: the loss value being less than or equal to a preset loss value threshold, or the loss value no longer decreasing after multiple iterations.
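The two convergence conditions can be sketched as a small check over the history of loss values. The function name `has_converged`, the default threshold and the `patience` parameter (how many recent iterations count as "multiple iterations") are assumptions made for illustration only.

```python
from typing import List


def has_converged(loss_history: List[float],
                  loss_threshold: float = 0.01,
                  patience: int = 3) -> bool:
    """Return True if either convergence condition holds for the recorded losses."""
    if not loss_history:
        return False
    # Condition 1: the latest loss value is at or below the preset threshold.
    if loss_history[-1] <= loss_threshold:
        return True
    # Condition 2: none of the last `patience` iterations improved on the
    # best loss value seen before them.
    if len(loss_history) > patience:
        best_before = min(loss_history[:-patience])
        return min(loss_history[-patience:]) >= best_before
    return False
```

In a training loop, the loss value of each iteration would be appended to `loss_history`, and training would stop as soon as `has_converged` returns True.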
It can be seen that in an ideal case, if the borrower address text entered by the borrower includes the administrative regions of each level, the administrative regions identified from the borrower address text include all levels.
However, in practical applications, the borrower address text input by the borrower may lack information about some administrative regions, so that some levels of administrative regions are missing from the result of identifying the borrower address text. This may make subsequent processing based on the borrower address inaccurate. In order to improve the accuracy of the subsequent processing, the administrative regions of the borrower address need to be completed.
Specifically, if administrative regions of some levels are missing from the at least one level of administrative regions, the missing administrative regions are determined according to a preset administrative region tree, and/or the borrower address text is input into a third-party interface to obtain the missing administrative regions, wherein the preset administrative region tree is used for representing the hierarchical relationship among the administrative regions.
The preset administrative region tree is a tree structure, and the tree structure comprises hierarchical relations among all administrative regions. The nodes in the administrative region tree form a parent-child relationship, and the administrative region of the child node belongs to the administrative region of only one parent node. Therefore, if a low-level administrative region exists in at least one hierarchical level of administrative regions but a high-level administrative region is absent, a parent node can be determined according to a node corresponding to the low-level administrative region, so that the administrative region corresponding to the parent node is determined as the high-level administrative region.
However, if the preset administrative region tree cannot complete the administrative regions, the third-party interface may be invoked to determine the missing administrative regions according to the detailed address in the borrower address text. For example, the county to which an address belongs may be determined from its residential community or street.
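The tree-based completion can be sketched with a child-to-parent mapping. The region names, the `PARENT_OF` mapping and the function `complete_regions` are hypothetical; a real administrative region tree would be loaded from reference data, and the third-party interface is not modeled here.

```python
from typing import Dict

# Hypothetical fragment of an administrative region tree, stored as child -> parent.
PARENT_OF: Dict[str, str] = {
    "Nanshan District": "Shenzhen City",
    "Shenzhen City": "Guangdong Province",
}

# Levels ordered from highest to lowest.
LEVELS = ["PROV", "CITY", "COUNTY"]


def complete_regions(regions: Dict[str, str]) -> Dict[str, str]:
    """Fill in missing higher-level regions from the parent of a known lower level."""
    completed = dict(regions)
    # Walk from the lowest level upward so that a filled-in city can
    # in turn be used to fill in the province.
    for i in range(len(LEVELS) - 1, 0, -1):
        child = completed.get(LEVELS[i])
        parent_level = LEVELS[i - 1]
        if child is not None and parent_level not in completed:
            parent = PARENT_OF.get(child)
            if parent is not None:
                completed[parent_level] = parent
    return completed


print(complete_regions({"COUNTY": "Nanshan District"}))
```

Levels that remain missing after this pass (for example, when only the province is known) are the cases left to the third-party interface.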
S102: clustering the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group.
Specifically, the lender address is converted into longitude and latitude coordinates, the borrower address is converted into longitude and latitude coordinates, and then the borrowers are clustered according to the longitude and latitude coordinates of the lender address and the longitude and latitude coordinates of the borrower address.
It can be understood that the clustering may adopt an existing clustering algorithm. When the existing clustering algorithm can only cluster according to one dimension, the clustering algorithm is called multiple times. For example, the clustering algorithm is first called to cluster the at least one borrower according to the longitude and latitude coordinates of the lender address to obtain at least one first borrower group; then, for each first borrower group, the clustering algorithm is called to cluster the borrowers in the first borrower group according to the longitude and latitude coordinates of the borrower address to obtain at least one second borrower group. The second borrower groups of all first borrower groups are the at least one borrower group obtained in S102.
When the borrower address includes the residence address, the account address and the working address of the borrower, the process of calling the clustering algorithm to cluster the borrowers in the first borrower group according to the longitude and latitude coordinates of the borrower address to obtain at least one second borrower group may include: first, calling the clustering algorithm to cluster the borrowers in the first borrower group according to the longitude and latitude coordinates of the residence address to obtain at least one first subgroup; then, calling the clustering algorithm to cluster the borrowers in each first subgroup according to the longitude and latitude coordinates of the account address to obtain at least one second subgroup; and finally, calling the clustering algorithm to cluster the borrowers in each second subgroup according to the longitude and latitude coordinates of the working address to obtain at least one third subgroup. Thus, each third subgroup is a second borrower group.
Of course, the clustering order among the lender address and the residence address, account address and working address of the borrower can be flexibly adjusted, which is not limited in the embodiments of the present application.
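The repeated, one-dimension-at-a-time clustering can be sketched as follows. The application does not name a clustering algorithm, so `grid_cluster` is a deliberately simple stand-in (grouping coordinates that fall in the same grid cell); the field names and coordinates are likewise hypothetical.

```python
from typing import Dict, List, Tuple


def grid_cluster(items: List[dict], key: str, cell_deg: float = 0.1) -> List[List[dict]]:
    """Stand-in clustering: group items whose (lat, lon) under `key` share a grid cell."""
    buckets: Dict[Tuple[int, int], List[dict]] = {}
    for item in items:
        lat, lon = item[key]
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        buckets.setdefault(cell, []).append(item)
    return list(buckets.values())


def cluster_sequentially(borrowers: List[dict], keys: List[str]) -> List[List[dict]]:
    """Refine the groups one address dimension at a time, in the given order."""
    groups = [borrowers]
    for key in keys:
        # Each existing group is split further by the next address dimension.
        groups = [sub for group in groups for sub in grid_cluster(group, key)]
    return groups


# Hypothetical borrowers with (latitude, longitude) per address type.
borrowers = [
    {"id": "a", "lender": (22.53, 114.05), "residence": (30.57, 104.06)},
    {"id": "b", "lender": (22.54, 114.06), "residence": (30.58, 104.07)},
    {"id": "c", "lender": (22.53, 114.05), "residence": (39.91, 116.43)},
]
groups = cluster_sequentially(borrowers, ["lender", "residence"])
print([[b["id"] for b in g] for g in groups])
```

Changing the order of `keys` corresponds to adjusting the clustering order; additional keys such as the account address or working address would simply extend the list.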
It is understood that, through the above clustering at S102, the borrowers in each borrower group have the same borrower address (referred to as the borrower address of the borrower group) and the same lender address (referred to as the lender address of the borrower group).
S103: for each borrower group, acquiring the remote loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, wherein the preset transfer matrix is used for indicating the remote loan probability between different administrative regions.
The value in the mth row and the nth column of the preset transfer matrix can be the remote loan probability from the mth administrative area to the nth administrative area.
It is understood that the remote loan probability is generally determined by attributes between two administrative areas, which may include but are not limited to: distance, GDP (gross domestic product) gap, and non-overdue loan proportion. The smaller the distance, the larger the GDP gap and the larger the non-overdue loan proportion, the greater the remote loan probability. Therefore, the smaller the remote loan probability of a borrower group, the more likely the borrower group has an abnormal loan.
The distance may be a length of a drivable route between two administrative areas, and is not a straight-line distance between the two administrative areas.
Specifically, the remote loan probability between different administrative areas may be generated by the following step:
performing a weighting operation on at least one association attribute between a first administrative area corresponding to the borrower address and a second administrative area corresponding to the lender address to obtain the remote loan probability from the first administrative area to the second administrative area. The at least one association attribute includes at least one of: the distance between the first administrative area and the second administrative area, the ratio of the gross domestic product of the second administrative area to the gross domestic product of the first administrative area, and the proportion, among remote loans, of non-overdue loans whose borrower address belongs to the first administrative area and whose lender address belongs to the second administrative area.
A remote loan is a loan application in which the borrower address and the lender address belong to different administrative areas.
The proportion may be a proportion by number of loans or a proportion by total loan amount.
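One possible form of the weighting operation is sketched below. The application does not specify the weights or how each attribute is normalized, so the weights, the distance scale and the two squashing functions are all assumptions chosen only so that the result stays in [0, 1] and moves in the stated directions (closer, larger GDP ratio and larger non-overdue proportion all raise the probability).

```python
from typing import Tuple


def remote_loan_probability(distance_km: float,
                            gdp_ratio: float,
                            non_overdue_proportion: float,
                            weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),
                            distance_scale_km: float = 500.0) -> float:
    """Weighted combination of three association attributes (illustrative only)."""
    w_dist, w_gdp, w_prop = weights
    # Smaller driving distance -> closeness nearer to 1 (assumed transform).
    closeness = 1.0 / (1.0 + distance_km / distance_scale_km)
    # Larger lender/borrower GDP ratio -> score nearer to 1 (assumed transform).
    gdp_score = gdp_ratio / (1.0 + gdp_ratio)
    # The non-overdue proportion is already in [0, 1] and is used directly.
    return w_dist * closeness + w_gdp * gdp_score + w_prop * non_overdue_proportion


near = remote_loan_probability(100.0, 2.0, 0.9)
far = remote_loan_probability(2000.0, 0.5, 0.4)
print(round(near, 4), round(far, 4))
```

Because the weights sum to 1 and every term lies in [0, 1], the result can be stored directly as an entry of the transfer matrix.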
It is understood that the above steps may be performed periodically, for example, once a year or a month, each time using loan information for all administrative areas in the current period.
The administrative areas in the transition matrix may be all administrative areas at any one level. Of course, the lower the level, the greater the number of administrative areas, the larger the transition matrix and the higher the accuracy. Therefore, the transition matrix may be generated at the county or district level.
After the transition matrix is obtained, the remote loan probability of any borrower group can be obtained from the transition matrix. Specifically, first, the administrative region corresponding to the borrower address of the borrower group is determined as the borrower administrative region, and the administrative region corresponding to the lender address of the borrower group is determined as the lender administrative region; then, the remote loan probability whose row is the borrower administrative region and whose column is the lender administrative region is obtained from the transition matrix and used as the remote loan probability of the borrower group.
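The lookup can be sketched with a small matrix indexed by region name. The region names and probability values are made up for illustration; the zero diagonal reflects the assumption that a loan within one region is not a remote loan.

```python
from typing import Dict, List

# Hypothetical county-level regions and transition matrix.
REGIONS: List[str] = ["County A", "County B", "County C"]
INDEX: Dict[str, int] = {name: i for i, name in enumerate(REGIONS)}

# MATRIX[m][n] is the remote loan probability from region m (borrower side)
# to region n (lender side).
MATRIX: List[List[float]] = [
    [0.00, 0.30, 0.05],
    [0.20, 0.00, 0.10],
    [0.02, 0.15, 0.00],
]


def lookup_probability(borrower_region: str, lender_region: str) -> float:
    """Row = borrower administrative region, column = lender administrative region."""
    return MATRIX[INDEX[borrower_region]][INDEX[lender_region]]


print(lookup_probability("County A", "County C"))
```

Note that the matrix is not required to be symmetric: the probability from County A to County C may differ from that in the opposite direction.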
S104: and determining a target borrower group with abnormal application from at least one borrower group according to the remote loan probability of the borrower group.
The determination strategy of the target borrower group can comprise a plurality of strategies.
In the first strategy, the borrower group with the minimum or smaller allopatric loan probability can be determined as the target borrower group.
In the second strategy, the remote loan probability and the rest information are combined to determine a target borrower group. Specifically, a target borrower group with abnormal application is determined from at least one borrower group according to the remote loan probability of the borrower group and abnormal loan data which occurs in the borrower group, wherein the abnormal loan data comprises at least one of the following items: the total amount of the overdue loan of each borrower in the borrower group and the total number of the overdue loan days of each borrower in the borrower group.
A borrower group with a smaller remote loan probability, a larger total overdue loan amount and a larger total number of overdue loan days is more likely to be the target borrower group.
In one example, the borrower groups may be comprehensively sorted in descending order according to the reciprocal of the remote loan probability, the total overdue loan amount and the total number of overdue loan days, so that the top-ranked borrower group can be determined as the target borrower group.
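The comprehensive sorting could be realized in several ways; the sketch below assumes a lexicographic sort on the three quantities, which is only one interpretation. The group records and field names are hypothetical, and the remote loan probability is assumed to be strictly positive so that its reciprocal exists.

```python
from typing import List


def rank_groups(groups: List[dict]) -> List[dict]:
    """Sort descending by (1 / probability, overdue amount, overdue days)."""
    return sorted(
        groups,
        key=lambda g: (1.0 / g["prob"], g["overdue_amount"], g["overdue_days"]),
        reverse=True,
    )


# Hypothetical borrower groups.
groups = [
    {"id": "g1", "prob": 0.30, "overdue_amount": 10000.0, "overdue_days": 12},
    {"id": "g2", "prob": 0.02, "overdue_amount": 50000.0, "overdue_days": 90},
    {"id": "g3", "prob": 0.02, "overdue_amount": 80000.0, "overdue_days": 30},
]
ranked = rank_groups(groups)
print([g["id"] for g in ranked])
```

With a lexicographic key, the overdue amount only breaks ties between groups with equal probability; a weighted score would be an alternative if the three quantities should trade off against each other.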
In another example, for each borrower group, determining the ratio of abnormal loan data which occurs in the borrower group to the remote loan probability of the borrower group; then, a target borrower group with abnormal application is determined from at least one borrower group according to the ratio.
Specifically, when the abnormal loan data that has occurred is the total overdue loan amount of the borrowers in the borrower group and the total number of overdue loan days of the borrowers in the borrower group, determining the ratio of the abnormal loan data that has occurred in the borrower group to the remote loan probability of the borrower group may include: first, determining the product of the total overdue loan amount of the borrowers in the borrower group and the total number of overdue loan days of the borrowers in the borrower group; then, determining the ratio of the product to the remote loan probability of the borrower group.
After obtaining the ratio, one or more borrower groups with larger ratios can be determined as the target borrower group.
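The ratio-based strategy can be sketched as follows. The function names, the field names and the choice of `k` (how many groups count as "larger ratios") are assumptions, and the remote loan probability is assumed to be strictly positive.

```python
from typing import List


def anomaly_ratio(overdue_amount: float, overdue_days: float, prob: float) -> float:
    """(total overdue amount x total overdue days) / remote loan probability."""
    return overdue_amount * overdue_days / prob


def top_groups(groups: List[dict], k: int = 1) -> List[dict]:
    """Return the k groups with the largest ratio as target borrower groups."""
    scored = sorted(
        groups,
        key=lambda g: anomaly_ratio(g["overdue_amount"], g["overdue_days"], g["prob"]),
        reverse=True,
    )
    return scored[:k]


# Hypothetical borrower groups.
candidates = [
    {"id": "g1", "prob": 0.50, "overdue_amount": 1000.0, "overdue_days": 10},
    {"id": "g2", "prob": 0.02, "overdue_amount": 5000.0, "overdue_days": 40},
]
print([g["id"] for g in top_groups(candidates)])
```

A small probability in the denominator amplifies the ratio, so a group with rare remote lending and substantial overdue data rises to the top.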
After the target borrower group is obtained, the loan applications of the target borrower group can be prevented from being approved, and a target borrower group whose loan application has already been approved can be reminded, including but not limited to by telephone or short message.
Fig. 4 is a block diagram of an abnormal application identification apparatus provided in the embodiment of the present application, corresponding to the abnormal application identification method in the foregoing embodiment. For convenience of explanation, only portions related to the embodiments of the present application are shown. Referring to fig. 4, the abnormal application identification apparatus 200 includes: an application address information acquisition module 201, an address clustering module 202, a remote loan probability acquisition module 203 and an abnormality identification module 204.
The application address information acquisition module 201 is configured to obtain application address information of at least one borrower through a deep learning model, where the application address information includes: a lender address and a borrower address, wherein the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address.
The address clustering module 202 is configured to cluster the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group.
The remote loan probability acquisition module 203 is configured to obtain, for each borrower group, the remote loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, where the preset transfer matrix is used to indicate remote loan probabilities between different administrative regions.
And the abnormal recognition module 204 is used for determining a target borrower group with abnormal application from at least one borrower group according to the remote loan probability of the borrower group.
Optionally, the anomaly identification module 204 is further configured to:
according to the remote loan probability of the borrower group and abnormal loan data which appears in the borrower group, determining a target borrower group with abnormal application from at least one borrower group, wherein the abnormal loan data comprises at least one of the following items: the total amount of the overdue loan of each borrower in the borrower group and the total number of the overdue loan days of each borrower in the borrower group.
Optionally, the anomaly identification module 204 is further configured to:
when a target borrower group with abnormal application is determined from at least one borrower group according to the remote loan probability of the borrower group and the abnormal loan data appeared in the borrower group, determining the ratio of the abnormal loan data appeared in the borrower group to the remote loan probability of the borrower group; and determining a target borrower group with abnormal application from at least one borrower group according to the ratio.
Optionally, the allopatric loan probability between the different administrative areas is generated by the following modules:
the first loan probability generation module is used for performing a weighting operation on at least one association attribute between a first administrative area corresponding to the borrower address and a second administrative area corresponding to the lender address to obtain the remote loan probability, wherein the at least one association attribute comprises at least one of the following: the distance between the first administrative area and the second administrative area, the ratio of the gross domestic product of the second administrative area to the gross domestic product of the first administrative area, and the proportion, among remote loans, of non-overdue loans whose borrower address belongs to the first administrative area and whose lender address belongs to the second administrative area.
Optionally, the borrower address includes at least one hierarchical administrative region, and the application address information obtaining module 201 is further configured to:
inputting the borrower address text of the borrower into a deep learning model to obtain the administrative region of the at least one level, wherein the deep learning model is obtained by training a preset training sample, and the training sample comprises at least one of the following items: sample address text, each character in the sample address text corresponding to a sample type of the character, the sample type being one of: a start character of an administrative region of one level, an end character of an administrative region of one level, and the remaining characters.
Optionally, the deep learning model comprises: the training system comprises an input layer, a bidirectional LSTM layer and a CRF layer, wherein in the training process, the input layer is used for receiving the sample address text, the bidirectional LSTM layer is used for processing the sample address text to obtain a vector, the CRF layer is used for predicting the prediction type of each character in the sample address text according to the vector, the sample type and the prediction type are used for determining a loss value, when the loss value meets a convergence condition, the training is finished, and when the loss value does not meet the convergence condition, the parameters of the deep learning model are adjusted according to the loss value to perform the next round of training.
Optionally, the apparatus further comprises:
the administrative region completion module is used for determining a missing administrative region according to a preset administrative region tree which is used for representing the hierarchical relationship between the administrative regions if the administrative regions of partial levels are missing in the administrative regions of the at least one level after the borrower address text of the borrower is input into the deep learning model to obtain the administrative regions of the at least one level; and/or if the administrative region of a part of hierarchies is missing in the administrative region of at least one hierarchy, inputting the borrower address text into a third-party interface to obtain the missing administrative region.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 600 comprises a memory 602 and at least one processor 601.
The memory 602 stores computer-executable instructions. The at least one processor 601 executes the computer-executable instructions stored in the memory 602, so that the electronic device 600 implements the method of fig. 2.
In addition, the electronic device may further include a receiver 603 and a transmitter 604, the receiver 603 being configured to receive information from the remaining apparatuses or devices and forward the information to the processor 601, and the transmitter 604 being configured to transmit the information to the remaining apparatuses or devices.
An embodiment of the present application further provides a computer-readable storage medium, where a computer executable instruction is stored in the computer-readable storage medium, and when the processor executes the computer executable instruction, the computing device is caused to implement the method described in the foregoing fig. 2.
The embodiment of the present application further provides a computer program, where the computer program is used to implement the method described in the foregoing fig. 2.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. An abnormal application identification method, characterized in that the method comprises:
obtaining application address information of at least one borrower through a deep learning model, wherein the application address information comprises: a lender address and a borrower address, wherein the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address;
clustering the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group;
for each borrower group, acquiring the remote loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, wherein the preset transfer matrix is used for indicating the remote loan probability between different administrative regions;
and determining a target borrower group with abnormal application from at least one borrower group according to the allopatric loan probability of the borrower group.
2. The method according to claim 1, wherein said determining a target borrower group with abnormal application from at least one borrower group according to the allopatric loan probability of the borrower group comprises:
according to the remote loan probability of the borrower group and abnormal loan data which appears in the borrower group, determining a target borrower group with abnormal application from at least one borrower group, wherein the abnormal loan data comprises at least one of the following items: the total amount of the overdue loan of each borrower in the borrower group and the total number of the overdue loan days of each borrower in the borrower group.
3. The method according to claim 2, wherein said determining a target borrower group with abnormal applications from at least one borrower group according to the offsite loan probability of the borrower group and the abnormal loan data already occurred in the borrower group comprises:
determining the ratio of abnormal loan data which appears in the borrower group to the allopatric loan probability of the borrower group;
and determining a target borrower group with abnormal application from at least one borrower group according to the ratio.
4. A method according to any one of claims 1 to 3, wherein the allopatric loan probability between different administrative areas is generated by:
performing a weighting operation on at least one association attribute between a first administrative area corresponding to the borrower address and a second administrative area corresponding to the lender address to obtain the remote loan probability, wherein the at least one association attribute comprises at least one of the following items: the distance between the first administrative area and the second administrative area, the ratio of the gross domestic product of the second administrative area to the gross domestic product of the first administrative area, and the proportion, among remote loans, of non-overdue loans whose borrower address belongs to the first administrative area and whose lender address belongs to the second administrative area.
5. The method according to any one of claims 1 to 3, wherein the borrower address comprises at least one hierarchical administrative region, and the obtaining of the application address information of at least one borrower through the deep learning model comprises:
inputting the borrower address text of the borrower into a deep learning model to obtain the administrative region of the at least one level, wherein the deep learning model is obtained by training a preset training sample, and the training sample comprises at least one of the following items: sample address text, each character in the sample address text corresponding to a sample type of the character, the sample type being one of: a start character of an administrative region of one level, an end character of an administrative region of one level, and the remaining characters.
6. The method of claim 5, wherein the deep learning model comprises: the training system comprises an input layer, a bidirectional LSTM layer and a CRF layer, wherein in the training process, the input layer is used for receiving the sample address text, the bidirectional LSTM layer is used for processing the sample address text to obtain a vector, the CRF layer is used for predicting the prediction type of each character in the sample address text according to the vector, the sample type and the prediction type are used for determining a loss value, when the loss value meets a convergence condition, the training is finished, and when the loss value does not meet the convergence condition, the parameters of the deep learning model are adjusted according to the loss value to perform the next round of training.
7. The method of claim 5, wherein entering the borrower address text of the borrower into the deep learning model, after obtaining the administrative domain of the at least one level, further comprises:
if administrative regions of some levels are missing from the administrative regions of the at least one level, determining the missing administrative regions according to a preset administrative region tree, wherein the preset administrative region tree is used for representing the hierarchical relationship among the administrative regions; and/or,
and if the administrative region of a part of the hierarchy is missing in the administrative region of at least one hierarchy, inputting the borrower address text into a third-party interface to obtain the missing administrative region.
8. An abnormal application recognition apparatus, comprising:
the application address information acquisition module is used for acquiring application address information of at least one borrower through a deep learning model, and the application address information comprises: a lender address and a borrower address, wherein the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address;
the address clustering module is used for clustering the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group;
the remote loan probability acquisition module is used for acquiring, for each borrower group, the remote loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, wherein the preset transfer matrix is used for indicating the remote loan probabilities between different administrative regions;
and the abnormal recognition module is used for determining a target borrower group with abnormal application from at least one borrower group according to the allopatric loan probability of the borrower group.
9. An electronic device, characterized in that the electronic device comprises: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the electronic device to implement the method of any of claims 1-7.
10. A computer-readable storage medium having computer-executable instructions stored thereon, which, when executed by a processor, cause a computing device to implement the method of any one of claims 1 to 7.
CN202111609530.2A 2021-12-27 2021-12-27 Abnormal application identification method and equipment Pending CN114282988A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111609530.2A CN114282988A (en) 2021-12-27 2021-12-27 Abnormal application identification method and equipment
PCT/CN2022/100697 WO2023123929A1 (en) 2021-12-27 2022-06-23 Abnormal application recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111609530.2A CN114282988A (en) 2021-12-27 2021-12-27 Abnormal application identification method and equipment

Publications (1)

Publication Number Publication Date
CN114282988A true CN114282988A (en) 2022-04-05

Family

ID=80875893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111609530.2A Pending CN114282988A (en) 2021-12-27 2021-12-27 Abnormal application identification method and equipment

Country Status (2)

Country Link
CN (1) CN114282988A (en)
WO (1) WO2023123929A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023123929A1 (en) * 2021-12-27 2023-07-06 深圳前海微众银行股份有限公司 Abnormal application recognition method and device

Also Published As

Publication number Publication date
WO2023123929A1 (en) 2023-07-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination