WO2023123929A1 - Abnormal application recognition method and device - Google Patents

Abnormal application recognition method and device

Info

Publication number
WO2023123929A1
Authority
WO
WIPO (PCT)
Prior art keywords
borrower
address
group
administrative
abnormal
Prior art date
Application number
PCT/CN2022/100697
Other languages
French (fr)
Chinese (zh)
Inventor
蔡远航
郑少杰
范增虎
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2023123929A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Definitions

  • the embodiments of the present application relate to the technical field of financial technology, and in particular to a method and device for identifying abnormal applications.
  • FIG. 1 is a schematic diagram of an abnormal application identification process provided by the prior art.
  • the loan application information may include: basic information, login information, associated information and loan information.
  • Basic information includes: age of the borrower, income of the borrower.
  • the login information may include: the model of the equipment used for login and the network (IP, Internet protocol) address of the equipment used for login.
  • the associated information may include: contact information and family member information.
  • loan information may include: loan amount and consumption address.
  • the second way to identify abnormal applications in the prior art is through association graphs. Specifically: first, a large amount of basic information and loan transaction information about borrowers is obtained from various channels; then, each borrower's identity, income, preferences and other information are analyzed and the borrower is labeled; next, correspondences between the many labels and the loan transaction information are established to form an association graph; finally, abnormal applications are identified according to the association graph.
  • the above-mentioned first method requires analysts to conduct the analysis, and the weight of each dimension in the clustering process must also be determined manually, resulting in a high cost of identifying abnormal applications.
  • the second method above requires a large amount of data from different channels as support. However, multi-channel data is difficult to obtain in practical applications, which leaves the association graph incomplete and reduces the accuracy of identifying abnormal applications.
  • the present application provides a method and equipment for identifying abnormal applications, so as to reduce the identification cost of abnormal loans and improve the identification accuracy.
  • the present application provides a method for identifying abnormal applications, the method comprising:
  • the application address information includes: the address of the lender and the address of the borrower, and the administrative area corresponding to the address of the lender is different from the administrative area corresponding to the address of the borrower;
  • the off-site loan probability corresponding to the borrower group is obtained from a preset transfer matrix, and the preset transfer matrix is used to indicate the probability of off-site loans between different administrative regions;
  • a target borrower group with abnormal applications is determined from at least one borrower group according to the off-site loan probability of the borrower group.
  • the determining the target borrower group with abnormal application from at least one borrower group according to the probability of the borrower group's off-site loan includes:
  • according to the off-site loan probability of the borrower group and the abnormal loan data that has appeared in the borrower group, the target borrower group that has an abnormal application is determined from at least one borrower group, and the abnormal loan data includes at least one of the following: the total overdue loan amount of the borrowers in the borrower group, and the total number of overdue days of the loans of the borrowers in the borrower group.
  • the determining the target borrower group with abnormal application from at least one borrower group according to the probability of off-site loans of the borrower group and the abnormal loan data that have appeared in the borrower group includes:
  • a target borrower group with abnormal applications is determined from at least one borrower group according to the ratio.
  • the inter-regional loan probability between different administrative regions is generated through the following steps:
  • the at least one associated attribute includes at least one of the following: the distance between the first administrative region and the second administrative region, the ratio between the gross production value of the second administrative region and the gross production value of the first administrative region, and the proportion, among the non-local loans, of non-overdue loans whose borrower address belongs to the first administrative region and whose lender address belongs to the second administrative region.
  • the address of the borrower includes at least one level of administrative region
  • the acquisition of the application address information of at least one borrower through a deep learning model includes:
  • the deep learning model is obtained through training with preset training samples, and the training samples include: sample address text in which each character is labeled with a sample type, the sample type being one of the following: the start character of a level of administrative area, the end character of a level of administrative area, or another character.
  • the deep learning model includes: an input layer, a bidirectional LSTM layer, and a CRF layer.
  • the input layer is used to receive the sample address text;
  • the bidirectional LSTM layer is used to process the sample address text to obtain a vector;
  • the CRF layer is used to predict the prediction type of each character in the sample address text according to the vector;
  • the sample type and the prediction type are used to determine a loss value;
  • when the loss value satisfies the convergence condition, the training ends; when the loss value does not satisfy the convergence condition, the parameters of the deep learning model are adjusted according to the loss value to perform the next round of training.
  • after inputting the borrower's address text into the deep learning model to obtain the at least one level of administrative region, the method further includes:
  • if some levels are missing from the at least one level of administrative region, the missing administrative areas are determined according to a preset administrative area tree, which is used to represent the hierarchical relationship between administrative areas; and/or,
  • the text of the address of the borrower is input into the third-party interface to obtain the missing administrative areas.
  • an abnormal application identification device including:
  • the application address information acquisition module is used to obtain the application address information of at least one borrower through a deep learning model, where the application address information includes the address of the lender and the address of the borrower, and the administrative area corresponding to the address of the lender is different from the administrative area corresponding to the address of the borrower;
  • An address clustering module configured to cluster the at least one borrower according to the address of the lender and the address of the borrower to obtain at least one group of borrowers;
  • the off-site loan probability acquisition module is used to obtain, for each borrower group, the off-site loan probability corresponding to the borrower group from the preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, where the preset transfer matrix is used to indicate the probability of off-site loans between different administrative regions;
  • the abnormal identification module is used to determine the target borrower group with an abnormal application from at least one borrower group according to the off-site loan probability of the borrower group.
  • the abnormality identification module is also used for:
  • according to the off-site loan probability of the borrower group and the abnormal loan data that has appeared in the borrower group, the target borrower group that has an abnormal application is determined from at least one borrower group, and the abnormal loan data includes at least one of the following: the total overdue loan amount of the borrowers in the borrower group, and the total number of overdue days of the loans of the borrowers in the borrower group.
  • the abnormality identification module is also used for:
  • a target borrower group with abnormal applications is determined from at least one borrower group according to the ratio.
  • inter-regional loan probability between different administrative regions is generated through the following modules:
  • the first loan probability generation module is used to perform a weighted operation on at least one associated attribute between the first administrative region corresponding to the borrower's address and the second administrative region corresponding to the lender's address to obtain the off-site loan probability, where the at least one associated attribute includes at least one of the following: the distance between the first administrative region and the second administrative region, and the ratio between the gross production value of the second administrative region and the gross production value of the first administrative region.
  • the borrower's address includes at least one level of administrative region
  • the application address information acquisition module is also used to:
  • the deep learning model is obtained through training with preset training samples, and the training samples include: sample address text in which each character is labeled with a sample type, the sample type being one of the following: the start character of a level of administrative area, the end character of a level of administrative area, or another character.
  • the deep learning model includes: an input layer, a bidirectional LSTM layer, and a CRF layer.
  • the input layer is used to receive the sample address text;
  • the bidirectional LSTM layer is used to process the sample address text to obtain a vector;
  • the CRF layer is used to predict the prediction type of each character in the sample address text according to the vector;
  • the sample type and the prediction type are used to determine a loss value;
  • when the loss value satisfies the convergence condition, the training ends; when the loss value does not satisfy the convergence condition, the parameters of the deep learning model are adjusted according to the loss value to perform the next round of training.
  • the device also includes:
  • the administrative area completion module is used to: after the borrower's address text is input into the deep learning model to obtain the at least one level of administrative region, if some levels are missing from the at least one level of administrative region, determine the missing administrative regions according to the preset administrative region tree, which is used to represent the hierarchical relationship between administrative regions; and/or,
  • the text of the address of the borrower is input into the third-party interface to obtain the missing administrative areas.
  • the present application provides an electronic device, including: at least one processor and a memory;
  • the memory stores computer-executable instructions
  • the at least one processor executes the computer-executed instructions stored in the memory, so that the electronic device implements the method in the aforementioned first aspect.
  • the present application provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the computing device implements the method of the above-mentioned first aspect.
  • the present application provides a computer program, the computer program is used to implement the method in the aforementioned first aspect.
  • the abnormal application identification method and equipment provided in this application include: obtaining the application address information of at least one borrower through a deep learning model, where the application address information includes the address of the lender and the address of the borrower, and the administrative area corresponding to the address of the lender is different from the administrative region corresponding to the address of the borrower; clustering the at least one borrower according to the address of the lender and the address of the borrower to obtain at least one borrower group; for each borrower group, obtaining the off-site loan probability corresponding to the borrower group from the preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, where the preset transfer matrix is used to indicate the probability of off-site loans between different administrative regions; and determining a target borrower group with an abnormal application from the at least one borrower group according to the off-site loan probability of the borrower group.
  • the borrowers can be clustered according to the address of the lender and the address of the borrower to obtain at least one borrower group, and the target borrower group with abnormal loans can then be identified by combining the off-site loan probabilities between different administrative regions in the transfer matrix.
  • the whole process does not require manual processing, thereby reducing the identification cost.
  • this embodiment of the application only needs the address of the lender and the address of the borrower, and does not require data from other channels, so that the problem of low recognition accuracy due to the inability to obtain data from more channels can be avoided.
  • Fig. 1 is a schematic diagram of an abnormal application identification process provided by the prior art
  • Fig. 2 is a flow chart of specific steps of the abnormal application identification method provided by the embodiment of the present application.
  • Fig. 3 is a schematic structural diagram of a deep learning model provided by an embodiment of the present application.
  • Fig. 4 is a structural block diagram of an abnormal application identification device provided by an embodiment of the present application.
  • Fig. 5 is a structural block diagram of an electronic device provided by an embodiment of the present application.
  • This embodiment of the application can be applied to loan scenarios.
  • the borrower can provide a loan application to the lender, and the information required for review can be specified in the loan application.
  • the lender reviews the loan application, and the loan is successful after the review is passed.
  • This identification process can be carried out during the review process, and when the loan is identified as an abnormal loan, the review result can be determined as failed review. This identification process can also be done post-loan to notify the borrower to repay as soon as possible.
  • Fig. 2 is a flow chart of specific steps of the abnormal application identification method provided by the embodiment of the present application. Referring to Figure 2, the method may include:
  • S101 Obtain the application address information of at least one borrower through the deep learning model.
  • the application address information includes: the address of the lender and the address of the borrower, and the administrative area corresponding to the address of the lender is different from the administrative area corresponding to the address of the borrower.
  • the address of the borrower may include at least one of the following: the account address of the borrower, the residence address of the borrower, and the work address of the borrower.
  • the administrative region where the lender is located may be determined according to the IP (Internet Protocol) address of the lender's electronic device used when the borrower submits the loan application.
  • the correspondence between administrative regions and IP addresses is preset, and one administrative region may correspond to one or more IP addresses.
  • the borrower needs to go to the lender's offline service point and use the electronic device provided by the lender to submit a loan application.
  • the loan application may be entered by the borrower themselves, or may be entered by a staff member of the lender.
  • the IP address of the lender's electronic device can be obtained, and the corresponding administrative region can be obtained after obtaining the IP address.
  • a tool for converting an IP address into an administrative area is provided in the prior art, and the tool can be called to convert the IP address into an administrative area.
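  • The preset correspondence between administrative regions and IP addresses can be sketched as a lookup table. The sketch below is illustrative only: the table `REGION_NETWORKS`, the region names, and the function `region_for_ip` are hypothetical, the address ranges are reserved documentation ranges used as placeholders, and grouping addresses into networks is an assumption (the text only says that a region may correspond to one or more IP addresses).

```python
import ipaddress

# Hypothetical preset table: each administrative region maps to one or more
# IP networks. The ranges are reserved documentation ranges, used purely as
# placeholders for real data.
REGION_NETWORKS = {
    "Province A / City A1": ["192.0.2.0/25", "198.51.100.0/24"],
    "Province B / City B1": ["203.0.113.0/24"],
}


def region_for_ip(ip_text, table=REGION_NETWORKS):
    """Return the region whose preset networks contain ip_text, or None."""
    address = ipaddress.ip_address(ip_text)
    for region, networks in table.items():
        for network in networks:
            if address in ipaddress.ip_network(network):
                return region
    return None


print(region_for_ip("198.51.100.7"))
print(region_for_ip("203.0.113.42"))
```

  • In practice the table would be replaced by the existing IP-to-region conversion tool mentioned above; the lookup interface would stay the same.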
  • the above-mentioned address of the borrower is usually entered by the borrower when applying for a loan, and it usually includes the administrative area and the detailed address below the administrative area.
  • the administrative area in the above borrower address can be obtained through a deep learning model.
  • the borrower enters an address text that includes the province, city, county (or district), street, and residential community.
  • the text of the borrower's address can be "XXXX Community, XXX Street, XXX County, XXXX City, XXXX Province".
  • the administrative region can be identified from the address text.
  • a province is the first level
  • a city is the second level
  • a county (or district) is the third level.
  • a province can include one or more cities
  • a city can include one or more counties.
  • At least one level of administrative regions can be identified from the borrower's address text through a deep learning model.
  • the text of the borrower's address of the borrower can be input into the deep learning model to obtain at least one level of administrative regions.
  • the deep learning model is obtained by training with a large number of preset training samples; the training samples include: sample address text in which each character corresponds to a sample type, and the sample type is one of the following: the start character of a level of administrative area, the end character of a level of administrative area, or one of the remaining characters.
  • each character in the sample address text corresponds to a sample type.
  • the start and end characters of the province are labeled "B-PROV" and "E-PROV" respectively
  • the start and end characters of the city are labeled "B-CITY" and "E-CITY" respectively
  • the start and end characters of the county are labeled "B-COUNTY" and "E-COUNTY" respectively, and the remaining characters are labeled "O".
  • a sample address text could be "X/B-PROV X/O province/E-PROV X/B-CITY X/O city/E-CITY X/B-COUNTY X/O county/E-COUNTY X/O X/O Street/O Dao/O X/O X/O X/O Small/O District/O", where each item is one character followed by its sample type; "Street" and "Dao" stand for the two characters of the Chinese word for street, and "Small" and "District" for the two characters of the word for residential community.
  • Fig. 3 is a schematic structural diagram of a deep learning model provided by an embodiment of the present application.
  • the deep learning model may include: an input layer, a bidirectional LSTM (long short-term memory) layer, and a CRF layer.
  • the input layer is used to receive the sample address text
  • the bidirectional LSTM layer is used to process the sample address text to obtain a vector
  • the CRF layer is used to predict the prediction type of each character in the sample address text according to the vector.
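  • How a CRF layer can turn per-character scores into a label sequence can be illustrated with Viterbi decoding, the standard decoding procedure for linear-chain CRFs. The text does not specify the decoding procedure, so this is a sketch under assumptions: the function `viterbi_decode` is hypothetical, and the emission scores (standing in for the bidirectional LSTM output) and transition scores (standing in for learned CRF parameters) are made up.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][label]: per-character score (assumed BiLSTM output).
    transitions[(prev, cur)]: score of moving from prev to cur (assumed
    to be learned by the CRF layer).
    """
    # best[t][label] = best total score of any path ending in label at step t
    best = [{label: emissions[0][label] for label in labels}]
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda label: best[-1][label])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["B-PROV", "E-PROV", "O"]
transitions = {(p, c): 0.0 for p in labels for c in labels}
transitions[("B-PROV", "B-PROV")] = -5.0  # two start characters in a row are unlikely
transitions[("E-PROV", "E-PROV")] = -5.0  # two end characters in a row are unlikely
emissions = [
    {"B-PROV": 2.0, "E-PROV": 0.0, "O": 0.5},
    {"B-PROV": 1.0, "E-PROV": 0.0, "O": 0.9},
    {"B-PROV": 0.0, "E-PROV": 2.0, "O": 0.1},
]
print(viterbi_decode(emissions, transitions, labels))
```

  • Picking the best label for each character independently would label the second character "B-PROV"; the transition scores steer the decoder toward the consistent sequence B-PROV, O, E-PROV.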
  • the input layer is used to receive the borrower’s address text
  • the bidirectional LSTM layer is used to process the borrower’s address text to obtain a vector
  • the CRF (conditional random field) layer is used to predict, based on the vector, the type of each character in the borrower's address text, so that the administrative regions of each level can be extracted from the borrower's address text according to the type.
  • the prediction type of each character (during training) and the type of each character (during application) each take one of the sample types marked in the sample address text. During training, the prediction type and the sample type of the same character may be the same or different.
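  • Extracting the administrative regions from the per-character types can be sketched as follows. The function `extract_regions` and the example address are hypothetical; the labeling follows the scheme described above, in which only the first and last characters of a region carry "B-" and "E-" types and the characters in between are "O".

```python
def extract_regions(chars, tags):
    """Collect the text from each B-<LEVEL> tag to its matching E-<LEVEL> tag."""
    regions = {}
    level, start = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            level, start = tag[2:], i
        elif tag.startswith("E-") and level == tag[2:]:
            regions[level] = "".join(chars[start:i + 1])
            level, start = None, None
    return regions


# Made-up example address, one type per character.
chars = list("广东省深圳市南山区科技园")
tags = [
    "B-PROV", "O", "E-PROV",
    "B-CITY", "O", "E-CITY",
    "B-COUNTY", "O", "E-COUNTY",
    "O", "O", "O",
]
print(extract_regions(chars, tags))
```

  • An "E-" type with no matching "B-" type before it is ignored here; a production version would need to decide how to handle such inconsistent predictions.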
  • the process of training the deep learning model through the above training samples may include multiple rounds of iterations.
  • in each round, a set of training samples can be input into the deep learning model to obtain the predicted type of each character in each training sample; then, the predicted type and the sample type of each character in this set of training samples are input into the loss function to obtain the loss value; finally, it is determined whether the loss value meets the convergence condition.
  • if the loss value meets the convergence condition, the training ends.
  • if the loss value does not meet the convergence condition, the parameters of the deep learning model are adjusted according to the loss value for the next round of training.
  • the loss function mentioned above may adopt a loss function commonly used in the prior art, for example, a cross-entropy loss function, an absolute value loss function, and a square sum loss function.
  • Satisfying the convergence condition of the above loss function may include but is not limited to: the loss value is less than or equal to a preset loss value threshold, and the loss value does not decrease after multiple rounds of iterations.
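  • The iterative training with the two convergence conditions above can be sketched as a generic loop. All names and default values (`train_until_converged`, `loss_threshold`, `patience`, `max_rounds`) are hypothetical, and a one-parameter toy problem stands in for the deep learning model.

```python
def train_until_converged(compute_loss, adjust_parameters,
                          loss_threshold=0.05, patience=3, max_rounds=1000):
    """Run training rounds until one of the convergence conditions holds."""
    best_loss = float("inf")
    stale_rounds = 0
    history = []
    for _ in range(max_rounds):
        loss = compute_loss()
        history.append(loss)
        if loss <= loss_threshold:
            break  # condition 1: loss at or below the preset threshold
        if loss < best_loss:
            best_loss, stale_rounds = loss, 0
        else:
            stale_rounds += 1
            if stale_rounds >= patience:
                break  # condition 2: no decrease over several rounds
        adjust_parameters(loss)  # not converged: adjust and run the next round
    return history


# Toy stand-in for the model: one parameter w with loss (w - 3)^2.
state = {"w": 0.0}


def compute_loss():
    return (state["w"] - 3.0) ** 2


def adjust_parameters(loss):
    # Gradient step; the gradient of (w - 3)^2 with respect to w is 2 * (w - 3).
    state["w"] -= 0.25 * 2.0 * (state["w"] - 3.0)


history = train_until_converged(compute_loss, adjust_parameters)
print(history)
```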
  • the borrower's address text entered by the borrower may lack information on some administrative regions, so that the administrative region identified from the borrower's address text lacks some levels. In this way, subsequent processing based on the address of the borrower will be inaccurate. In order to improve the accuracy of subsequent processing, it is necessary to complete the administrative area of the borrower's address.
  • if some levels are missing from the at least one level of administrative regions, the missing administrative regions are determined according to the preset administrative region tree; and/or, the borrower's address text is input into the third-party interface to obtain the missing administrative regions. The preset administrative region tree is used to represent the hierarchical relationship between administrative regions.
  • the preset administrative region tree is a tree structure, which includes hierarchical relationships among all administrative regions.
  • the nodes in the administrative area tree form parent-child relationships, and the administrative area of a child node belongs to exactly one administrative area, that of its parent node. Therefore, if a low-level administrative area is present in the at least one level of administrative area but the high-level administrative area is missing, the parent node can be found from the node corresponding to the low-level administrative area, and the administrative area corresponding to the parent node is taken as the high-level administrative area.
  • a third-party interface can also be called to determine the missing administrative area based on the detailed address in the borrower's address text. For example, the county to which the address belongs can be determined from the community or street.
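  • Completing missing higher-level regions from the administrative region tree can be sketched with a child-to-parent map. The map `PARENT_OF`, the level names, and the function `complete_regions` are hypothetical, and the sketch assumes region names are unique (real data would more likely use region codes).

```python
# Hypothetical administrative region tree stored as child -> parent.
PARENT_OF = {
    "County C1": "City B1",
    "County C2": "City B1",
    "City B1": "Province A",
    "City B2": "Province A",
}
LEVELS = ["PROV", "CITY", "COUNTY"]  # highest level first


def complete_regions(regions, parent_of=PARENT_OF, levels=LEVELS):
    """Fill in missing higher levels by walking from a known region to its parent."""
    completed = dict(regions)
    # Go from the lowest level upward so a parent found in one step
    # can be used to fill the next gap above it.
    for lower, higher in zip(reversed(levels), reversed(levels[:-1])):
        if higher not in completed and lower in completed:
            parent = parent_of.get(completed[lower])
            if parent is not None:
                completed[higher] = parent
    return completed


print(complete_regions({"COUNTY": "County C1"}))
```

  • Only missing higher levels can be recovered this way, because a parent has several children; a missing lower level needs the detailed address, which is where the third-party interface comes in.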
  • S102 Cluster the at least one borrower according to the above lender address and borrower address to obtain at least one borrower group.
  • the address of the lender is converted into latitude and longitude coordinates
  • the address of the borrower is converted into coordinates of latitude and longitude
  • the borrowers are clustered according to the latitude and longitude coordinates of the lender and the latitude and longitude coordinates of the borrower.
  • the above clustering may use an existing clustering algorithm; when the existing clustering algorithm can only cluster along one dimension, it is called multiple times. For example, the clustering algorithm is first called to cluster the at least one borrower according to the latitude and longitude coordinates of the lender's address to obtain at least one first borrower group; the clustering algorithm is then called to cluster the borrowers in each first borrower group according to the latitude and longitude coordinates of the borrower's address to obtain at least one second borrower group. The second borrower groups of all first borrower groups are the at least one borrower group obtained in S102.
  • when the borrower's address includes a residence address, an account address and a work address, the process of clustering the borrowers in a first borrower group according to the latitude and longitude coordinates of the borrower's address to obtain at least one second borrower group may include: first, calling the clustering algorithm to cluster the borrowers in the first borrower group according to the latitude and longitude coordinates of the residence address to obtain at least one first subgroup; then, calling the clustering algorithm to cluster the borrowers in each first subgroup according to the latitude and longitude coordinates of the account address to obtain at least one second subgroup; finally, calling the clustering algorithm to cluster the borrowers in each second subgroup according to the latitude and longitude coordinates of the work address to obtain at least one third subgroup.
  • each third subgroup is a second borrower group.
  • each borrower in each borrower group obtained has the same borrower address (called the borrower address of the borrower group) and the same lender address (called the lender address of the borrower group).
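  • Calling a one-dimension clustering algorithm repeatedly, once per address, can be sketched as follows. The text leaves the clustering algorithm open, so `cluster_by` is a stand-in that simply buckets coordinates into grid cells; the field names, the cell size, and the sample coordinates are all hypothetical.

```python
from collections import defaultdict


def cluster_by(borrowers, field, cell_degrees=0.5):
    """Stand-in clustering on one address: bucket (lat, lon) into grid cells."""
    buckets = defaultdict(list)
    for borrower in borrowers:
        lat, lon = borrower[field]
        key = (round(lat / cell_degrees), round(lon / cell_degrees))
        buckets[key].append(borrower)
    return list(buckets.values())


def cluster_successively(borrowers, fields):
    """Cluster by the first field, then split every group by the next field, and so on."""
    groups = [borrowers]
    for field in fields:
        groups = [sub for group in groups for sub in cluster_by(group, field)]
    return groups


def make_borrower(name, lender, home):
    # For brevity the residence, account and work addresses share one coordinate.
    return {"name": name, "lender": lender, "residence": home,
            "account": home, "work": home}


borrowers = [
    make_borrower("a", (22.5, 114.0), (30.6, 104.1)),
    make_borrower("b", (22.5, 114.0), (30.6, 104.1)),
    make_borrower("c", (22.5, 114.0), (45.8, 126.5)),
    make_borrower("d", (39.9, 116.4), (30.6, 104.1)),
]
groups = cluster_successively(borrowers, ["lender", "residence", "account", "work"])
group_names = [[b["name"] for b in group] for group in groups]
print(group_names)
```

  • Any existing clustering algorithm could replace `cluster_by`, as long as it returns a list of groups for one address field at a time.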
  • S103 For each borrower group, according to the address of the lender and the address of the borrower corresponding to the borrower group, obtain the off-site loan probability corresponding to the borrower group from the preset transfer matrix, where the preset transfer matrix is used to indicate the probability of off-site loans between different administrative regions.
  • the value of the mth row and the nth column of the preset transfer matrix may be the probability of off-site loans from the mth administrative region to the nth administrative region.
  • the probability of off-site loans is usually determined by attributes between the two administrative regions. These attributes can include but are not limited to: distance, GDP (gross domestic product) gap, and the proportion of loans that are not overdue. The smaller the distance, the larger the GDP gap, and the larger the proportion of non-overdue loans, the greater the probability of off-site loans. Because a low off-site loan probability means that loans between the two regions are unusual, a borrower group with a lower off-site loan probability is more likely to have abnormal loans.
  • the distance may be the length of a drivable route between two administrative regions, not the straight-line distance between the two administrative regions.
  • inter-regional loan probability between the above-mentioned different administrative regions can be generated through the following steps:
  • a weighted operation is performed on at least one correlation attribute between the first administrative region corresponding to the borrower's address and the second administrative region corresponding to the lender's address to obtain the probability of inter-regional loans from the first administrative region to the second administrative region.
  • the at least one associated attribute includes at least one of the following: the distance between the first administrative region and the second administrative region, the ratio between the gross production value of the second administrative region and the gross production value of the first administrative region, and the proportion, among the non-local loans, of non-overdue loans whose borrower address belongs to the first administrative region and whose lender address belongs to the second administrative region.
  • the non-local loan refers to the loan application in which the address of the borrower and the address of the lender belong to different administrative regions.
  • the above proportion may be calculated by number of loans or by total loan amount.
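  • The weighted operation on the associated attributes can be sketched as a weighted sum. The text does not give the weights or any normalization, so everything below is an assumption: the function `off_site_loan_probability`, the weights, the distance scale, and the cap on the production-value ratio are hypothetical and chosen only so that the result stays between 0 and 1 and moves in the directions described above.

```python
def off_site_loan_probability(distance_km, gdp_ratio, non_overdue_share,
                              weights=(0.4, 0.3, 0.3),
                              distance_scale_km=500.0, gdp_ratio_cap=5.0):
    """Weighted sum of three attributes, each mapped into [0, 1].

    distance_km: route distance between the two regions.
    gdp_ratio: production value of the lender's region divided by that of
        the borrower's region.
    non_overdue_share: share of non-overdue loans among off-site loans
        from the borrower's region to the lender's region, in [0, 1].
    """
    closeness = 1.0 / (1.0 + distance_km / distance_scale_km)  # larger when closer
    gdp_term = min(max(gdp_ratio, 0.0), gdp_ratio_cap) / gdp_ratio_cap
    w_distance, w_gdp, w_share = weights
    return w_distance * closeness + w_gdp * gdp_term + w_share * non_overdue_share


near = off_site_loan_probability(distance_km=100.0, gdp_ratio=2.0, non_overdue_share=0.9)
far = off_site_loan_probability(distance_km=1000.0, gdp_ratio=2.0, non_overdue_share=0.9)
print(near, far)
```

  • Evaluating this function for every ordered pair of regions would fill the entries of the preset transfer matrix.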
  • the administrative regions in the above transfer matrix can be all administrative regions at any level. Of course, if the level is lower, the number of administrative regions is larger, the transfer matrix is larger, and the accuracy is higher.
  • a transfer matrix can thus be generated at the county or district level.
  • the probability of off-site loans of any borrower group can be obtained from the transfer matrix. Specifically, first, determine the administrative area corresponding to the borrower address of the borrower group as the administrative area of the borrower, and determine the administrative area corresponding to the address of the lender of the borrower group as the administrative area of the lender; then, from the transfer matrix The administrative region of the borrower is obtained and the probability of non-local loans listed as the administrative region of the lender is used as the probability of non-regional loans of the group of borrowers.
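The row and column lookup can be illustrated with a small sketch. The region names, the matrix values, and the helper name `offsite_loan_probability` are hypothetical; only the convention (row is the borrower's region, column is the lender's region) comes from the text.

```python
# Hypothetical transfer matrix: rows are borrower regions, columns are lender regions.
REGIONS = ["A", "B", "C"]
INDEX = {name: i for i, name in enumerate(REGIONS)}
TRANSFER_MATRIX = [
    [0.00, 0.30, 0.05],  # borrower in A
    [0.20, 0.00, 0.10],  # borrower in B
    [0.02, 0.15, 0.00],  # borrower in C
]


def offsite_loan_probability(borrower_region, lender_region):
    """Return the matrix entry for the (borrower row, lender column) pair."""
    return TRANSFER_MATRIX[INDEX[borrower_region]][INDEX[lender_region]]
```

Note that the matrix need not be symmetric: the probability of a borrower in A taking a loan in C can differ from that of a borrower in C taking a loan in A.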
  • S104: Determine, from the at least one borrower group, a target borrower group that has abnormal applications, according to the off-site loan probability of each borrower group.
  • There may be multiple strategies for determining the target borrower group.
  • The borrower group with the smallest, or a relatively small, off-site loan probability can be determined as the target borrower group.
  • The target borrower group can also be determined by combining the off-site loan probability with other information. Specifically, the target borrower group that has abnormal applications is determined from the at least one borrower group according to the off-site loan probability of the borrower group and the abnormal loan data that has appeared in the borrower group. The abnormal loan data includes at least one of the following: the total overdue loan amount of the borrowers in the borrower group, and the total number of overdue days of the loans of the borrowers in the borrower group.
  • The target borrower group is a borrower group with a smaller off-site loan probability, a larger total overdue loan amount, and a larger total number of overdue days.
  • The borrower groups can be sorted in descending order by jointly considering the reciprocal of the off-site loan probability, the total overdue loan amount, and the total number of overdue days. The highest-ranked borrower group can then be determined as the target borrower group.
  • Determining the ratio of the abnormal loan data to the off-site loan probability of the borrower group may include: first, determining the product of the total overdue loan amount of the borrowers in the borrower group and the total number of overdue days of the loans of the borrowers in the borrower group; then, determining the ratio of that product to the off-site loan probability of the borrower group.
  • One or more borrower groups with a larger ratio can be determined as the target borrower group.
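The ratio-based strategy can be sketched as below. The `BorrowerGroup` fields, the `top_k` cut-off, and the small floor used to avoid division by zero are assumptions made for illustration; the source only defines the ratio as (overdue amount × overdue days) divided by the off-site loan probability.

```python
from dataclasses import dataclass


@dataclass
class BorrowerGroup:
    name: str
    offsite_probability: float  # looked up from the transfer matrix
    overdue_amount: float       # total overdue loan amount in the group
    overdue_days: int           # total overdue days in the group


def anomaly_ratio(group, floor=1e-9):
    """Product of overdue amount and overdue days, divided by the probability."""
    # The floor is an assumption to keep the ratio finite for zero probabilities.
    probability = max(group.offsite_probability, floor)
    return group.overdue_amount * group.overdue_days / probability


def target_groups(groups, top_k=1):
    """Return the top_k groups with the largest ratio."""
    return sorted(groups, key=anomaly_ratio, reverse=True)[:top_k]


groups = [
    BorrowerGroup("g1", 0.30, 10000.0, 20),
    BorrowerGroup("g2", 0.02, 8000.0, 15),
    BorrowerGroup("g3", 0.10, 500.0, 3),
]
suspects = target_groups(groups, top_k=1)
```

In this made-up data, `g2` has less overdue debt than `g1` but a much lower off-site loan probability, so it ranks first.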
  • FIG. 4 is a structural block diagram of an apparatus for identifying abnormal applications provided in the embodiments of the present application. For ease of description, only the parts related to the embodiment of the present application are shown.
  • The abnormal application identification device 200 includes: an application address information acquisition module 201, an address clustering module 202, an off-site loan probability acquisition module 203, and an abnormality identification module 204.
  • The application address information acquisition module 201 is used to obtain the application address information of at least one borrower through a deep learning model. The application address information includes the lender address and the borrower address, and the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address.
  • The address clustering module 202 is configured to cluster the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group.
  • The off-site loan probability acquisition module 203 is configured to, for each borrower group, obtain the off-site loan probability corresponding to the borrower group from the preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group. The preset transfer matrix is used to indicate the off-site loan probabilities between different administrative regions.
  • The abnormality identification module 204 is configured to determine, from the at least one borrower group, a target borrower group with abnormal applications according to the off-site loan probability of each borrower group.
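As a rough illustration of the address clustering module, the sketch below groups borrowers by their (borrower region, lender region) pair. Treating clustering as exact grouping on the region pair is a simplifying assumption, since the source does not name a clustering algorithm, and the dictionary keys used here are hypothetical.

```python
from collections import defaultdict


def cluster_by_region_pair(applications):
    """Group borrower ids by (borrower region, lender region)."""
    groups = defaultdict(list)
    for app in applications:
        # Only off-site applications are considered: the two regions differ.
        if app["borrower_region"] == app["lender_region"]:
            continue
        key = (app["borrower_region"], app["lender_region"])
        groups[key].append(app["borrower_id"])
    return dict(groups)


applications = [
    {"borrower_id": 1, "borrower_region": "A", "lender_region": "B"},
    {"borrower_id": 2, "borrower_region": "A", "lender_region": "B"},
    {"borrower_id": 3, "borrower_region": "C", "lender_region": "B"},
    {"borrower_id": 4, "borrower_region": "B", "lender_region": "B"},
]
borrower_groups = cluster_by_region_pair(applications)
```

Each resulting key identifies one borrower group, and the same key can be used to look up that group's off-site loan probability in the transfer matrix.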
  • The abnormality identification module 204 is also used to:
  • determine, from the at least one borrower group, the target borrower group with abnormal applications according to the off-site loan probability of the borrower group and the abnormal loan data that has appeared in the borrower group, where the abnormal loan data includes at least one of the following: the total overdue loan amount of the borrowers in the borrower group, and the total number of overdue days of the loans of the borrowers in the borrower group.
  • The abnormality identification module 204 is also used to determine the ratio of the abnormal loan data that has appeared in the borrower group to the off-site loan probability of the borrower group, and to determine the target borrower group from the at least one borrower group according to the ratio.
  • The off-site loan probability between different administrative regions is generated by the following module:
  • The first loan probability generation module is used to perform a weighted operation on at least one association attribute between the first administrative region corresponding to the borrower's address and the second administrative region corresponding to the lender's address to obtain the off-site loan probability. The at least one association attribute includes at least one of the following: the distance between the first administrative region and the second administrative region; the ratio of the gross domestic product of the second administrative region to that of the first administrative region; and the proportion, among non-local loans, of non-overdue loans whose borrower address belongs to the first administrative region and whose lender address belongs to the second administrative region.
  • The borrower's address includes at least one level of administrative region.
  • The application address information acquisition module 201 is also used to:
  • input the borrower address text of the borrower into the deep learning model to obtain the at least one level of administrative region. The deep learning model is obtained by training on preset training samples. Each training sample includes a sample address text in which every character is assigned a sample type, and the sample type is one of the following: the start character of an administrative region of one level, the end character of an administrative region of one level, or another character.
  • The deep learning model includes an input layer, a bidirectional LSTM layer, and a CRF layer.
  • During training, the input layer receives the sample address text, the bidirectional LSTM layer processes the sample address text to obtain a vector, and the CRF layer predicts the prediction type of each character in the sample address text according to the vector.
  • The sample type and the prediction type are used to determine a loss value. When the loss value satisfies the convergence condition, training ends; when it does not, the parameters of the deep learning model are adjusted according to the loss value for the next round of training.
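To make the per-character types concrete, the sketch below decodes a predicted tag sequence into administrative regions. It does not implement the BiLSTM-CRF model itself. The tag names (`B-<level>` for a start character, `E-<level>` for an end character, `O` for any other character), the level names, and the function `decode_regions` are assumptions chosen for illustration.

```python
def decode_regions(text, tags):
    """Collect the substring between each start tag and its matching end tag."""
    regions = {}
    start, level = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            # Start character of a region at some level.
            start, level = i, tag[2:]
        elif tag.startswith("E-") and start is not None and tag[2:] == level:
            # End character of the same level closes the region.
            regions[level] = text[start:i + 1]
            start, level = None, None
    return regions


text = "广东省深圳市南山区科技园"
tags = ["B-province", "O", "E-province",
        "B-city", "O", "E-city",
        "B-district", "O", "E-district",
        "O", "O", "O"]
parsed = decode_regions(text, tags)
```

Here `parsed` maps each level to its region name, and the trailing characters that belong to no region are ignored.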
  • The device also includes:
  • The administrative region completion module, which is used, after the borrower address text of the borrower has been input into the deep learning model to obtain the at least one level of administrative region, to determine any missing administrative regions according to a preset administrative region tree if some levels are missing from the at least one level of administrative region, the preset administrative region tree representing the hierarchical relationship between administrative regions; and/or, if some levels are missing from the at least one level of administrative region, to input the borrower address text into a third-party interface to obtain the missing administrative regions.
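Completion from the administrative region tree can be sketched as a walk from a known lower-level region up to its parents. The tree contents, the level names, and the function `complete_regions` are hypothetical; the sketch also assumes region names are unique, which real data may not satisfy.

```python
# Hypothetical administrative region tree, stored as child -> parent.
PARENT = {
    "南山区": "深圳市",
    "深圳市": "广东省",
    "广东省": None,
}
LEVELS = ["province", "city", "district"]  # highest level first


def complete_regions(regions):
    """Fill missing higher levels by walking up from the lowest known level."""
    completed = dict(regions)
    for i in range(len(LEVELS) - 1, 0, -1):
        child = completed.get(LEVELS[i])
        parent_level = LEVELS[i - 1]
        if child is not None and parent_level not in completed:
            parent = PARENT.get(child)
            if parent is not None:
                completed[parent_level] = parent
    return completed


filled = complete_regions({"district": "南山区"})
```

A missing lower level cannot be inferred this way, because a parent region has many children. That case would fall to the third-party interface mentioned above.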
  • Fig. 5 is a structural block diagram of an electronic device provided by an embodiment of the present application.
  • The electronic device 600 includes a memory 602 and at least one processor 601.
  • The memory 602 stores computer-executable instructions. The at least one processor 601 executes the computer-executable instructions stored in the memory 602, so that the electronic device 600 implements the method in FIG. 2.
  • The electronic device may also include a receiver 603 and a transmitter 604. The receiver 603 is used to receive information from other devices or apparatuses and forward it to the processor 601, and the transmitter 604 is used to send information to other devices or apparatuses.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores computer-executable instructions, and when the processor executes the computer-executable instructions, the computing device implements the method described in FIG. 2 .
  • An embodiment of the present application further provides a computer program, the computer program is used to implement the method described in FIG. 2 above.


Abstract

Provided in the present application are an abnormal application recognition method and device. The method comprises: acquiring application address information of at least one borrower, wherein the application address information comprises a lender address and a borrower address, which correspond to different administrative regions; clustering the at least one borrower according to the lender address and the borrower address, so as to obtain at least one borrower group; according to a lender address and a borrower address that correspond to the borrower group, acquiring, from a preset transfer matrix, a remote lending probability corresponding to the borrower group, wherein the preset transfer matrix is used for indicating remote lending probabilities between different administrative regions; and determining, from among the at least one borrower group and according to the remote lending probability of the borrower group, a target borrower group having an abnormal application. By means of the present application, a target borrower group having abnormal lending can be recognized by combining address clustering with abnormal lending probabilities between different administrative regions. There is no need to manually process data from different channels, thereby reducing the recognition cost, and improving the recognition accuracy.

Description

Abnormal application identification method and device
This application claims priority to Chinese patent application No. 202111609530.2, entitled "Abnormal application identification method and device", filed with the China Patent Office on December 27, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The embodiments of the present application relate to the field of financial technology, and in particular to a method and device for identifying abnormal applications.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). Abnormal loan application identification is no exception, but the security and real-time requirements of the financial industry also place higher demands on the technology. In the Fintech field, applying for a loan is a very common scenario. A loan must be repaid on time after it is granted; if it is not, the lender suffers an economic loss. To avoid such losses, it is necessary to identify which loan applications are abnormal.
In the prior art, there are two ways to identify abnormal applications. FIG. 1 is a schematic diagram of an abnormal application identification process provided by the prior art. Referring to FIG. 1, multiple pieces of loan application information are first input into a clustering algorithm to obtain one or more loan-party groups; analysts then use empirical knowledge to analyze these groups and determine which of them contain abnormal applications. The loan application information may include basic information, login information, associated information, and loan information. The basic information includes the borrower's age and income. The login information may include the model of the device used to log in and its network (IP, Internet Protocol) address. The associated information may include contact information and family member information. The loan information may include the loan amount and the consumption address.
The second way to identify abnormal applications in the prior art uses an association graph. Specifically, the basic information and loan transaction information of a large number of borrowers are first obtained from various channels; the identity, income, preferences and other information of each loan party are then analyzed and a label is assigned to the loan party; next, correspondences between a large number of labels and the loan transaction information are established to form an association graph; finally, abnormal applications are identified according to the association graph.
However, the first way requires analysts to perform the analysis, and the weight of each dimension in the clustering process also has to be set manually, so the cost of identifying abnormal applications is high. The second way requires a large amount of data from different channels. In practice such multi-channel data is hard to obtain, so the resulting association graph is incomplete, which reduces the accuracy of abnormal application identification.
Technical solution
The present application provides a method and device for identifying abnormal applications, so as to reduce the cost of identifying abnormal loans and improve identification accuracy.
In a first aspect, the present application provides a method for identifying abnormal applications, the method comprising:
obtaining application address information of at least one borrower through a deep learning model, the application address information including a lender address and a borrower address, where the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address;
clustering the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group;
for each borrower group, obtaining the off-site loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, the preset transfer matrix being used to indicate off-site loan probabilities between different administrative regions;
determining, from the at least one borrower group, a target borrower group with abnormal applications according to the off-site loan probability of each borrower group.
Optionally, determining, from the at least one borrower group, the target borrower group with abnormal applications according to the off-site loan probability of the borrower group includes:
determining, from the at least one borrower group, the target borrower group with abnormal applications according to the off-site loan probability of the borrower group and the abnormal loan data that has appeared in the borrower group, the abnormal loan data including at least one of the following: the total overdue loan amount of the borrowers in the borrower group, and the total number of overdue days of the loans of the borrowers in the borrower group.
Optionally, determining, from the at least one borrower group, the target borrower group with abnormal applications according to the off-site loan probability of the borrower group and the abnormal loan data that has appeared in the borrower group includes:
determining the ratio of the abnormal loan data that has appeared in the borrower group to the off-site loan probability of the borrower group;
determining, from the at least one borrower group, the target borrower group with abnormal applications according to the ratio.
Optionally, the off-site loan probability between different administrative regions is generated through the following step:
performing a weighted operation on at least one association attribute between the first administrative region corresponding to the borrower address and the second administrative region corresponding to the lender address to obtain the off-site loan probability, the at least one association attribute including at least one of the following: the distance between the first administrative region and the second administrative region; the ratio of the gross domestic product of the second administrative region to that of the first administrative region; and the proportion, among non-local loans, of non-overdue loans whose borrower address belongs to the first administrative region and whose lender address belongs to the second administrative region.
Optionally, the borrower address includes at least one level of administrative region, and obtaining the application address information of at least one borrower through a deep learning model includes:
inputting the borrower address text of the borrower into the deep learning model to obtain the at least one level of administrative region, where the deep learning model is obtained by training on preset training samples, each training sample includes a sample address text in which every character is assigned a sample type, and the sample type is one of the following: the start character of an administrative region of one level, the end character of an administrative region of one level, or another character.
Optionally, the deep learning model includes an input layer, a bidirectional LSTM layer, and a CRF layer. During training, the input layer receives the sample address text, the bidirectional LSTM layer processes the sample address text to obtain a vector, and the CRF layer predicts the prediction type of each character in the sample address text according to the vector. The sample type and the prediction type are used to determine a loss value. When the loss value satisfies the convergence condition, training ends; when it does not, the parameters of the deep learning model are adjusted according to the loss value for the next round of training.
Optionally, after inputting the borrower address text of the borrower into the deep learning model to obtain the at least one level of administrative region, the method further includes:
if some levels are missing from the at least one level of administrative region, determining the missing administrative regions according to a preset administrative region tree, the preset administrative region tree being used to represent the hierarchical relationship between administrative regions; and/or,
if some levels are missing from the at least one level of administrative region, inputting the borrower address text into a third-party interface to obtain the missing administrative regions.
In a second aspect, the present application provides an abnormal application identification apparatus, including:
an application address information acquisition module, used to obtain application address information of at least one borrower through a deep learning model, the application address information including a lender address and a borrower address, where the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address;
an address clustering module, used to cluster the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group;
an off-site loan probability acquisition module, used to, for each borrower group, obtain the off-site loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, the preset transfer matrix being used to indicate off-site loan probabilities between different administrative regions;
an abnormality identification module, used to determine, from the at least one borrower group, a target borrower group with abnormal applications according to the off-site loan probability of each borrower group.
Optionally, the abnormality identification module is also used to:
determine, from the at least one borrower group, the target borrower group with abnormal applications according to the off-site loan probability of the borrower group and the abnormal loan data that has appeared in the borrower group, the abnormal loan data including at least one of the following: the total overdue loan amount of the borrowers in the borrower group, and the total number of overdue days of the loans of the borrowers in the borrower group.
Optionally, the abnormality identification module is also used to:
when determining the target borrower group according to the off-site loan probability of the borrower group and the abnormal loan data that has appeared in the borrower group, determine the ratio of the abnormal loan data that has appeared in the borrower group to the off-site loan probability of the borrower group;
determine, from the at least one borrower group, the target borrower group with abnormal applications according to the ratio.
Optionally, the off-site loan probability between different administrative regions is generated by the following module:
a first loan probability generation module, used to perform a weighted operation on at least one association attribute between the first administrative region corresponding to the borrower address and the second administrative region corresponding to the lender address to obtain the off-site loan probability, the at least one association attribute including at least one of the following: the distance between the first administrative region and the second administrative region; the ratio of the gross domestic product of the second administrative region to that of the first administrative region; and the proportion, among non-local loans, of non-overdue loans whose borrower address belongs to the first administrative region and whose lender address belongs to the second administrative region.
Optionally, the borrower address includes at least one level of administrative region, and the application address information acquisition module is also used to:
input the borrower address text of the borrower into the deep learning model to obtain the at least one level of administrative region, where the deep learning model is obtained by training on preset training samples, each training sample includes a sample address text in which every character is assigned a sample type, and the sample type is one of the following: the start character of an administrative region of one level, the end character of an administrative region of one level, or another character.
Optionally, the deep learning model includes an input layer, a bidirectional LSTM layer, and a CRF layer. During training, the input layer receives the sample address text, the bidirectional LSTM layer processes the sample address text to obtain a vector, and the CRF layer predicts the prediction type of each character in the sample address text according to the vector. The sample type and the prediction type are used to determine a loss value. When the loss value satisfies the convergence condition, training ends; when it does not, the parameters of the deep learning model are adjusted according to the loss value for the next round of training.
Optionally, the apparatus further includes:
an administrative region completion module, used to, after the borrower address text of the borrower has been input into the deep learning model to obtain the at least one level of administrative region, determine the missing administrative regions according to a preset administrative region tree if some levels are missing from the at least one level of administrative region, the preset administrative region tree being used to represent the hierarchical relationship between administrative regions; and/or,
if some levels are missing from the at least one level of administrative region, input the borrower address text into a third-party interface to obtain the missing administrative regions.
In a third aspect, the present application provides an electronic device, including at least one processor and a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the electronic device implements the method of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, cause a computing device to implement the method of the first aspect.
In a fifth aspect, the present application provides a computer program used to implement the method of the first aspect.
In the abnormal application identification method and device provided by the present application, the method includes: obtaining application address information of at least one borrower through a deep learning model, the application address information including a lender address and a borrower address, where the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address; clustering the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group; for each borrower group, obtaining the off-site loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, the preset transfer matrix being used to indicate off-site loan probabilities between different administrative regions; and determining, from the at least one borrower group, a target borrower group with abnormal applications according to the off-site loan probability of each borrower group. The embodiments of the present application cluster borrowers by lender address and borrower address to obtain at least one borrower group, and then identify the target borrower group with abnormal loans by using the off-site loan probabilities between different administrative regions in the transfer matrix. The whole process requires no manual processing, which reduces the identification cost. In addition, the embodiments of the present application need only the lender address and the borrower address and no data from other channels, which avoids the low identification accuracy caused by being unable to obtain data from many channels.
Description of drawings
Fig. 1 is a schematic diagram of an abnormal application identification process provided by the prior art;
Fig. 2 is a flowchart of the specific steps of the abnormal application identification method provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of the deep learning model provided by an embodiment of the present application;
Fig. 4 is a structural block diagram of the abnormal application identification apparatus provided by an embodiment of the present application;
Fig. 5 is a structural block diagram of the electronic device provided by an embodiment of the present application.
Embodiments of the present invention
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The terms "first", "second" and the like in the specification, the claims and the above drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments described herein can, for example, be implemented in orders other than those illustrated or described herein.
In addition, the terms "comprising" and "having" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to such a process, method, product or device.
The embodiments of the present application are applicable to loan scenarios. A borrower may submit a loan application to a lender, and the loan application may specify the information required for review. The lender reviews the loan application, and the loan is granted once the review is passed.
To avoid economic losses caused by loans, abnormal loans need to be identified accurately. The identification can be performed during review: when a loan is identified as abnormal, the review result can be set to failed. The identification can also be performed after the loan is granted, so that the borrower can be notified to repay as soon as possible.
Fig. 2 is a flowchart of the specific steps of the abnormal application identification method provided by an embodiment of the present application. Referring to Fig. 2, the method may include:
S101: Obtain application address information of at least one borrower through a deep learning model. The application address information includes a lender address and a borrower address, and the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address.
The borrower address may include at least one of the following: the borrower's household registration (hukou) address, the borrower's residential address, and the borrower's work address.
The lender address may include the administrative region where the lender is located. This region can be determined from the IP (Internet Protocol) address of the lender's electronic device used when the borrower submits the loan application. The correspondence between administrative regions and IP addresses is preset, and one administrative region may correspond to one or more IP addresses. For example, the borrower goes to an offline service point of the lender and submits the loan application on an electronic device provided by the lender. The loan application may be entered by the borrower personally or by a member of the lender's staff. In this case, the IP address of the lender's electronic device can be obtained, and the corresponding administrative region can be derived from it. The prior art provides tools for converting an IP address into an administrative region, and such a tool can be called to perform the conversion.
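The preset correspondence between IP addresses and administrative regions described above can be sketched as a simple lookup. This is only an illustration: the table `PRESET_REGION_BY_NETWORK`, the function `lender_region_from_ip` and the region names are hypothetical, and the networks are reserved documentation ranges rather than real branch addresses.

```python
import ipaddress

# Hypothetical preset mapping from IP networks to administrative regions.
# One region may be reachable through several networks.
PRESET_REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): ("Province A", "City A1", "County A1a"),
    ipaddress.ip_network("198.51.100.0/24"): ("Province B", "City B1", "County B1a"),
}


def lender_region_from_ip(ip_text):
    """Return the preset (province, city, county) for a device IP, or None."""
    address = ipaddress.ip_address(ip_text)
    for network, region in PRESET_REGION_BY_NETWORK.items():
        if address in network:
            return region
    return None  # no preset entry covers this address
```

A production system would more likely call an existing IP geolocation tool, as the text notes; the dictionary only stands in for that preset correspondence.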
The borrower address is usually entered by the borrower when applying for the loan and usually includes administrative regions and a detailed address below the administrative-region level. The administrative regions in the borrower address can be obtained through the deep learning model. The borrower enters an address text that includes a province, a city, a county (or district), a street and a residential community, for example "XXXX Province, XXXX City, XXXX County, XXX Street, XXXX Community". The administrative regions can then be identified from this address text.
As can be seen, the administrative regions are hierarchical. For example, the province is the first level, the city is the second level, and the county (or district) is the third level. A province may include one or more cities, and a city may include one or more counties.
In the embodiments of the present application, administrative regions of at least one level can be identified from the borrower address text through the deep learning model.
Specifically, the borrower address text of the borrower can be input into the deep learning model to obtain the administrative regions of at least one level. The deep learning model is trained on a large number of preset training samples. The training samples include at least the following: a sample address text in which each character is associated with a sample type, the sample type being one of the following: the start character of an administrative region of one level, the end character of an administrative region of one level, or any other character.
The sample address text has content similar to the borrower address text; the difference is that each character in the sample address text corresponds to a sample type. For example, when the levels of administrative regions are province, city and county, the start and end characters of the province are tagged "B-PROV" and "E-PROV", the start and end characters of the city are tagged "B-CITY" and "E-CITY", the start and end characters of the county are tagged "B-COUNTY" and "E-COUNTY", and all other characters are tagged "O". A sample address text can thus be written as "X\B-PROV X\O 省\E-PROV X\B-CITY X\O 市\E-CITY X\B-COUNTY X\O 县\E-COUNTY X\O X\O 街\O 道\O X\O X\O X\O X\O 小\O 区\O", where 省, 市, 县, 街道 and 小区 mean province, city, county, street and residential community respectively.
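The labelling scheme above can be illustrated with a small helper that tags each character of an address. This is a sketch under assumptions: the function name `tag_address` and the placeholder address are hypothetical, and each region name is assumed to appear once in the text and to be at least two characters long.

```python
def tag_address(text, regions):
    """Tag every character of `text`.

    `regions` maps a level name (e.g. "PROV") to the region substring.
    The first character of a region gets "B-<level>", the last gets
    "E-<level>", and every other character gets "O".
    """
    tags = ["O"] * len(text)
    for level, name in regions.items():
        start = text.find(name)
        if start < 0:
            continue  # this level is missing from the address text
        end = start + len(name) - 1
        tags[start] = f"B-{level}"
        tags[end] = f"E-{level}"
    return list(zip(text, tags))


# Placeholder address: "AB Province, CD City, followed by a street".
sample = tag_address("AB省CD市EF街道", {"PROV": "AB省", "CITY": "CD市"})
```

Here `sample` pairs "A" with "B-PROV", "省" with "E-PROV", "C" with "B-CITY", "市" with "E-CITY", and every remaining character with "O".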
The deep learning model may be any existing deep learning model, which is not limited in the embodiments of the present application. Repeated experiments on the embodiments of the present application yielded a deep learning model with relatively high identification accuracy. Fig. 3 is a schematic structural diagram of the deep learning model provided by an embodiment of the present application. Referring to Fig. 3, the deep learning model may include an input layer, a bidirectional LSTM (long short-term memory) layer and a CRF layer.
During training, the input layer receives the sample address text, the bidirectional LSTM layer processes the sample address text into vectors, and the CRF layer predicts, from these vectors, the predicted type of each character in the sample address text.
In application, the input layer receives the borrower address text, the bidirectional LSTM layer processes the borrower address text into vectors, and the CRF (conditional random field) layer predicts the type of each character in the borrower address text from these vectors, so that the administrative regions of each level can be extracted from the borrower address text according to these types.
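The last step, extracting regions from per-character types, can be sketched without the model itself. The function `extract_regions` below is a hypothetical helper, not part of the application; it assumes the tags follow the "B-<level>", "E-<level>", "O" scheme and ignores any end tag that has no matching start tag.

```python
def extract_regions(chars, tags):
    """Collect the text between each matching B-<level> and E-<level> tag."""
    regions = {}
    start, level = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            start, level = i, tag[2:]  # remember where this region begins
        elif tag.startswith("E-") and start is not None and tag[2:] == level:
            regions[level] = "".join(chars[start:i + 1])
            start, level = None, None
    return regions


# Tags as a trained model might predict them for a placeholder address.
chars = list("AB省CD市EF街道")
tags = ["B-PROV", "O", "E-PROV", "B-CITY", "O", "E-CITY", "O", "O", "O", "O"]
regions = extract_regions(chars, tags)
```

For this input, `regions` holds the province "AB省" and the city "CD市"; the county level is absent, which is the situation the completion step later in the text handles.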
The predicted type of each character corresponds to one of the sample types annotated in the sample address text; likewise, the type of each character corresponds to one of the sample types annotated in the sample address text. However, the predicted type and the sample type of the same character may or may not be the same.
Training the deep learning model with the training samples may include multiple rounds of iteration. In each round, a batch of training samples is input into the deep learning model to obtain the predicted type of each character in each training sample; the predicted types and the sample types of the characters in this batch are then input into a loss function to obtain a loss value; finally, it is determined whether the loss value satisfies a convergence condition. If it does, training ends; if it does not, the parameters of the deep learning model are adjusted according to the loss value and the next round of training is performed.
The loss function may be a loss function commonly used in the prior art, for example a cross-entropy loss function, an absolute-value loss function or a sum-of-squares loss function.
The loss value satisfying the convergence condition may include, but is not limited to: the loss value being less than or equal to a preset loss threshold, or the loss value no longer decreasing after multiple rounds of iteration.
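The two convergence conditions named above can be sketched as one check over the history of loss values. The function `has_converged` and its parameters `loss_threshold` and `patience` are hypothetical; in particular, reading "no longer decreasing after multiple rounds" as "no improvement over the last `patience` rounds" is an assumption, and `patience` is assumed to be at least 1.

```python
def has_converged(loss_history, loss_threshold, patience):
    """Return True when training can stop.

    Stops when the latest loss is at or below `loss_threshold`, or when the
    best loss of the last `patience` rounds is no better than the best loss
    seen before them.
    """
    if not loss_history:
        return False
    if loss_history[-1] <= loss_threshold:
        return True
    if len(loss_history) > patience:
        best_before = min(loss_history[:-patience])
        recent_best = min(loss_history[-patience:])
        return recent_best >= best_before
    return False
```

A training loop would append the loss of each round to `loss_history` and stop adjusting the model parameters once this function returns True.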
As can be seen, in the ideal case, if the borrower address text entered by the borrower includes the administrative region of every level, the administrative regions identified from the borrower address text cover all levels.
In practice, however, the borrower address text entered by the borrower may lack information on some administrative regions, so that the administrative regions identified from it lack some levels. This makes subsequent processing based on the borrower address inaccurate. To improve the accuracy of subsequent processing, the administrative regions of the borrower address need to be completed.
Specifically, if the administrative regions of the at least one level lack the administrative regions of some levels, the missing administrative regions are determined according to a preset administrative region tree, and/or the borrower address text is input into a third-party interface to obtain the missing administrative regions. The preset administrative region tree represents the hierarchical relationships between administrative regions.
The preset administrative region tree is a tree structure that contains the hierarchical relationships between all administrative regions. The nodes of the tree form parent-child relationships, and the administrative region of a child node belongs to the administrative region of exactly one parent node. Therefore, if the administrative regions of the at least one level include a lower-level administrative region but lack the higher-level one, the parent node can be found from the node corresponding to the lower-level administrative region, and the administrative region of the parent node is taken as the higher-level administrative region.
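Completing missing higher levels from the tree can be sketched with a child-to-parent map. The names `PARENT_OF`, `LEVELS` and `complete_regions`, as well as the region names, are hypothetical; the sketch relies on the property stated above that every child region has exactly one parent.

```python
# Hypothetical fragment of the preset tree, stored as child -> parent.
PARENT_OF = {
    "County A1a": "City A1",
    "County A1b": "City A1",
    "City A1": "Province A",
}

LEVELS = ("PROV", "CITY", "COUNTY")  # highest level first


def complete_regions(regions):
    """Fill in missing higher levels by walking up from lower known levels."""
    completed = dict(regions)
    for idx in range(len(LEVELS) - 1, 0, -1):
        child, parent = LEVELS[idx], LEVELS[idx - 1]
        if child in completed and parent not in completed:
            found = PARENT_OF.get(completed[child])
            if found is not None:
                completed[parent] = found
    return completed
```

Given only a county, the function fills in the city and then the province. A missing lower level cannot be inferred this way, which is where the third-party interface mentioned in the text comes in.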
If the preset administrative region tree cannot complete the administrative regions, a third-party interface can be called to determine the missing administrative regions from the detailed address in the borrower address text. For example, the county can be determined from the residential community or street.
S102: Cluster the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group.
Specifically, the lender address and the borrower address are first converted into latitude and longitude coordinates, and the borrowers are then clustered according to the latitude and longitude coordinates of the lender and of the borrower.
It can be understood that the clustering may use an existing clustering algorithm. When the existing clustering algorithm can cluster along only one dimension, it is called multiple times. For example, the clustering algorithm is first called to cluster the at least one borrower by the latitude and longitude coordinates of the lender address, yielding at least one first borrower group; for each first borrower group, the clustering algorithm is called again to cluster the borrowers in that group by the latitude and longitude coordinates of the borrower address, yielding at least one second borrower group. The second borrower groups of all first borrower groups are the at least one borrower group obtained in S102.
When the borrower address includes the borrower's residential address, household registration address and work address, clustering the borrowers of a first borrower group by the latitude and longitude coordinates of the borrower address to obtain at least one second borrower group may include: first, calling the clustering algorithm to cluster the borrowers in the first borrower group by the coordinates of the residential address to obtain at least one first subgroup; then, calling the clustering algorithm to cluster the borrowers in each first subgroup by the coordinates of the household registration address to obtain at least one second subgroup; finally, calling the clustering algorithm to cluster the borrowers in each second subgroup by the coordinates of the work address to obtain at least one third subgroup. Each third subgroup is one second borrower group.
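The successive clustering described above can be sketched as repeated calls to a single-dimension clustering routine. The greedy routine `cluster_by` below is only a stand-in for whichever existing clustering algorithm is used; the names `cluster_borrowers`, `radius` and the record keys are hypothetical, and the coordinates are assumed to have been produced by an earlier geocoding step.

```python
import math


def cluster_by(items, key, radius):
    """Greedy stand-in for an existing clustering algorithm.

    An item joins the first cluster whose first member lies within `radius`
    degrees of it; otherwise it starts a new cluster.
    """
    clusters = []
    for item in items:
        lat, lon = key(item)
        for cluster in clusters:
            seed_lat, seed_lon = key(cluster[0])
            if math.hypot(lat - seed_lat, lon - seed_lon) <= radius:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def cluster_borrowers(borrowers, dimensions, radius=0.01):
    """Cluster by each address dimension in turn, refining the groups."""
    groups = [list(borrowers)]
    for dim in dimensions:
        groups = [sub for group in groups
                  for sub in cluster_by(group, lambda b, d=dim: b[d], radius)]
    return groups


borrowers = [
    {"id": 1, "lender": (22.5, 114.0), "residence": (30.0, 104.0)},
    {"id": 2, "lender": (22.5, 114.0), "residence": (30.0, 104.0)},
    {"id": 3, "lender": (22.5, 114.0), "residence": (39.9, 116.4)},
    {"id": 4, "lender": (31.2, 121.5), "residence": (30.0, 104.0)},
]
groups = cluster_borrowers(borrowers, ["lender", "residence"])
```

With these hypothetical records, borrowers 1 and 2 end up in one group, while borrowers 3 and 4 each form their own group. Further dimensions such as the household registration address or work address can be appended to `dimensions`, and their order can be changed freely.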
The clustering order among the lender address and the borrower's residential address, household registration address and work address can be adjusted flexibly, which is not limited in the embodiments of the present application.
It can be understood that, after the clustering in S102, the borrowers in each resulting borrower group share the same borrower address (referred to as the borrower address of the borrower group) and the same lender address (referred to as the lender address of the borrower group).
S103: For each borrower group, obtain the off-site loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group. The preset transfer matrix indicates the off-site loan probabilities between different administrative regions.
The value in row m, column n of the preset transfer matrix may be the off-site loan probability from the m-th administrative region to the n-th administrative region.
It can be understood that a larger off-site loan probability means that borrowers from the m-th administrative region are more likely to take loans in the n-th administrative region. The off-site loan probability is usually determined by attributes between the two administrative regions, which may include, but are not limited to: distance, GDP (gross domestic product) gap, and the proportion of loans that are not overdue. The smaller the distance, the larger the GDP gap and the larger the proportion of non-overdue loans, the larger the off-site loan probability. Consequently, a borrower group with a smaller off-site loan probability is more likely to contain abnormal loans.
The distance may be the length of a drivable route between the two administrative regions rather than the straight-line distance between them.
Specifically, the off-site loan probabilities between different administrative regions can be generated through the following step:
A weighted operation is performed on at least one association attribute between a first administrative region corresponding to the borrower address and a second administrative region corresponding to the lender address to obtain the off-site loan probability from the first administrative region to the second administrative region. The at least one association attribute includes at least one of the following: the distance between the first administrative region and the second administrative region; the ratio of the gross product of the second administrative region to the gross product of the first administrative region; and the proportion, among off-site loans, of non-overdue loans whose borrower address belongs to the first administrative region and whose lender address belongs to the second administrative region.
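The application does not specify the weighted operation, so the following is only one possible reading. The function `off_site_loan_probability`, the weights, the distance scale and the two normalisations are all assumptions; they are chosen so that the result stays between 0 and 1 and grows with a shorter distance, a larger gross product ratio and a larger non-overdue share.

```python
def off_site_loan_probability(distance_km, gdp_ratio, non_overdue_share,
                              weights=(0.4, 0.3, 0.3), distance_scale_km=500.0):
    """Weighted combination of three association attributes (assumed form).

    distance_km:       drivable distance between the two regions
    gdp_ratio:         lender-region gross product / borrower-region gross product
    non_overdue_share: share of non-overdue loans among off-site loans, in [0, 1]
    """
    closeness = 1.0 / (1.0 + distance_km / distance_scale_km)  # 1 at 0 km, falls toward 0
    gdp_term = gdp_ratio / (1.0 + gdp_ratio)  # maps [0, inf) to [0, 1)
    w_dist, w_gdp, w_share = weights
    score = w_dist * closeness + w_gdp * gdp_term + w_share * non_overdue_share
    return score / (w_dist + w_gdp + w_share)
```

Evaluating this function for every ordered pair of regions would fill one entry of the transfer matrix per pair; the weights would in practice be tuned on historical loan data.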
An off-site loan is a loan application in which the borrower address and the lender address belong to different administrative regions.
The proportion may be a proportion by number of loans or a proportion by total loan amount.
It can be understood that the above step can be performed periodically, for example once a year or once a month, each time using the loan information of all administrative regions in the current period.
The administrative regions in the transfer matrix may be all administrative regions of any one level. The lower the level, the more administrative regions there are, the larger the transfer matrix and the higher the accuracy. The transfer matrix can therefore be generated at the county or district level.
After the transfer matrix is obtained, the off-site loan probability of any borrower group can be obtained from it. Specifically, the administrative region corresponding to the borrower address of the borrower group is first determined as the borrower administrative region, and the administrative region corresponding to the lender address of the borrower group is determined as the lender administrative region; then the entry of the transfer matrix whose row is the borrower administrative region and whose column is the lender administrative region is taken as the off-site loan probability of the borrower group.
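The lookup described above amounts to indexing the matrix by the borrower region (row) and the lender region (column). The region names, the matrix values and the function `group_off_site_probability` below are hypothetical placeholders; the diagonal is set to 0.0 because a loan within one region is not an off-site loan.

```python
# Hypothetical county-level transfer matrix: TRANSFER[m][n] is the off-site
# loan probability from region m (borrower) to region n (lender).
REGIONS = ["County A", "County B", "County C"]
INDEX = {name: i for i, name in enumerate(REGIONS)}
TRANSFER = [
    [0.00, 0.30, 0.05],
    [0.20, 0.00, 0.10],
    [0.02, 0.15, 0.00],
]


def group_off_site_probability(borrower_region, lender_region):
    """Row = borrower administrative region, column = lender administrative region."""
    return TRANSFER[INDEX[borrower_region]][INDEX[lender_region]]
```

Because the matrix is directional, looking up "County A" to "County C" and "County C" to "County A" generally gives different values.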
S104: Determine, according to the off-site loan probabilities of the borrower groups, a target borrower group in which abnormal applications exist from the at least one borrower group.
Multiple strategies can be used to determine the target borrower group.
In the first strategy, the borrower group with the smallest off-site loan probability, or the borrower groups with relatively small off-site loan probabilities, are determined as the target borrower group.
In the second strategy, the target borrower group is determined by combining the off-site loan probability with other information. Specifically, the target borrower group in which abnormal applications exist is determined from the at least one borrower group according to the off-site loan probability of each borrower group and the abnormal loan data that has already occurred in the borrower group. The abnormal loan data includes at least one of the following: the total overdue loan amount of the borrowers in the borrower group, and the total number of overdue days of the loans of the borrowers in the borrower group.
A borrower group with a smaller off-site loan probability, a larger total overdue loan amount and a larger total number of overdue days is a target borrower group.
In one example, the borrower groups are sorted in combined descending order by the reciprocal of the off-site loan probability, the total overdue loan amount and the total number of overdue days, and the top-ranked borrower groups are determined as the target borrower group.
In another example, for each borrower group, the ratio of the abnormal loan data that has already occurred in the borrower group to the off-site loan probability of the borrower group is determined; the target borrower group in which abnormal applications exist is then determined from the at least one borrower group according to this ratio.
Specifically, when the abnormal loan data that has already occurred consists of the total overdue loan amount of the borrowers in the borrower group and the total number of overdue days of their loans, determining the ratio of the abnormal loan data to the off-site loan probability of the borrower group may include: first, determining the product of the total overdue loan amount and the total number of overdue days; then, determining the ratio of this product to the off-site loan probability of the borrower group.
After the ratios are obtained, the one or more borrower groups with the larger ratios can be determined as the target borrower group.
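The ratio-based selection can be sketched as follows. The record keys, the function names `anomaly_ratio` and `select_target_groups`, the parameter `top_k` and the sample figures are hypothetical, and the off-site loan probability of every group is assumed to be greater than zero.

```python
def anomaly_ratio(group):
    """(total overdue amount * total overdue days) / off-site loan probability."""
    product = group["overdue_amount"] * group["overdue_days"]
    return product / group["off_site_probability"]  # assumes probability > 0


def select_target_groups(groups, top_k=1):
    """Return the `top_k` borrower groups with the largest ratio."""
    return sorted(groups, key=anomaly_ratio, reverse=True)[:top_k]


groups = [
    {"name": "g1", "overdue_amount": 10000.0, "overdue_days": 30, "off_site_probability": 0.30},
    {"name": "g2", "overdue_amount": 5000.0, "overdue_days": 10, "off_site_probability": 0.01},
    {"name": "g3", "overdue_amount": 0.0, "overdue_days": 0, "off_site_probability": 0.05},
]
targets = select_target_groups(groups, top_k=1)
```

In this made-up data, group g2 has less overdue money than g1 but a far lower off-site loan probability, so its ratio is the largest and it is selected. A fixed threshold on the ratio could be used instead of `top_k`.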
After the target borrower groups are obtained, loan applications from these groups can be prevented from passing review, and target borrower groups whose loan applications have already been approved can be sent reminders, including but not limited to telephone or SMS reminders.
Corresponding to the abnormal application identification method of the above embodiments, Fig. 4 is a structural block diagram of an abnormal application identification apparatus provided by an embodiment of the present application. For ease of description, only the parts related to the embodiments of the present application are shown. Referring to Fig. 4, the abnormal application identification apparatus 200 includes: an application address information acquisition module 201, an address clustering module 202, an off-site loan probability acquisition module 203 and an abnormality identification module 204.
The application address information acquisition module 201 is configured to obtain application address information of at least one borrower through a deep learning model, where the application address information includes a lender address and a borrower address, and the administrative region corresponding to the lender address is different from the administrative region corresponding to the borrower address.
The address clustering module 202 is configured to cluster the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group.
The off-site loan probability acquisition module 203 is configured to, for each borrower group, obtain the off-site loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, where the preset transfer matrix indicates the off-site loan probabilities between different administrative regions.
The abnormality identification module 204 is configured to determine, according to the off-site loan probabilities of the borrower groups, a target borrower group in which abnormal applications exist from the at least one borrower group.
Optionally, the abnormality identification module 204 is further configured to:
determine, according to the off-site loan probability of each borrower group and the abnormal loan data that has already occurred in the borrower group, a target borrower group in which abnormal applications exist from the at least one borrower group, where the abnormal loan data includes at least one of the following: the total overdue loan amount of the borrowers in the borrower group, and the total number of overdue days of the loans of the borrowers in the borrower group.
Optionally, the abnormality identification module 204 is further configured to:
when determining the target borrower group in which abnormal applications exist from the at least one borrower group according to the off-site loan probability of each borrower group and the abnormal loan data that has already occurred in the borrower group, determine the ratio of the abnormal loan data that has already occurred in the borrower group to the off-site loan probability of the borrower group, and determine the target borrower group in which abnormal applications exist from the at least one borrower group according to the ratio.
Optionally, the off-site loan probability between different administrative regions is generated by the following module:
a first loan probability generation module, configured to perform a weighted operation on at least one association attribute between a first administrative region corresponding to the borrower address and a second administrative region corresponding to the lender address to obtain the off-site loan probability, where the at least one association attribute includes at least one of the following: the distance between the first administrative region and the second administrative region; the ratio of the gross product of the second administrative region to the gross product of the first administrative region; and the share, among off-site loans, of non-overdue loans whose borrower address belongs to the first administrative region and whose lender address belongs to the second administrative region.
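One way such a weighted operation could look is sketched below. The text does not specify the weights or how the attributes are normalized, so the weights, the `distance_scale_km` parameter, and the squashing of distance and gross-product ratio into the range [0, 1] are all assumptions chosen only so that the result behaves like a probability.

```python
def offsite_loan_probability(distance_km, lender_gross_product, borrower_gross_product,
                             non_overdue_share, weights=(0.4, 0.3, 0.3),
                             distance_scale_km=500.0):
    """Weighted combination of three association attributes (illustrative only)."""
    # Closer regions score higher; the value lies in (0, 1].
    closeness = 1.0 / (1.0 + distance_km / distance_scale_km)
    # Ratio of the lender region's gross product to the borrower region's,
    # squashed into (0, 1).
    ratio = lender_gross_product / borrower_gross_product
    gross_product_term = ratio / (1.0 + ratio)
    w_distance, w_gross_product, w_share = weights
    # non_overdue_share is already a proportion in [0, 1].
    return (w_distance * closeness
            + w_gross_product * gross_product_term
            + w_share * non_overdue_share)


p_near = offsite_loan_probability(100.0, 3.0, 1.5, 0.2)
p_far = offsite_loan_probability(2500.0, 3.0, 1.5, 0.2)
```

With weights that sum to 1 and every term bounded by 1, the result stays within [0, 1].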
Optionally, the borrower address includes administrative regions of at least one level, and the application address information acquisition module 201 is further configured to:
input the borrower address text of the borrower into a deep learning model to obtain the administrative regions of the at least one level, where the deep learning model is trained on preset training samples, and the training samples include at least the following: sample address text, in which each character is labeled with a sample type, the sample type being one of the following: the start character of an administrative region of one level, the end character of an administrative region of one level, or any other character.
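The per-character labeling scheme can be illustrated with a small decoder that turns a tag sequence back into administrative regions. The tag names (`B-<level>`, `E-<level>`, `O`), the level names, and the sample address are hypothetical; the text only states that each character is a start character, an end character, or another character.

```python
def extract_regions(text, tags):
    """Collect the substring from each B-<level> tag to its matching E-<level> tag."""
    regions = {}
    start, level = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            start, level = i, tag[2:]
        elif tag.startswith("E-") and start is not None and tag[2:] == level:
            regions[level] = text[start:i + 1]
            start, level = None, None
    return regions


# Hypothetical sample: one tag per character of the address text.
text = "广东省深圳市南山区科技园"
tags = [
    "B-province", "O", "E-province",
    "B-city", "O", "E-city",
    "B-district", "O", "E-district",
    "O", "O", "O",
]
regions = extract_regions(text, tags)
```

Characters between a start tag and its end tag carry the "other" label, so a region's span is recovered from its two boundary characters alone.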
Optionally, the deep learning model includes an input layer, a bidirectional LSTM layer, and a CRF layer. During training, the input layer receives the sample address text, the bidirectional LSTM layer processes the sample address text to obtain a vector, and the CRF layer predicts, according to the vector, the predicted type of each character in the sample address text. The sample type and the predicted type are used to determine a loss value. When the loss value satisfies the convergence condition, training ends; when the loss value does not satisfy the convergence condition, the parameters of the deep learning model are adjusted according to the loss value in order to perform the next round of training.
Optionally, the apparatus further includes:
an administrative region completion module, configured to, after the borrower address text of the borrower has been input into the deep learning model and the administrative regions of the at least one level have been obtained: if administrative regions of some levels are missing from the administrative regions of the at least one level, determine the missing administrative regions according to a preset administrative region tree, where the preset administrative region tree represents the hierarchical relationship between administrative regions; and/or, if administrative regions of some levels are missing from the administrative regions of the at least one level, input the borrower address text into a third-party interface to obtain the missing administrative regions.
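Completion from the administrative region tree could be sketched as a walk from the lowest known level up to its ancestors. The child-to-parent map, the three level names, and the region names are assumptions for the example; a real tree would also need to handle region names that are not unique nationwide.

```python
# Hypothetical region tree stored as child -> parent.
PARENT = {
    "Nanshan District": "Shenzhen",
    "Shenzhen": "Guangdong",
}
# Levels ordered from highest to lowest (assumed).
LEVELS = ["province", "city", "district"]


def complete_regions(regions):
    """Fill in missing higher-level regions from the lowest known level upward."""
    completed = dict(regions)
    for idx in range(len(LEVELS) - 1, 0, -1):
        child = completed.get(LEVELS[idx])
        parent_level = LEVELS[idx - 1]
        if child is not None and parent_level not in completed:
            parent = PARENT.get(child)
            if parent is not None:
                completed[parent_level] = parent
    return completed


completed = complete_regions({"district": "Nanshan District"})
```

Levels that cannot be resolved from the tree are left absent, which is where the third-party interface mentioned above could take over.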
FIG. 5 is a structural block diagram of an electronic device provided by an embodiment of the present application. The electronic device 600 includes a memory 602 and at least one processor 601.
The memory 602 stores computer-executable instructions. The at least one processor 601 executes the computer-executable instructions stored in the memory 602, so that the electronic device 600 implements the method in FIG. 2.
In addition, the electronic device may further include a receiver 603 and a transmitter 604. The receiver 603 is configured to receive information from other apparatuses or devices and forward it to the processor 601, and the transmitter 604 is configured to send information to other apparatuses or devices.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, cause a computing device to implement the method described in FIG. 2.
An embodiment of the present application further provides a computer program for implementing the method described in FIG. 2.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
For ease of explanation, the above description has been given with reference to specific embodiments. However, the exemplary discussion above is not intended to be exhaustive or to limit the embodiments to the specific forms disclosed. Many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described in order to better explain the principles and their practical application, so that those skilled in the art can make better use of the embodiments and of the various modified embodiments suited to the particular use contemplated.

Claims (10)

  1. A method for identifying abnormal applications, characterized in that the method comprises:
    obtaining application address information of at least one borrower through a deep learning model, the application address information including a lender address and a borrower address, where the administrative region corresponding to the lender address differs from the administrative region corresponding to the borrower address;
    clustering the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group;
    for each borrower group, obtaining the off-site loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, where the preset transfer matrix indicates the off-site loan probabilities between different administrative regions;
    determining, from the at least one borrower group and according to the off-site loan probability of the borrower group, a target borrower group in which abnormal applications exist.
  2. The method according to claim 1, characterized in that determining, from the at least one borrower group and according to the off-site loan probability of the borrower group, the target borrower group in which abnormal applications exist comprises:
    determining, from the at least one borrower group, the target borrower group in which abnormal applications exist according to the off-site loan probability of the borrower group and the abnormal loan data that has already occurred in the borrower group, where the abnormal loan data includes at least one of the following: the total overdue loan amount of the borrowers in the borrower group, and the total number of overdue days of the loans of the borrowers in the borrower group.
  3. The method according to claim 2, characterized in that determining, from the at least one borrower group, the target borrower group in which abnormal applications exist according to the off-site loan probability of the borrower group and the abnormal loan data that has already occurred in the borrower group comprises:
    determining the ratio of the abnormal loan data that has already occurred in the borrower group to the off-site loan probability of the borrower group;
    determining, according to the ratio, the target borrower group in which abnormal applications exist from the at least one borrower group.
  4. The method according to any one of claims 1 to 3, characterized in that the off-site loan probability between different administrative regions is generated by the following step:
    performing a weighted operation on at least one association attribute between a first administrative region corresponding to the borrower address and a second administrative region corresponding to the lender address to obtain the off-site loan probability, where the at least one association attribute includes at least one of the following: the distance between the first administrative region and the second administrative region; the ratio of the gross product of the second administrative region to the gross product of the first administrative region; and the share, among off-site loans, of non-overdue loans whose borrower address belongs to the first administrative region and whose lender address belongs to the second administrative region.
  5. The method according to any one of claims 1 to 3, characterized in that the borrower address includes administrative regions of at least one level, and obtaining the application address information of at least one borrower through the deep learning model comprises:
    inputting the borrower address text of the borrower into the deep learning model to obtain the administrative regions of the at least one level, where the deep learning model is trained on preset training samples, and the training samples include at least the following: sample address text, in which each character is labeled with a sample type, the sample type being one of the following: the start character of an administrative region of one level, the end character of an administrative region of one level, or any other character.
  6. The method according to claim 5, characterized in that the deep learning model includes an input layer, a bidirectional LSTM layer, and a CRF layer; during training, the input layer receives the sample address text, the bidirectional LSTM layer processes the sample address text to obtain a vector, and the CRF layer predicts, according to the vector, the predicted type of each character in the sample address text; the sample type and the predicted type are used to determine a loss value; when the loss value satisfies the convergence condition, training ends; and when the loss value does not satisfy the convergence condition, the parameters of the deep learning model are adjusted according to the loss value in order to perform the next round of training.
  7. The method according to claim 5, characterized in that, after inputting the borrower address text of the borrower into the deep learning model and obtaining the administrative regions of the at least one level, the method further comprises:
    if administrative regions of some levels are missing from the administrative regions of the at least one level, determining the missing administrative regions according to a preset administrative region tree, where the preset administrative region tree represents the hierarchical relationship between administrative regions; and/or,
    if administrative regions of some levels are missing from the administrative regions of the at least one level, inputting the borrower address text into a third-party interface to obtain the missing administrative regions.
  8. An apparatus for identifying abnormal applications, characterized in that it comprises:
    an application address information acquisition module, configured to obtain application address information of at least one borrower through a deep learning model, the application address information including a lender address and a borrower address, where the administrative region corresponding to the lender address differs from the administrative region corresponding to the borrower address;
    an address clustering module, configured to cluster the at least one borrower according to the lender address and the borrower address to obtain at least one borrower group;
    an off-site loan probability acquisition module, configured to, for each borrower group, obtain the off-site loan probability corresponding to the borrower group from a preset transfer matrix according to the lender address and the borrower address corresponding to the borrower group, where the preset transfer matrix indicates the off-site loan probabilities between different administrative regions;
    an anomaly identification module, configured to determine, from the at least one borrower group and according to the off-site loan probability of the borrower group, a target borrower group in which abnormal applications exist.
  9. An electronic device, characterized in that the electronic device comprises at least one processor and a memory;
    the memory stores computer-executable instructions;
    the at least one processor executes the computer-executable instructions stored in the memory, so that the electronic device implements the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, cause a computing device to implement the method according to any one of claims 1 to 7.
PCT/CN2022/100697 2021-12-27 2022-06-23 Abnormal application recognition method and device WO2023123929A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111609530.2A CN114282988A (en) 2021-12-27 2021-12-27 Abnormal application identification method and equipment
CN202111609530.2 2021-12-27

Publications (1)

Publication Number Publication Date
WO2023123929A1 true WO2023123929A1 (en) 2023-07-06

Family

ID=80875893

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100697 WO2023123929A1 (en) 2021-12-27 2022-06-23 Abnormal application recognition method and device

Country Status (2)

Country Link
CN (1) CN114282988A (en)
WO (1) WO2023123929A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114282988A (en) * 2021-12-27 2022-04-05 深圳前海微众银行股份有限公司 Abnormal application identification method and equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160071208A1 (en) * 2012-07-03 2016-03-10 Lexisnexis Risk Solutions Fl Inc. Systems and Method for Improving Computation Efficiency in the Detection of Fraud Indicators for Loans with Multiple Applicants
CN107578331A (en) * 2017-09-19 2018-01-12 马上消费金融股份有限公司 The method and system of risk monitoring and control after a kind of loan
CN109711975A (en) * 2018-11-27 2019-05-03 深圳市买买提信息科技有限公司 A kind of debt-credit Risk Identification Method and device
CN111369342A (en) * 2020-03-05 2020-07-03 中国建设银行股份有限公司 Loan approval method, device, equipment and storage medium based on machine learning
US20210065160A1 (en) * 2019-08-30 2021-03-04 Comenity Llc Replacing a customer card payment with a one-time loan at a point of sale
CN114282988A (en) * 2021-12-27 2022-04-05 深圳前海微众银行股份有限公司 Abnormal application identification method and equipment

Also Published As

Publication number Publication date
CN114282988A (en) 2022-04-05

Similar Documents

Publication Publication Date Title
CN110210538B (en) Household image multi-target identification method and device
CN110096634B (en) House property data vector alignment method based on particle swarm optimization
US20210263957A1 (en) Method and apparatus for dividing region, storage medium, and electronic device
JP2019512764A (en) Method and apparatus for identifying the type of user geographical location
CN111709244A (en) Deep learning method for identifying causal relationship of contradictory dispute events
WO2022134829A1 (en) Method and apparatus for identifying same user, and computer device and storage medium
WO2021098652A1 (en) Data processing method and device
CN112131261B (en) Community query method and device based on community network and computer equipment
CN110751416A (en) Method, device and equipment for predicting water consumption
Huang et al. Research on urban modern architectural art based on artificial intelligence and GIS image recognition system
WO2023123929A1 (en) Abnormal application recognition method and device
CN111126422B (en) Method, device, equipment and medium for establishing industry model and determining industry
CN113627977A (en) House value prediction method based on heteromorphic graph
CN115952770B (en) Data standardization processing method and device, electronic equipment and storage medium
CN108182496A (en) A kind of city internet opens data acquisition process analysis method
Guan et al. Understanding China’s urban functional patterns at the county scale by using time-series social media data
CN113886602B (en) Domain knowledge base entity identification method based on multi-granularity cognition
ABBAS A survey of research into artificial neural networks for crime prediction
CN115965466A (en) Sub-graph comparison-based Ethernet room account identity inference method and system
CN111835861B (en) Examination system data processing method and device, computer equipment and storage medium
Kauko Using the self-organising map to identify regularities across country-specific housing-market contexts
CN113221558A (en) Express delivery address error correction method and device, storage medium and electronic equipment
TW202201335A (en) Real-time statistical computing system of market value for custom search area’s surroundings
CN115907770B (en) Ethernet phishing fraud identification and early warning method based on time sequence feature fusion
CN111062000B (en) Criminal perpetrator criminal land identification method based on discrete selection model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22913218

Country of ref document: EP

Kind code of ref document: A1