CN114154682A - Customer loan yield grade prediction method and system - Google Patents



Publication number
CN114154682A
Authority
CN (China)
Legal status
Pending
Application number
CN202111327341.6A
Other languages
Chinese (zh)
Inventor
雷文烨
Current Assignee
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Application filed by
China Construction Bank Corp


Classifications

    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N20/00 Machine learning
    • G06N3/08 Neural networks: learning methods
    • G06Q40/03 Credit; Loans; Processing thereof


Abstract

The disclosure provides a customer loan yield grade prediction method and system. The method includes: acquiring customer data of historical drawdown customers; building a modeling sample from that customer data, calculating the customer loan yield grade of each historical drawdown customer, and performing fitting training with a machine learning algorithm on the modeling sample and those grades to establish a customer loan yield grade prediction model; and acquiring customer data of a customer without a credit line and using the prediction model to predict that customer's loan yield grade from its data. The method predicts a customer's loan yield grade with high accuracy, helping to secure the bank's loan income.

Description

Customer loan yield grade prediction method and system
Technical Field
The disclosure relates to the technical field of intelligent big data analysis, and in particular to a customer loan yield grade prediction method and system.
Background
A bank customer who does not yet have a credit line can apply to the bank for a loan. Once the bank approves the application, the customer is granted a credit line and can draw on the loan at any time, up to the line amount, within a preset period (for example, one year). For the bank, the larger the amount the customer draws and the longer it remains drawn, the more loan income is generated.
In the related art, however, after a customer's loan application is approved, the customer often draws for only a short period, draws only a small amount, or does not draw at all. A method for predicting the customer loan yield grade is therefore needed to estimate, at the application stage, how the customer is likely to use the loan (that is, the loan income the customer can bring to the bank), so that the bank can subsequently manage the customer's loan based on the prediction and thereby increase its loan income.
Disclosure of Invention
The disclosure provides a customer loan yield grade prediction method and system for increasing a bank's loan income.
An embodiment of a first aspect of the present disclosure provides a customer loan yield grade prediction method, including:
acquiring customer data of historical drawdown customers;
building a modeling sample from the customer data of the historical drawdown customers, calculating the customer loan yield grade of the historical drawdown customers, and performing fitting training with a machine learning algorithm on the modeling sample and those grades to establish a customer loan yield grade prediction model; and
acquiring customer data of a customer without a credit line, and predicting that customer's loan yield grade from its customer data using the customer loan yield grade prediction model.
An embodiment of a second aspect of the present disclosure provides a customer loan yield grade prediction system, including:
an acquisition module, configured to acquire customer data of historical drawdown customers;
an establishing module, configured to build a modeling sample from the customer data of the historical drawdown customers, calculate the customer loan yield grade of the historical drawdown customers, and perform fitting training with a machine learning algorithm on the modeling sample and those grades to establish a customer loan yield grade prediction model; and
a prediction module, configured to acquire customer data of a customer without a credit line and predict that customer's loan yield grade from its customer data using the customer loan yield grade prediction model.
An embodiment of a third aspect of the present disclosure provides a computer storage medium storing computer-executable instructions which, when executed by a processor, implement the method described above.
The technical solutions provided by the embodiments of the disclosure bring at least the following beneficial effects:
In summary, the customer loan yield grade prediction method and system provided by the embodiments of the disclosure build a modeling sample from the customer data of historical drawdown customers, calculate those customers' loan yield grades, and fit a machine learning algorithm to the samples and grades to establish a customer loan yield grade prediction model, which is then used to predict the loan yield grade of customers without a credit line. The bank can use the predicted grade as a reference for how strongly the customer is likely to draw on the loan later and manage promotional activities for the customer accordingly, retaining customer groups with high loan yield and raising loan utilization among customer groups with low loan yield. This supports more efficient, fine-grained operations through targeted marketing strategies and helps secure the bank's loan income.
Additional aspects and advantages of the disclosure will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1a is a schematic flowchart of a customer loan yield grade prediction method according to an embodiment of the disclosure;
FIG. 1b is a schematic flowchart of a customer applying for a loan according to an embodiment of the disclosure;
FIG. 2 is a schematic diagram of a customer loan yield grade prediction system according to an embodiment of the disclosure.
Detailed Description
Reference will now be made in detail to the embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and intended to be illustrative of the present disclosure, and should not be construed as limiting the present disclosure.
The customer loan yield grade prediction method and system of the embodiments of the disclosure are described below with reference to the accompanying drawings.
First embodiment
FIG. 1a is a schematic flowchart of a customer loan yield grade prediction method according to an embodiment of the disclosure. As shown in FIG. 1a, the method may include:
Step 101: acquire customer data of historical drawdown customers.
A customer who has drawn on a loan in the past and either has not been overdue since, or has been overdue only within a preset period (for example, 30 days), can be identified as a historical drawdown customer.
The customer data may cover multiple dimensions of the historical drawdown customer, such as in-bank data, third-party credit data obtained through authorized access, and various scenario data. For a historical drawdown customer that is a small or micro enterprise, the customer data may include the enterprise's basic information, business registration information, tax data, payroll-disbursement information, liability data, corporate account transaction flows, enterprise credit records, and the like. For an individual customer, the customer data mainly include personal credit records, AUM (assets under management, a measure of customer contribution) data, debit card data, credit card data, liability data, and the like.
Feature engineering is then applied to the customer data to generate customer features, from which the modeling sample is subsequently built.
Step 102: build a modeling sample from the customer data of the historical drawdown customers, calculate the customer loan yield grade of the historical drawdown customers, and perform fitting training with a machine learning algorithm on the modeling sample and those grades to establish a customer loan yield grade prediction model.
Building the modeling sample from the customer data of the historical drawdown customers specifically includes the following steps:
Step a: associate, integrate, and clean the customer data of the historical drawdown customers to obtain a valid data set.
The valid data set includes at least data on the business owner and data on the enterprise. The business owner data include the owner's basic information, credit card information, loan information, and property information; the enterprise data include the enterprise's basic information, transaction information, and liability information.
Table 1 is a schematic table of a valid data set provided by an embodiment of the present disclosure.
TABLE 1
[Table 1: schematic of the valid data set; image not reproduced]
As shown in Table 1, the business owner's basic information may include the customer's personal information (such as name, age, and education level) and the owner's in-bank basic information (such as the account-opening branch and the account number at that branch); the business owner's credit card information may include credit card account information, credit card usage, credit card repayment, and credit card overdue records.
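Step a does not specify the association key or the cleaning rules. The following is a minimal sketch, assuming that owner records and enterprise records share a hypothetical `cust_id` key, that blank strings count as missing, and that only customers with both owner and enterprise data belong in the valid data set. `build_valid_dataset` and the `owner_`/`business_` column prefixes are illustrative names, not part of the disclosure.

```python
from typing import Dict, List


def build_valid_dataset(owner_rows: List[dict], business_rows: List[dict]) -> Dict[str, dict]:
    """Associate, integrate, and clean owner and enterprise records (illustrative only)."""
    merged: Dict[str, dict] = {}
    for source, rows in (("owner", owner_rows), ("business", business_rows)):
        for row in rows:
            cust_id = row.get("cust_id")  # hypothetical shared key
            if not cust_id:
                continue  # cleaning: records without a key cannot be associated
            record = merged.setdefault(cust_id, {"cust_id": cust_id})
            for key, value in row.items():
                if key == "cust_id":
                    continue
                if isinstance(value, str):
                    value = value.strip() or None  # cleaning: blank strings become missing
                # integration: prefix columns with their source to avoid name clashes
                record[f"{source}_{key}"] = value
    # association: keep only customers that have both owner and enterprise data
    return {
        cid: rec for cid, rec in merged.items()
        if any(k.startswith("owner_") for k in rec)
        and any(k.startswith("business_") for k in rec)
    }
```

In a production pipeline this step would more likely be a database join; the dictionary version only makes the three operations (associate, integrate, clean) explicit.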
Step b: derive features suitable for the customer loan yield grade prediction model from the valid data set through feature derivation.
Specifically, feature derivation may be performed by applying pass-through processing, statistical aggregation processing, and feature crossing processing to the valid data set.
The pass-through processing may include: passing numerical variables in the valid data set that have only one value per customer (such as age and education level) directly through as derived features; and merging the categories of categorical variables in the valid data set (such as field of study, company type, and occupation) before passing them through as derived features.
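The pass-through step can be sketched as below. The merge table `COMPANY_TYPE_GROUPS` and the fallback category `"other"` are hypothetical; the disclosure says categories are combined but does not give the grouping.

```python
from typing import Dict, List, Optional

# Hypothetical merge table: fine-grained company types -> coarser groups.
COMPANY_TYPE_GROUPS: Dict[str, str] = {
    "sole proprietorship": "private",
    "limited liability company": "private",
    "state-owned enterprise": "state",
    "collectively owned enterprise": "collective",
}


def pass_through(record: dict, numeric_keys: List[str],
                 category_maps: Dict[str, Dict[str, str]]) -> dict:
    features = {}
    for key in numeric_keys:
        features[key] = record.get(key)  # single-valued numeric variables copied as-is
    for key, mapping in category_maps.items():
        raw: Optional[str] = record.get(key)
        # merge categories first; values not in the table fall into "other"
        features[key] = None if raw is None else mapping.get(raw, "other")
    return features
```

Merging rare categories before pass-through keeps the later single-value and cardinality screens from discarding a useful but fragmented variable.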
The statistical aggregation processing may include: for transaction-flow or detail-type data in the valid data set, dividing the records into time windows and computing statistics within each window to derive new features. For example, for flow data such as loan details, transaction flows, and payroll-disbursement records, each customer may have multiple records occurring at different times; these records can be divided into time windows by date, and statistics such as the sum, mean, quantiles, minimum, maximum, and standard deviation can be computed within each window. For categorical variables in the valid data set, the number of occurrences of each category and the number of distinct categories can be counted to derive new features.
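A standard-library sketch of the time-window aggregation follows. The window lengths (30, 90, and 180 days), the feature-name scheme, and the use of the median as the representative quantile are assumptions; the disclosure names the statistics but not the windows.

```python
import statistics
from datetime import date
from typing import Dict, List, Tuple


def window_aggregates(flows: List[Tuple[date, float]], as_of: date,
                      windows_days: Tuple[int, ...] = (30, 90, 180),
                      prefix: str = "txn") -> Dict[str, float]:
    feats: Dict[str, float] = {}
    for days in windows_days:
        # records dated within the last `days` days before `as_of`
        values = [amt for d, amt in flows if 0 <= (as_of - d).days < days]
        name = f"{prefix}_{days}d"
        feats[f"{name}_count"] = len(values)
        if values:
            feats[f"{name}_sum"] = sum(values)
            feats[f"{name}_mean"] = statistics.mean(values)
            feats[f"{name}_min"] = min(values)
            feats[f"{name}_max"] = max(values)
            feats[f"{name}_std"] = statistics.pstdev(values)
            feats[f"{name}_median"] = statistics.median(values)  # 50% quantile
    return feats


def category_counts(categories: List[str], prefix: str = "loan_type") -> Dict[str, int]:
    # number of distinct categories plus an occurrence count per category
    feats = {f"{prefix}_n_distinct": len(set(categories))}
    for cat in set(categories):
        feats[f"{prefix}_{cat}_count"] = categories.count(cat)
    return feats
```

Windows with no records emit only a zero count; the remaining statistics are left missing and are handled by the missing-value filling in step d.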
The feature crossing processing includes performing multi-dimensional crossing of different types of data in the valid data set to derive new features. Specifically, categorical and numerical variables can be combined, and the numerical variables can be aggregated separately for each category. For example, combining loan type with loan balance yields derived features such as car loan balance, mortgage balance, and consumer loan balance.
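The loan-type by loan-balance example can be sketched as a grouped sum. Summing is one assumed aggregation; the same pattern applies to counts or maxima per category.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cross_type_amount(loans: List[Tuple[str, float]], suffix: str = "balance") -> Dict[str, float]:
    # aggregate the numeric variable separately for each category of the categorical one
    totals: Dict[str, float] = defaultdict(float)
    for loan_type, amount in loans:
        totals[f"{loan_type}_loan_{suffix}"] += amount
    return dict(totals)
```

For example, two car loans with balances of 50000 and 20000 produce a single `car_loan_balance` feature of 70000.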
Feature derivation thus further enriches the valid data set, which helps ensure the accuracy of the customer loan yield grade prediction model built on it.
Step c: screen the derived features.
Specifically, the screened features are obtained by applying preset screening, correlation screening, missing-value screening, single-value screening, and excessive-cardinality screening to the derived features.
The preset screening may include: using the IV (Information Value) to analyze each derived feature's ability to predict the customer loan yield grade, and eliminating features whose predictive ability is below a first threshold. Predictive ability expresses how strongly a derived feature influences the customer loan yield grade: the higher a feature's IV, the stronger its predictive ability and the greater its influence on the grade. The first threshold may be preset, for example to 0.05.
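IV is defined for a binary target, while the yield grade has three levels, so the following sketch assumes a one-vs-rest target (for example, 1 for the "high" grade and 0 otherwise) and features that are already binned. The additive smoothing of 0.5 per bin is an assumption added to avoid division by zero and `log(0)` in empty bins.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List


def information_value(bins: List[Hashable], target: List[int], smoothing: float = 0.5) -> float:
    levels = set(bins)
    pos_total = sum(target)
    neg_total = len(target) - pos_total
    pos = Counter(b for b, t in zip(bins, target) if t == 1)
    neg = Counter(b for b, t in zip(bins, target) if t == 0)
    iv = 0.0
    for b in levels:
        # smoothed share of positives and negatives that fall in bin b
        p = (pos[b] + smoothing) / (pos_total + smoothing * len(levels))
        q = (neg[b] + smoothing) / (neg_total + smoothing * len(levels))
        iv += (p - q) * math.log(p / q)
    return iv


def iv_screen(features: Dict[str, List[Hashable]], target: List[int],
              threshold: float = 0.05) -> List[str]:
    # keep features whose IV reaches the first threshold
    return [name for name, bins in features.items()
            if information_value(bins, target) >= threshold]
```

A feature whose bins split the positive and negative customers in the same proportions has an IV of 0 and is eliminated; a feature that separates them cleanly has a large IV.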
The preset screening ensures that the retained derived features are those with a strong influence on the customer loan yield grade, which further supports the accuracy of the prediction model built subsequently.
The correlation screening may include: calculating the correlation between derived features and, for any two derived features whose correlation exceeds a second threshold, retaining the one more strongly correlated with the customer loan yield grade. The second threshold may be, for example, 0.9.
A correlation above the second threshold indicates that the two derived features are largely redundant, so one of them can be deleted to reduce redundancy in the data. Because the disclosure retains the feature more strongly correlated with the customer loan yield grade and eliminates the weaker one, removing a feature does not compromise the accuracy of the customer loan yield grade prediction model built subsequently.
For example, suppose the correlation coefficient between two derived variables, "number of tax-paying months in the last 5 months (in-bank)" and "number of tax-paying months in the last 3 months (in-bank)", is 0.97, and their correlation coefficients with the customer loan yield grade are 0.226 and 0.221, respectively. The 5-month feature is then retained and the 3-month feature is eliminated.
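The pairwise rule can be sketched with Pearson correlation. Two assumptions are made here: the grade is encoded numerically (low = 0, medium = 1, high = 2), and a greedy strongest-first pass generalizes the pairwise rule to groups of more than two mutually correlated features.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation; treat as 0
    return sxy / math.sqrt(sxx * syy)


def correlation_screen(features: Dict[str, List[float]], grade: List[float],
                       threshold: float = 0.9) -> List[str]:
    # rank features by absolute correlation with the numerically encoded grade
    strength = {name: abs(pearson(vals, grade)) for name, vals in features.items()}
    kept: List[str] = []
    for name in sorted(features, key=lambda n: strength[n], reverse=True):
        # keep a feature only if it is not highly correlated with a stronger kept one
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Processing features strongest-first means that whenever two features exceed the threshold, the one already kept is the one more correlated with the grade, matching the tax-month example.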
The missing-value screening includes: calculating the missing rate of each derived feature and eliminating derived features whose missing rate exceeds a third threshold, which may be 80%. For example, if the derived variable "number of payroll disbursements by the enterprise" has a missing rate of 90%, it can be eliminated.
The single-value screening includes: examining the values taken by each discrete derived feature and eliminating discrete derived features that take only one value. For example, the derived feature "nationality region code of the business owner" takes only the value "China", which is useless for modeling, so it is eliminated.
The excessive-cardinality screening includes: counting the distinct values of each discrete derived feature and eliminating discrete derived features whose number of values exceeds a fourth threshold, which may be 50. For example, the derived variable "industry subclass" takes a very large number of values and contributes little to modeling, so it can be eliminated and only the coarser "industry category" retained.
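The three remaining screens can be combined into one pass, as sketched below with the example thresholds from the text (80% missing, a single value, more than 50 values). Representing missing values as `None` and passing the set of discrete feature names explicitly are assumptions of the sketch.

```python
from typing import Dict, List, Optional, Set


def basic_screen(features: Dict[str, List[Optional[object]]], discrete: Set[str],
                 max_missing: float = 0.8, max_levels: int = 50) -> List[str]:
    kept = []
    for name, values in features.items():
        missing_rate = sum(v is None for v in values) / len(values)
        if missing_rate > max_missing:
            continue  # missing-value screening (third threshold)
        if name in discrete:
            levels = {v for v in values if v is not None}
            if len(levels) <= 1:
                continue  # single-value screening
            if len(levels) > max_levels:
                continue  # excessive-cardinality screening (fourth threshold)
        kept.append(name)
    return kept
```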
Feature screening thus removes useless and redundant derived features without compromising the accuracy of the subsequent customer loan yield grade prediction model, simplifying the feature set and making it more efficient to build the model from it.
Step d: preprocess the screened features and use the preprocessed features as the modeling sample.
Specifically, outlier processing and missing-value filling may be applied to the screened features to obtain the preprocessed features.
The outlier processing may include: treating a value of a derived feature that falls outside the range specified by business rules as an outlier and correcting it. For example, business rules require the business owner to be at least 18 years old; if the value of the feature "owner age" is below 18, it is treated as an outlier and corrected to a value within the permitted range.
The missing-value filling may include: filling missing values of discrete derived features with a default string, and filling missing values of continuous derived features with a specific sentinel value (-99999).
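A minimal preprocessing sketch follows. Only the minimum owner age of 18 and the sentinel -99999 come from the text; the upper age bound of 100, the default string `"MISSING"`, and correcting outliers by clipping into the permitted range are assumptions.

```python
from typing import Dict, Optional, Set, Tuple

# Only the owner-age >= 18 rule comes from the text; the upper bound is assumed.
VALID_RANGES: Dict[str, Tuple[float, float]] = {"owner_age": (18, 100)}
DISCRETE_DEFAULT = "MISSING"   # assumed default string for discrete features
CONTINUOUS_FILL = -99999       # sentinel value given in the text


def preprocess(record: Dict[str, Optional[object]], discrete: Set[str]) -> dict:
    out = dict(record)
    # outlier processing: clip values outside the business-rule range into it
    for key, (low, high) in VALID_RANGES.items():
        value = out.get(key)
        if value is not None and not low <= value <= high:
            out[key] = min(max(value, low), high)
    # missing-value filling: default string for discrete, sentinel for continuous
    for key in list(out):
        if out[key] is None:
            out[key] = DISCRETE_DEFAULT if key in discrete else CONTINUOUS_FILL
    return out
```

A sentinel far outside the normal range lets tree-based models such as the one used in this disclosure isolate missing values in their own branch.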
Through steps a to d, the modeling sample is built from the customer data of the historical drawdown customers, and the customer loan yield grade prediction model can subsequently be established from it.
Further, calculating the customer loan yield grade of the historical drawdown customers may include:
Step 1: calculate the potential yield of each historical drawdown customer, where
potential yield = (customer interest + penalty interest) / (credit line × loan term × interest rate);
Step 2: determine the customer loan yield grade from the potential yield:
when the potential yield is less than or equal to a first preset value, the customer loan yield grade is determined to be low;
when the potential yield is greater than the first preset value and less than or equal to a second preset value, the customer loan yield grade is determined to be medium;
when the potential yield is greater than the second preset value, the customer loan yield grade is determined to be high.
The first and second preset values may be set in advance; for example, the first preset value may be 0.3 and the second preset value may be 0.8.
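The grading rule can be sketched as two small functions. The sketch assumes the loan term is expressed in years and the rate is annual, so the denominator is the interest the customer would pay by drawing the full line for the full term. It maps potential yields at or below the first preset value to "low" and those above the second preset value to "high", because a larger ratio means more of the available interest was actually realized.

```python
def potential_yield(interest: float, penalty_interest: float, credit_line: float,
                    term_years: float, annual_rate: float) -> float:
    """(interest + penalty interest) / (credit line x loan term x interest rate)."""
    max_interest = credit_line * term_years * annual_rate
    if max_interest <= 0:
        raise ValueError("credit line, term, and rate must all be positive")
    return (interest + penalty_interest) / max_interest


def yield_grade(potential: float, first: float = 0.3, second: float = 0.8) -> str:
    # higher realized share of the maximum possible interest -> higher grade
    if potential <= first:
        return "low"
    if potential <= second:
        return "medium"
    return "high"
```

For example, a one-year line of 100000 at 5% could yield at most 5000 in interest; a customer who paid 2000 has a potential yield of 0.4 and is graded "medium".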
Further, after the modeling sample has been built and the customer loan yield grades of the historical drawdown customers have been calculated, fitting training can be performed with a machine learning algorithm on the modeling sample and those grades to establish the customer loan yield grade prediction model. In the embodiment of the present disclosure, the machine learning algorithm may specifically be the LightGBM algorithm.
LightGBM is an ensemble machine learning algorithm and an efficient implementation of GBDT (gradient boosting decision tree). The main idea of GBDT is to train weak learners (decision trees) iteratively to obtain an optimal model: each new decision tree is fitted to the negative gradient of the loss function, which serves as an approximation of the residual of the current model. LightGBM offers good training performance, relatively low susceptibility to overfitting, and high speed.
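The core gradient-boosting idea described above (fit each new tree to the negative gradient of the loss) can be illustrated with a toy loop. This is not LightGBM: it uses depth-1 regression stumps on a single feature and squared loss, for which the negative gradient is exactly the residual. LightGBM adds histogram-based split finding, leaf-wise tree growth, and, for a three-level grade, a multiclass objective; in practice one would use the `lightgbm` package (for example `lightgbm.LGBMClassifier`) rather than code like this.

```python
from typing import Callable, List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(x: List[float], targets: List[float]) -> Stump:
    """Fit a depth-1 regression tree (stump) minimizing squared error."""
    best = None
    for t in sorted(set(x))[:-1]:  # candidate thresholds at distinct values
        left = [r for xi, r in zip(x, targets) if xi <= t]
        right = [r for xi, r in zip(x, targets) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # constant feature: a single leaf holding the mean
        mean = sum(targets) / len(targets)
        return (x[0], mean, mean)
    return best[1], best[2], best[3]


def gradient_boost(x: List[float], y: List[float], n_rounds: int = 200,
                   learning_rate: float = 0.1) -> Callable[[float], float]:
    base = sum(y) / len(y)
    stumps: List[Stump] = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        # for squared loss (y - F)^2 / 2 the negative gradient is the residual y - F
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, residuals)
        stumps.append((t, lv, rv))
        pred = [pi + learning_rate * (lv if xi <= t else rv) for xi, pi in zip(x, pred)]

    def predict(xi: float) -> float:
        return base + learning_rate * sum(lv if xi <= t else rv for t, lv, rv in stumps)

    return predict
```

The learning rate shrinks each tree's contribution, which is the main reason boosted ensembles resist overfitting when the number of rounds is tuned on a validation set.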
In addition, in the embodiment of the disclosure, a development set and a validation set can be constructed when establishing the customer loan yield grade prediction model. For example, the development set may consist of the customer data of all customers who signed contracts with application dates from 2018-09-01 to 2018-11-30, and the validation set of the customer data of all customers who signed contracts with application dates from 2018-12-01 to 2018-12-31.
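The out-of-time split can be sketched as follows, using the example date ranges from the text. The record fields `signed` and `application_date` are hypothetical names.

```python
from datetime import date
from typing import Iterable, List, Tuple

DEV_WINDOW = (date(2018, 9, 1), date(2018, 11, 30))
VAL_WINDOW = (date(2018, 12, 1), date(2018, 12, 31))


def split_by_application_date(customers: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    dev: List[dict] = []
    val: List[dict] = []
    for c in customers:
        if not c.get("signed"):
            continue  # only customers who signed a contract are used
        d = c["application_date"]
        if DEV_WINDOW[0] <= d <= DEV_WINDOW[1]:
            dev.append(c)
        elif VAL_WINDOW[0] <= d <= VAL_WINDOW[1]:
            val.append(c)
    return dev, val
```

Splitting by date rather than at random checks that the model generalizes to customers who apply after the training period, which is how it is used in production.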
By building the customer loan yield grade prediction model with big data and a machine learning algorithm, the embodiment of the disclosure obtains a model with high accuracy and training efficiency that can handle large-scale data, so that subsequent predictions of the customer loan yield grade are accurate, consistent, objective, and timely.
Step 103: acquire customer data of a customer without a credit line, and predict that customer's loan yield grade from its customer data using the customer loan yield grade prediction model.
A customer without a credit line may be a customer applying to the bank for a loan. FIG. 1b is a schematic flowchart of a customer applying for a loan according to an embodiment of the disclosure. As shown in FIG. 1b, the flow mainly includes:
1. Customer application: the customer submits a loan application and authorizes the bank to query relevant information.
2. Admission rules: the bank assesses the customer's eligibility according to its loan policy and risk rules. If the customer passes the admission rules, the flow proceeds to step 3; if the customer does not meet the requirements, the loan is denied.
3. Credit line determination: for a customer who passes the risk admission rules, the bank calculates the credit line that can be granted. When the line is greater than 0, the customer is shown the loanable amount and prompted to proceed.
4. Submission for approval: the customer enters the requested amount, term, and other required information and submits the application to the system for approval.
5. Contract signing and account opening: once approved, the customer can draw on the loan at any time.
The customer loan yield grade prediction method serves as one step in this loan application flow, assessing the customer's potential yield after the customer submits the application for approval. A high potential yield indicates that the customer is more likely to draw down the loan in full. A low potential yield suggests that the customer may have limited credit demand or may be dissatisfied with the price of the loan product, in which case differentiated pricing can be tried to increase the customer's willingness to borrow.
For customer groups whose loan yield grade is medium or low, drawdown can be expected to be weak; the interest rate offered to these groups can be adjusted to raise their drawdown rate and advance inclusive finance.
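The flow in FIG. 1b, with grade prediction placed after submission for approval, can be sketched as below. The three callables are hypothetical stand-ins for the bank's admission rules, credit line calculation, and trained model, and the status and action labels (`"retain"`, `"review_pricing"`, and so on) are illustrative names for the outcomes the text describes.

```python
from typing import Callable, Dict


def process_application(customer: dict,
                        passes_admission: Callable[[dict], bool],
                        compute_credit_line: Callable[[dict], float],
                        predict_grade: Callable[[dict], str]) -> Dict[str, object]:
    # step 2: admission rules based on loan policy and risk rules
    if not passes_admission(customer):
        return {"status": "rejected"}
    # step 3: credit line determination; only a positive line lets the customer continue
    line = compute_credit_line(customer)
    if line <= 0:
        return {"status": "no_loanable_amount"}
    # step 4 (customer submits for approval), then yield grade prediction
    grade = predict_grade(customer)
    # medium and low grades are flagged for differentiated pricing
    action = "retain" if grade == "high" else "review_pricing"
    return {"status": "submitted", "credit_line": line, "grade": grade, "action": action}
```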
In summary, in the customer loan yield grade prediction method provided by the embodiment of the disclosure, a modeling sample is built from the customer data of historical drawdown customers, their customer loan yield grades are calculated, and a machine learning algorithm is fitted to the samples and grades to establish a customer loan yield grade prediction model, which is then used to predict the loan yield grade of customers without a credit line. The bank can use the predicted grade as a reference for the customer's likely subsequent drawdown and manage promotional activities for the customer accordingly, retaining customer groups with high loan yield and raising loan utilization among customer groups with low loan yield. This supports more efficient, fine-grained operations through targeted marketing strategies and helps secure the bank's loan income.
In addition, in the embodiment of the disclosure, the modeling sample used to establish the model contains multi-dimensional data on each customer drawn from rich sources, and this data is used fully and flexibly during model building, which makes the resulting model's predictions more accurate.
Moreover, because the model is built with big data and a machine learning algorithm, it offers high accuracy, high training efficiency, and the ability to handle large-scale data, so that subsequent predictions of the customer loan yield grade are accurate, consistent, objective, and timely.
Second embodiment
Fig. 2 is a schematic diagram of a client loan profitability grade prediction system 200 according to an embodiment of the disclosure. As shown in Fig. 2, the system may include:
an obtaining module 201, configured to obtain client data of historical loan clients;
an establishing module 202, configured to establish a modeling sample based on the client data of the historical loan clients, calculate the client loan profitability grades of the historical loan clients, and perform fitting training with a machine learning algorithm on the modeling sample and these grades to establish a client loan profitability grade prediction model; and
a prediction module 203, configured to obtain the client data of a non-limit client and predict the client loan profitability grade of that client from its data by using the client loan profitability grade prediction model.
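The division of work among modules 201 to 203 can be illustrated with a minimal Python sketch. The class name `LoanProfitabilitySystem`, its injected callables, and the toy `majority_fit` learner are hypothetical; the disclosure specifies no interface, so feature building, grade labelling, and training are passed in as plain functions. The sketch applies the same feature-building step to a non-limit client's data at prediction time, on the assumption that the model must see features built the same way as in training.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
Predictor = Callable[[Record], str]


@dataclass
class LoanProfitabilitySystem:
    """Hypothetical wiring of the obtaining (201), establishing (202) and prediction (203) modules."""
    fetch_history: Callable[[], List[Record]]               # obtaining module 201
    build_samples: Callable[[List[Record]], List[Record]]   # module 202: feature pipeline
    label_grade: Callable[[Record], str]                    # module 202: grade calculation
    fit: Callable[[List[Record], List[str]], Predictor]     # module 202: model training
    model: Optional[Predictor] = None

    def establish(self) -> None:
        """Build samples and grades from historical loan clients, then train the model."""
        history = self.fetch_history()
        samples = self.build_samples(history)
        grades = [self.label_grade(record) for record in history]
        self.model = self.fit(samples, grades)

    def predict(self, client: Record) -> str:
        """Prediction module 203: grade a non-limit client with the established model."""
        if self.model is None:
            raise RuntimeError("establish() must be called before predict()")
        sample = self.build_samples([client])[0]  # reuse the training-time feature pipeline
        return self.model(sample)


def majority_fit(samples: List[Record], grades: List[str]) -> Predictor:
    """Toy stand-in for a real learner: always predicts the most common training grade."""
    top = Counter(grades).most_common(1)[0][0]
    return lambda sample: top
```

In practice `fit` would wrap a real learner such as the LightGBM model named later in the text; injecting it keeps the module boundaries of Fig. 2 visible.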
In summary, in the client loan profitability grade prediction system provided by the embodiment of the disclosure, a modeling sample is built from the client data of historical loan clients, and the client loan profitability grade of each historical loan client is calculated. A machine learning algorithm is then fitted on the modeling sample and these grades to establish a client loan profitability grade prediction model, which can predict the client loan profitability grade of a non-limit client. The bank can use the predicted grade as a reference when deciding on subsequent loan support for the client and can manage promotional activities for that client accordingly. This helps the bank retain client groups with high loan profitability, raise loan utilization among client groups with low loan profitability, and execute targeted marketing strategies more efficiently and at a finer granularity, which safeguards the bank's loan income.
In addition, in the system of this embodiment, the modeling sample used to establish the client loan profitability grade prediction model contains client data from multiple dimensions and a wide range of sources. Because this data is used fully and flexibly when the model is established, the resulting model predicts more accurately.
In addition, in the system of this embodiment, the client loan profitability grade prediction model is constructed with big data and a machine learning algorithm, so the model offers higher accuracy and training efficiency and can process large-scale data. When the model is subsequently used for prediction, client loan profitability grades can be predicted accurately, consistently, objectively, and in a timely manner.
Optionally, the obtaining module is further configured to:
obtain the client data of historical loan clients whose loans were either not overdue at maturity or were overdue for no longer than a preset period.
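One possible reading of this selection rule, sketched with pandas: keep loans that have reached maturity and were either repaid on time or overdue for no more than a preset number of days. The column names `maturity_date` and `overdue_days`, the `as_of` cutoff date, and expressing the preset period in days are assumptions for illustration only.

```python
import pandas as pd


def select_history(loans: pd.DataFrame, max_overdue_days: int, as_of: pd.Timestamp) -> pd.DataFrame:
    """Keep matured loans that were not overdue or were overdue within the preset period (assumed columns)."""
    matured = loans["maturity_date"] <= as_of                     # loan term has ended
    within_grace = loans["overdue_days"] <= max_overdue_days      # 0 means repaid on time
    return loans.loc[matured & within_grace]
```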
Optionally, the establishing module is further configured to:
perform association, integration, and cleaning on the client data of the historical loan clients to obtain a valid data set;
derive, from the valid data set through feature derivation, derived features suitable for the client loan profitability grade prediction model;
perform feature screening on the derived features; and
preprocess the screened features and use the preprocessed features as the modeling sample.
Optionally, the valid data set includes at least data information of the business owner and data information of the enterprise;
the data information of the business owner includes the basic information, credit card information, loan information, and property information of the business owner; the data information of the enterprise includes the basic information, transaction information, and liability information of the enterprise.
Optionally, the establishing module is further configured to:
perform pass-through (transparent transmission) processing, statistical aggregation processing, and feature crossing processing on the valid data set to derive the derived features.
Optionally, the pass-through (transparent transmission) processing includes: passing through directly, as derived features, the numerical variables in the valid data set that take only a single value; and merging the categories of the categorical variables in the valid data set and then passing them through as derived features;
the statistical aggregation processing includes: for transaction-flow (running) or detail records in the valid data set, dividing the records into different time windows and calculating statistical variables within each window to derive new features; and for categorical variables in the valid data set, counting the number of occurrences of each category and the number of distinct categories to derive new features;
the feature crossing processing includes: crossing different types of data in the valid data set across multiple dimensions to derive new features.
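The three derivation routes can be sketched with pandas as below. This is an illustrative sketch under assumed inputs: a `clients` table with one row per client (`client_id`, a numeric `age`, a categorical `industry`) and a `txns` transaction-flow table (`client_id`, `txn_date`, `amount`, `channel`). The window lengths, the rule that merges categories seen fewer than `min_count` times into `OTHER`, and the age bands used for the cross feature are hypothetical choices, not values from the disclosure.

```python
import pandas as pd


def derive_features(clients: pd.DataFrame, txns: pd.DataFrame, as_of: pd.Timestamp,
                    windows=(30, 90), min_count: int = 2) -> pd.DataFrame:
    base = clients.set_index("client_id")
    hist = txns[txns["txn_date"] <= as_of]          # ignore records after the cutoff
    feats = pd.DataFrame(index=base.index)

    # 1) Pass-through: a numeric variable with one value per client is copied as-is;
    #    a categorical variable has its rare categories merged into "OTHER" first.
    feats["age"] = base["age"]
    freq = base["industry"].value_counts()
    common = freq[freq >= min_count].index
    feats["industry"] = base["industry"].where(base["industry"].isin(common), "OTHER")

    # 2) Statistical aggregation: per-window statistics over the transaction flow...
    for days in windows:
        recent = hist[hist["txn_date"] > as_of - pd.Timedelta(days=days)]
        agg = recent.groupby("client_id")["amount"].agg(["sum", "mean", "count"])
        agg.columns = [f"amt_{stat}_{days}d" for stat in agg.columns]
        feats = feats.join(agg)                      # clients with no records get NaN
    #    ...and occurrence counts per category plus the number of distinct categories.
    per_channel = hist.groupby(["client_id", "channel"]).size().unstack(fill_value=0)
    per_channel.columns = [f"n_channel_{c}" for c in per_channel.columns]
    feats = feats.join(per_channel)
    feats["n_distinct_channels"] = hist.groupby("client_id")["channel"].nunique()

    # 3) Feature crossing: combine the industry dimension with an age band.
    age_band = pd.cut(feats["age"], bins=[0, 35, 50, 120], labels=["young", "mid", "senior"])
    feats["industry_x_age_band"] = feats["industry"] + "_" + age_band.astype(str)
    return feats
```

Missing values produced by the joins are left in place here; the later screening and preprocessing steps are where they would be dealt with.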
Optionally, the establishing module is further configured to:
perform preset screening, correlation screening, missing-value screening, single-value-rate screening, and excessive-value screening on the derived features to obtain the screened features.
Optionally, the preset screening includes: using the information value (IV) to analyze the predictive ability of each derived feature for the client loan profitability grade, and eliminating the features whose predictive ability is lower than a first threshold, where the predictive ability expresses how strongly a derived feature influences the client loan profitability grade;
the correlation screening includes: calculating the correlation between each derived feature and the client loan profitability grade, and retaining the derived features whose correlation is greater than a second threshold;
the missing-value screening includes: calculating the missing-data rate of each derived feature, and removing the derived features whose missing-data rate is greater than a third threshold;
the single-value-rate screening includes: examining the values taken by each discrete derived feature, and eliminating the discrete derived features that take only a single value;
the excessive-value screening includes: counting the distinct values taken by each discrete derived feature, and eliminating the discrete derived features whose number of distinct values exceeds a fourth threshold.
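A minimal sketch of the five screening steps, assuming a pandas feature table `X` and a grade series with values `high`, `medium`, and `low`. Information value is defined for a binary target, so this sketch assumes a one-vs-rest target (high grade versus the rest). For the correlation step it assumes a Spearman rank correlation (computed as a Pearson correlation of ranks) between each numeric feature and an ordinal grade code, following the per-feature variant described above. The function names and all thresholds are placeholders. Each step is a per-feature filter, so the order in which the steps run does not change the result.

```python
import numpy as np
import pandas as pd


def information_value(feature: pd.Series, target: pd.Series, bins: int = 5, eps: float = 0.5) -> float:
    """IV of one feature against a binary target (1 = event); eps smooths empty groups."""
    if pd.api.types.is_numeric_dtype(feature) and feature.nunique() > bins:
        groups = pd.qcut(feature, q=bins, duplicates="drop").astype(str)  # equal-frequency bins
    else:
        groups = feature.astype(str)  # each category (including "nan") is its own group
    table = pd.crosstab(groups, target)
    zeros = pd.Series(0, index=table.index)
    events = table.get(1, zeros) + eps
    non_events = table.get(0, zeros) + eps
    dist_e, dist_n = events / events.sum(), non_events / non_events.sum()
    return float(((dist_e - dist_n) * np.log(dist_e / dist_n)).sum())


def screen_features(X: pd.DataFrame, grade: pd.Series, iv_min: float = 0.02, corr_min: float = 0.05,
                    miss_max: float = 0.8, card_max: int = 50) -> list:
    """Return the names of derived features that pass all five screens."""
    is_high = (grade == "high").astype(int)                     # one-vs-rest target for IV
    grade_code = grade.map({"low": 0, "medium": 1, "high": 2})  # ordinal code for correlation
    kept = []
    for col in X.columns:
        s = X[col]
        discrete = not pd.api.types.is_numeric_dtype(s)
        n_values = s.nunique(dropna=True)
        if s.isna().mean() > miss_max:                 # missing-value screening
            continue
        if discrete and n_values <= 1:                 # single-value-rate screening
            continue
        if discrete and n_values > card_max:           # excessive-value screening
            continue
        if information_value(s, is_high) < iv_min:     # preset (IV) screening
            continue
        if not discrete:                               # correlation screening on absolute rank correlation
            corr = s.rank().corr(grade_code.rank())
            if pd.isna(corr) or abs(corr) <= corr_min:
                continue
        kept.append(col)
    return kept
```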
Optionally, the establishing module is further configured to:
perform outlier processing and missing-value filling on the screened features to obtain the preprocessed features.
Optionally, the outlier processing includes: treating any value of a derived feature that falls outside the range specified by the business as an outlier, and modifying that outlier;
the missing-value filling includes: filling missing values of derived features that are discrete variables with a default string.
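The preprocessing step might look like the following sketch. The disclosure says only that out-of-range values are "modified"; clipping them to the business range is an assumed choice, as are the hypothetical `valid_ranges` mapping and the `MISSING` default string. Missing values in numeric columns are left as NaN here, on the assumption that the downstream learner handles NaN natively.

```python
import pandas as pd


def preprocess(X: pd.DataFrame, valid_ranges: dict, default_str: str = "MISSING") -> pd.DataFrame:
    """Clip out-of-range values and fill missing discrete values with a default string."""
    out = X.copy()
    # Outlier processing (assumed: clipping): values outside the business range move to its bounds.
    for col, (low, high) in valid_ranges.items():
        out[col] = out[col].clip(lower=low, upper=high)
    # Missing-value filling: only discrete (non-numeric) columns get the default string;
    # numeric NaNs are left for the learner to handle.
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(default_str)
    return out
```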
Optionally, the establishing module is further configured to:
calculate the potential client profitability of each historical loan client, where potential client profitability = (client interest + penalty interest) / (credit line × loan term × interest rate);
determine the client loan profitability grade based on the potential client profitability, as follows:
when the potential client profitability is less than or equal to a first preset value, the client loan profitability grade is determined to be high;
when the potential client profitability is greater than the first preset value and less than or equal to a second preset value, the client loan profitability grade is determined to be medium;
when the potential client profitability is greater than the second preset value, the client loan profitability grade is determined to be low.
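The labelling rule can be written out directly. The sketch below assumes that the loan term is expressed in years and the interest rate is annual, so that the denominator is the interest the bank would earn if the full credit line were drawn for the full term; the function names and the two preset values are hypothetical. The grade mapping follows the text as written, so a ratio at or below the first preset value is graded high.

```python
def potential_profitability(interest: float, penalty: float, credit_line: float,
                            term_years: float, annual_rate: float) -> float:
    """(client interest + penalty interest) / (credit line x loan term x interest rate)."""
    max_interest = credit_line * term_years * annual_rate  # interest if the full line were used for the full term
    if max_interest <= 0:
        raise ValueError("credit line, term and rate must be positive")
    return (interest + penalty) / max_interest


def profitability_grade(p: float, first: float, second: float) -> str:
    """Map a potential profitability value to a grade using the two preset values as stated."""
    if first >= second:
        raise ValueError("the first preset value must be below the second")
    if p <= first:
        return "high"
    if p <= second:
        return "medium"
    return "low"
```

For instance, with client interest of 3,000, no penalty interest, a credit line of 100,000, a one-year term, and a 5% annual rate, the ratio is 3,000 / 5,000 = 0.6, which falls in the medium band for hypothetical preset values of 0.3 and 0.7.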
Optionally, the establishing module is further configured to:
perform fitting training with the LightGBM algorithm on the modeling sample and the client loan profitability grades of the historical loan clients to establish the client loan profitability grade prediction model.
In order to implement the above embodiments, the present disclosure also provides a computer storage medium.
The computer storage medium provided by the embodiment of the disclosure stores an executable program; when executed by a processor, the executable program implements the method shown in Fig. 1a.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and their features, provided they do not contradict each other.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present disclosure.
Although embodiments of the present disclosure have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present disclosure, and that changes, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present disclosure.

Claims (25)

1. A client loan profitability grade prediction method, the method comprising:
acquiring client data of historical loan clients;
establishing a modeling sample based on the client data of the historical loan clients, calculating the client loan profitability grades of the historical loan clients, and performing fitting training with a machine learning algorithm based on the modeling sample and those grades to establish a client loan profitability grade prediction model; and
acquiring client data of a non-limit client, and predicting the client loan profitability grade of the non-limit client from that client data by using the client loan profitability grade prediction model.
2. The method of claim 1, wherein acquiring the client data of the historical loan clients comprises:
acquiring the client data of historical loan clients whose loans were not overdue at maturity or were overdue for no longer than a preset period.
3. The client loan profitability grade prediction method of claim 1, wherein establishing the modeling sample based on the client data of the historical loan clients comprises:
performing association, integration, and cleaning on the client data of the historical loan clients to obtain a valid data set;
deriving, from the valid data set through feature derivation, derived features suitable for the client loan profitability grade prediction model;
performing feature screening on the derived features; and
preprocessing the screened features and using the preprocessed features as the modeling sample.
4. The method of claim 3, wherein the valid data set comprises at least data information of a business owner and data information of an enterprise;
the data information of the business owner comprises the basic information, credit card information, loan information, and property information of the business owner; and the data information of the enterprise comprises the basic information, transaction information, and liability information of the enterprise.
5. The method of claim 3, wherein deriving, from the valid data set through feature derivation, the derived features suitable for the client loan profitability grade prediction model comprises:
performing pass-through (transparent transmission) processing, statistical aggregation processing, and feature crossing processing on the valid data set to form the derived features.
6. The client loan profitability grade prediction method according to claim 5, wherein:
the pass-through processing comprises: passing through directly, as derived features, the numerical variables in the valid data set that take only a single value; and merging the categories of the categorical variables in the valid data set and then passing them through as derived features;
the statistical aggregation processing comprises: for transaction-flow or detail records in the valid data set, dividing the records into different time windows and calculating statistical variables within each window to derive new features; and for categorical variables in the valid data set, counting the number of occurrences of each category and the number of distinct categories to derive new features; and
the feature crossing processing comprises: crossing different types of data in the valid data set across multiple dimensions to derive new features.
7. The method of claim 3, wherein performing feature screening on the derived features comprises:
performing preset screening, correlation screening, missing-value screening, single-value-rate screening, and excessive-value screening on the derived features to obtain the screened features.
8. The client loan profitability grade prediction method according to claim 7, wherein:
the preset screening comprises: using the information value (IV) to analyze the predictive ability of each derived feature for the client loan profitability grade, and eliminating the features whose predictive ability is lower than a first threshold, the predictive ability expressing how strongly a derived feature influences the client loan profitability grade;
the correlation screening comprises: calculating the pairwise correlations among the derived features and, for any two derived features whose correlation is greater than a second threshold, retaining the one more strongly correlated with the client loan profitability grade;
the missing-value screening comprises: calculating the missing-data rate of each derived feature, and removing the derived features whose missing-data rate is greater than a third threshold;
the single-value-rate screening comprises: examining the values of each discrete derived feature, and eliminating the discrete derived features that take only a single value; and
the excessive-value screening comprises: counting the distinct values of each discrete derived feature, and eliminating the discrete derived features whose number of distinct values exceeds a fourth threshold.
9. The method of claim 3, wherein preprocessing the screened features comprises:
performing outlier processing and missing-value filling on the screened features to obtain the preprocessed features.
10. The client loan profitability grade prediction method according to claim 9, wherein:
the outlier processing comprises: treating any value of a derived feature outside the range specified by the business as an outlier, and modifying the outlier; and
the missing-value filling comprises: filling missing values of derived features that are discrete variables with a default string, and filling missing values of derived features that are continuous variables with a specific string.
11. The method of claim 1, wherein calculating the client loan profitability grades of the historical loan clients comprises:
calculating the potential client profitability of each historical loan client, where potential client profitability = (client interest + penalty interest) / (credit line × loan term × interest rate); and
determining the client loan profitability grade based on the potential client profitability, wherein:
when the potential client profitability is less than or equal to a first preset value, the client loan profitability grade is determined to be high;
when the potential client profitability is greater than the first preset value and less than or equal to a second preset value, the client loan profitability grade is determined to be medium; and
when the potential client profitability is greater than the second preset value, the client loan profitability grade is determined to be low.
12. The client loan profitability grade prediction method according to claim 1, wherein performing fitting training with a machine learning algorithm based on the modeling sample and the client loan profitability grades of the historical loan clients to establish the client loan profitability grade prediction model comprises:
performing fitting training with the LightGBM algorithm based on the modeling sample and the client loan profitability grades of the historical loan clients to establish the client loan profitability grade prediction model.
13. A client loan profitability grade prediction system, the system comprising:
an acquisition module, configured to acquire client data of historical loan clients;
an establishing module, configured to establish a modeling sample based on the client data of the historical loan clients, calculate the client loan profitability grades of the historical loan clients, and perform fitting training with a machine learning algorithm based on the modeling sample and those grades to establish a client loan profitability grade prediction model; and
a prediction module, configured to acquire client data of a non-limit client and predict the client loan profitability grade of the non-limit client from that client data by using the client loan profitability grade prediction model.
14. The client loan profitability grade prediction system of claim 13, wherein the acquisition module is further configured to:
acquire the client data of historical loan clients whose loans were not overdue at maturity or were overdue for no longer than a preset period.
15. The client loan profitability grade prediction system of claim 13, wherein the establishing module is further configured to:
perform association, integration, and cleaning on the client data of the historical loan clients to obtain a valid data set;
derive, from the valid data set through feature derivation, derived features suitable for the client loan profitability grade prediction model;
perform feature screening on the derived features; and
preprocess the screened features and use the preprocessed features as the modeling sample.
16. The client loan profitability grade prediction system of claim 15, wherein the valid data set comprises at least data information of a business owner and data information of an enterprise;
the data information of the business owner comprises the basic information, credit card information, loan information, and property information of the business owner; and the data information of the enterprise comprises the basic information, transaction information, and liability information of the enterprise.
17. The client loan profitability grade prediction system of claim 15, wherein the establishing module is further configured to:
perform pass-through (transparent transmission) processing, statistical aggregation processing, and feature crossing processing on the valid data set to form the derived features.
18. The client loan profitability grade prediction system of claim 17, wherein:
the pass-through processing comprises: passing through directly, as derived features, the numerical variables in the valid data set that take only a single value; and merging the categories of the categorical variables in the valid data set and then passing them through as derived features;
the statistical aggregation processing comprises: for transaction-flow or detail records in the valid data set, dividing the records into different time windows and calculating statistical variables within each window to derive new features; and for categorical variables in the valid data set, counting the number of occurrences of each category and the number of distinct categories to derive new features; and
the feature crossing processing comprises: crossing different types of data in the valid data set across multiple dimensions to derive new features.
19. The client loan profitability grade prediction system of claim 15, wherein the establishing module is further configured to:
perform preset screening, correlation screening, missing-value screening, single-value-rate screening, and excessive-value screening on the derived features to obtain the screened features.
20. The client loan profitability grade prediction system of claim 19, wherein:
the preset screening comprises: using the information value (IV) to analyze the predictive ability of each derived feature for the client loan profitability grade, and eliminating the features whose predictive ability is lower than a first threshold, the predictive ability expressing how strongly a derived feature influences the client loan profitability grade;
the correlation screening comprises: calculating the correlation between each derived feature and the client loan profitability grade, and retaining the derived features whose correlation is greater than a second threshold;
the missing-value screening comprises: calculating the missing-data rate of each derived feature, and removing the derived features whose missing-data rate is greater than a third threshold;
the single-value-rate screening comprises: examining the values of each discrete derived feature, and eliminating the discrete derived features that take only a single value; and
the excessive-value screening comprises: counting the distinct values of each discrete derived feature, and eliminating the discrete derived features whose number of distinct values exceeds a fourth threshold.
21. The client loan profitability grade prediction system of claim 15, wherein the establishing module is further configured to:
perform outlier processing and missing-value filling on the screened features to obtain the preprocessed features.
22. The client loan profitability grade prediction system of claim 21, wherein:
the outlier processing comprises: treating any value of a derived feature outside the range specified by the business as an outlier, and modifying the outlier; and
the missing-value filling comprises: filling missing values of derived features that are discrete variables with a default string.
23. The client loan profitability grade prediction system of claim 13, wherein the establishing module is further configured to:
calculate the potential client profitability of each historical loan client, where potential client profitability = (client interest + penalty interest) / (credit line × loan term × interest rate); and
determine the client loan profitability grade based on the potential client profitability, wherein:
when the potential client profitability is less than or equal to a first preset value, the client loan profitability grade is determined to be high;
when the potential client profitability is greater than the first preset value and less than or equal to a second preset value, the client loan profitability grade is determined to be medium; and
when the potential client profitability is greater than the second preset value, the client loan profitability grade is determined to be low.
24. The client loan profitability grade prediction system of claim 13, wherein the establishing module is further configured to:
perform fitting training with the LightGBM algorithm based on the modeling sample and the client loan profitability grades of the historical loan clients to establish the client loan profitability grade prediction model.
25. A computer storage medium storing computer-executable instructions, wherein the computer-executable instructions, when executed by a processor, perform the method of any one of claims 1-12.
CN202111327341.6A 2021-11-10 2021-11-10 Customer loan yield grade prediction method and system Pending CN114154682A (en)

Priority Applications (1)

Application number: CN202111327341.6A (published as CN114154682A); priority date: 2021-11-10; filing date: 2021-11-10; title: Customer loan yield grade prediction method and system

Publications (1)

Publication number: CN114154682A; publication date: 2022-03-08

Family

ID=80459391

Country Status (1)

Country Link
CN (1) CN114154682A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115731023A * 2022-11-23 2023-03-03 Lianyang Guorong (Beijing) Technology Co., Ltd. (联洋国融(北京)科技有限公司) Method and system for predicting amount of cash flow for loan recovery

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598500A * 2020-12-21 2021-04-02 China Construction Bank Corp (中国建设银行股份有限公司) Credit processing method and system for non-limit client
CN113095567A * 2021-04-09 2021-07-09 Chongqing Rural Commercial Bank Co., Ltd. (重庆农村商业银行股份有限公司) Loan yield prediction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination