CN112785095A - Loan prediction method, loan prediction device, electronic device, and computer-readable storage medium

Loan prediction method, loan prediction device, electronic device, and computer-readable storage medium

Info

Publication number
CN112785095A
Authority
CN
China
Prior art keywords
loan
user
sample
information
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110273787.9A
Other languages
Chinese (zh)
Inventor
徐英浩
尚朝
姚峥洁
陈树华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dingxiang Technology Co ltd
Original Assignee
Beijing Dingxiang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dingxiang Technology Co ltd filed Critical Beijing Dingxiang Technology Co ltd
Priority to CN202110273787.9A priority Critical patent/CN112785095A/en
Publication of CN112785095A publication Critical patent/CN112785095A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Technology Law (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention provides a loan prediction method, a loan prediction device, an electronic device and a computer-readable storage medium, which relate to the technical field of computers. The method comprises: obtaining a user sample to be predicted; inputting the user sample to be predicted into a pre-trained loan prediction model; and predicting the user sample to be predicted through the pre-trained loan prediction model to obtain the probability that each user sample to be predicted applies for a loan, so as to predict the loan tendency of the user. The invention improves the prediction capability of the model and the accuracy of the prediction result.

Description

Loan prediction method, loan prediction device, electronic device, and computer-readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a loan prediction method, apparatus, electronic device, and computer-readable storage medium.
Background
With the rise of big data, the rapid informatization of commercial banks is generating large amounts of business data, intermediate data and unstructured data. To extract valuable information from this mass of data, it is important for commercial banks to mine it for potential loan customers. The simple statistical models traditionally used by commercial banks cannot accurately identify customers with potential loan demand among a massive user base, so banks can only increase the intensity of their recommendations; the final recommendation effect is nevertheless poor, and a great deal of manpower and material resources is wasted.
At present, the models most banks use for loan prediction are still built on traditional rule models. The rules of a rule model are usually formulated by business experts, and different business experts have different domain insights, so the rules are subjective and their effect tends to decay over time; as a result, the accuracy of a rule model is usually low. In addition, a rule model processes text data too coarsely or not at all, so the information in the text cannot be used by the model.
Disclosure of Invention
The invention aims to provide a loan prediction method, a loan prediction device, electronic equipment and a computer-readable storage medium, which improve the accuracy of a prediction result by improving the prediction capability of a model.
In a first aspect, the present invention provides a loan prediction method, comprising: obtaining a user sample to be predicted; inputting a user sample to be predicted into a pre-trained loan prediction model; and predicting the user samples to be predicted through a pre-trained loan prediction model to obtain the probability of loan application of each user sample to be predicted so as to predict the loan tendency of the user.
In an alternative embodiment, the training step of the pre-trained loan prediction model comprises: collecting user loan information, wherein the user loan information comprises static data and dynamic data, the static data comprises user personal information, and the dynamic data comprises user operation data of a user logging in to a mobile banking app; performing data preprocessing and merging on the user loan information to obtain a feature set; and training a preselected loan prediction model based on the feature set until the model performance of the loan prediction model is verified to meet a preset index, thereby obtaining the pre-trained loan prediction model.
In an alternative embodiment, the user operation data includes at least login data and historical loan application data, and the method further comprises: determining an observation point and an observation performance period; determining a sample observation interval based on the observation point and the observation performance period; determining loan sample information based on the login data and the historical loan application data within the sample observation interval; and determining the loan sample information as a positive sample if the user is determined to have applied for a loan within the sample observation interval, and as a negative sample otherwise.
In an optional embodiment, each piece of data in the user loan information has a corresponding primary key, and the step of performing data preprocessing and merging on the user loan information to obtain a feature set comprises: performing a data exploration operation on the user loan information; performing a data cleaning operation on the user loan information after the data exploration operation, so as to handle dirty data, missing values and abnormal values; merging the user loan information after the data cleaning operation based on the primary key corresponding to each piece of data, so as to obtain merged user loan information; and performing feature construction on the merged user loan information to obtain the feature set.
In an optional embodiment, the step of performing feature construction on the merged user loan information to obtain a feature set includes: collecting click operation information of a user after logging in to the mobile banking app; and performing a feature construction operation on the click operation information to obtain the feature set, wherein the feature construction operation comprises a feature derivation operation and a feature selection operation.
In an optional embodiment, the step of performing a feature construction operation on the click operation information to obtain a feature set includes: performing feature extraction on the click operation information based on a preselected natural language processing model to obtain derived word vector features, wherein the click operation information comprises the service modules clicked by the user in the mobile banking app and the corresponding click times; and performing feature selection on the derived word vector features based on a preset feature information value threshold to obtain the derived feature set.
In an alternative embodiment, the preselected loan prediction model comprises an extreme gradient boosting model, and the step of training the preselected loan prediction model based on the feature set until the model performance of the loan prediction model is verified to meet the preset index comprises: dividing the feature set to obtain training samples and verification samples; fitting the training samples based on the extreme gradient boosting model to obtain a fitted extreme gradient boosting model; and predicting the verification samples based on the fitted extreme gradient boosting model until the preset index meets a preset model effect, wherein the preset index comprises an AUC index.
In a second aspect, the present invention provides a loan prediction apparatus, comprising: the sample acquisition module is used for acquiring a user sample to be predicted; the input module is used for inputting the user sample to be predicted into the pre-trained loan prediction model; and the model prediction module is used for predicting the user samples to be predicted through the pre-trained loan prediction model to obtain the probability of loan application of each user sample to be predicted so as to predict the loan tendency of the user.
In a third aspect, the present invention provides an electronic device comprising a processor and a memory; the memory stores computer-executable instructions executable by the processor to perform the steps of the loan prediction method of any of the preceding embodiments.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to perform the steps of the loan prediction method according to any one of the preceding embodiments.
According to the loan prediction method, the loan prediction device, the electronic device and the computer-readable storage medium provided by the invention, a user sample to be predicted is first obtained and then input into a pre-trained loan prediction model, so that the user sample to be predicted is predicted through the pre-trained loan prediction model and the probability that each user sample to be predicted applies for a loan is obtained, thereby predicting the loan tendency of the user. In this method, pre-training the loan prediction model improves the prediction capability of the model, and predicting the user sample to be predicted through the pre-trained loan prediction model improves the accuracy of the prediction result.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic flow chart of a loan prediction method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of another loan prediction method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a loan prediction device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that the terms "first", "second", "third", and the like are used only for distinguishing the description, and are not intended to indicate or imply relative importance.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
At present, the models most banks use for loan prediction are still built on traditional rule models. The rules of a rule model are usually formulated by business experts, and different business experts have different domain insights, so the rules are subjective and their effect tends to decay over time; as a result, the accuracy of a rule model is usually low. In addition, a rule model processes text data too coarsely or not at all, so the information in the text cannot be used by the model. Based on this, the embodiments of the invention provide a loan prediction method, a loan prediction device, an electronic device and a computer-readable storage medium, which improve the prediction capability of the model and thereby the accuracy of the prediction result.
For ease of understanding, the loan prediction method provided by an embodiment of the invention is first described in detail. Referring to the schematic flow chart of a loan prediction method shown in Fig. 1, the method mainly includes the following steps S102 to S106:
Step S102: obtain a user sample to be predicted.
To ensure that a bank can accurately determine the target customers to whom a loan service can be recommended, the user samples to be predicted are obtained first whenever the bank needs to recommend the loan service. In one embodiment, the user samples to be predicted may be the target customer group to which the bank loan service is to be recommended, where each user sample to be predicted may correspond to the login data of a user logging in to the mobile banking application (app) and to operation data (such as whether a module related to the loan function was clicked, the click time, the click frequency, and the like).
Step S104: input the user sample to be predicted into a pre-trained loan prediction model.
In one embodiment, the pre-trained loan prediction model may be an eXtreme Gradient Boosting (XGBoost) model. The XGBoost model is a boosted tree model whose training aims to minimize an objective function, shown in the following formula (1):
$$Obj^{(t)} \approx \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i)\right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} \quad (1)$$
where $g_i$ denotes the first-order gradient of the i-th sample, $f_t(x_i)$ denotes the output value of sample i on the t-th tree, $h_i$ denotes the second-order gradient of the i-th sample, $\gamma$ denotes the penalty coefficient on the number of leaf nodes, $T$ denotes the number of leaf nodes, $\lambda$ denotes the penalty coefficient on the leaf scores, and $w_j$ denotes the score of the j-th leaf node.
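As a rough illustration of how these symbols fit together, the following Python sketch evaluates the second-order objective for a single tree. It assumes the standard XGBoost form of the objective (the gradient and Hessian terms plus the γT and λ penalties), uses log loss as an example loss, and adds the well-known closed-form optimal leaf score; the function names are hypothetical and none of this is taken from the patent's own implementation.

```python
import math

def logloss_grad_hess(y_true, raw_score):
    """g_i and h_i of log loss with respect to the raw (pre-sigmoid) score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y_true, p * (1.0 - p)

def approx_objective(g, h, f_t, leaf_weights, gamma, lam):
    """sum_i [g_i*f_t(x_i) + 0.5*h_i*f_t(x_i)^2] + gamma*T + 0.5*lam*sum_j w_j^2."""
    loss_term = sum(gi * fi + 0.5 * hi * fi * fi for gi, hi, fi in zip(g, h, f_t))
    penalty = gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)
    return loss_term + penalty

def optimal_leaf_weight(g_leaf, h_leaf, lam):
    """Leaf score that minimizes the objective for the samples in one leaf."""
    return -sum(g_leaf) / (sum(h_leaf) + lam)

# Toy data (assumed): three samples, all falling into one leaf, current score 0.
labels = [1, 0, 1]
pairs = [logloss_grad_hess(y, 0.0) for y in labels]
g = [p[0] for p in pairs]
h = [p[1] for p in pairs]
w = optimal_leaf_weight(g, h, lam=1.0)
obj = approx_objective(g, h, [w] * 3, [w], gamma=0.0, lam=1.0)
```

With a single leaf, the minimized objective equals -0.5·G²/(H+λ)+γ, where G and H are the sums of the gradients and Hessians in the leaf, which is the quantity tree construction compares across candidate splits.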
Step S106: predict the user samples to be predicted through the pre-trained loan prediction model to obtain the probability that each user sample to be predicted applies for a loan, so as to predict the loan tendency of the user.
In one embodiment, the determined user sample to be predicted is input into the pre-trained loan prediction model, and the probability that the user applies for a loan is predicted from the login data and operation data corresponding to the user sample to be predicted, so that the loan tendency of the user can be predicted.
According to the loan prediction method provided by the embodiment of the invention, the prediction capability of the model is improved by pre-training the loan prediction model, and the obtained user sample to be predicted is input into the pre-trained loan prediction model, so that the user sample to be predicted is predicted by the pre-trained loan prediction model, the probability of loan application of each user sample to be predicted is obtained, the loan tendency of the user is predicted conveniently, and the accuracy of the prediction result is improved.
The embodiment of the invention also provides another loan prediction method, shown in Fig. 2, which first carries out a model training process and then performs prediction with the trained model to obtain a final prediction list. Specifically, the method performs data acquisition, target definition and data exploration, data cleaning, feature engineering, model training, and a comparison of the AUC against a threshold to obtain a trained loan prediction model, and then performs model prediction with the trained loan prediction model to obtain the final list.
For the sake of understanding, the training step of the pre-trained loan prediction model provided in this embodiment is described in detail, and the training step mainly includes the following steps 1 to 3:
Step 1: collect user loan information. The user loan information comprises static data and dynamic data. The static data includes user personal information, such as the user's name, gender, occupation and home address. The dynamic data includes user operation data of a user logging in to the mobile banking app; the user operation data includes at least login data and historical loan application data, and in one embodiment may include mobile banking app login data, credit card transaction data, wealth management transaction data, in-bank asset data, historical loan application data, and the like. In addition, after the data sources in the user loan information are collected, a primary key can be established in each data source so that the various data of each user can be merged according to the primary key.
Further, in order to determine whether a user applies for a loan within a certain time window after logging in to mobile banking, loan sample information can be determined. In one embodiment, an observation point and an observation performance period are determined first. The observation point may be a specific point in time or a time period, chosen according to actual needs, for example 2020.12.1 to 2020.12.30. The observation performance period is the length of the time window over which the behavior of a sample is observed. A sample observation interval is determined from the observation point and the observation performance period, and loan sample information is then determined from the login data and the historical loan application data within that interval. If the user applied for a loan within the sample observation interval, the loan sample information is determined to be a positive sample; otherwise it is determined to be a negative sample.
For example, suppose the observation point T is 2020.12.1 and the performance period window is 15 days long. A user sample (i.e., loan sample information) that logged in to mobile banking on day T, had no loan application record before day T, and applied for a loan within [T, T+15] days is assigned the value 1 as a positive sample. A user sample that logged in to mobile banking on day T, had no loan application record before day T, and did not apply for a loan within [T, T+15] days is assigned the value 0 as a negative sample.
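The labeling rule in this example can be sketched in a few lines of Python. The function name, its arguments and the choice to return None for users outside the sample are assumptions made for illustration; only the rule itself (login on day T, no earlier application, application within [T, T+15]) comes from the text.

```python
from datetime import date, timedelta

def label_sample(login_dates, loan_dates, observation_point, window_days=15):
    """Return 1 for a positive sample, 0 for a negative one, None if excluded."""
    # Only users who logged in on day T are part of the sample.
    if observation_point not in login_dates:
        return None
    # Users with a loan application record before day T are excluded.
    if any(d < observation_point for d in loan_dates):
        return None
    window_end = observation_point + timedelta(days=window_days)
    applied = any(observation_point <= d <= window_end for d in loan_dates)
    return 1 if applied else 0

T = date(2020, 12, 1)
positive = label_sample({T}, [date(2020, 12, 10)], T)   # applied inside the window
negative = label_sample({T}, [date(2020, 12, 20)], T)   # applied after the window
excluded = label_sample({T}, [date(2020, 11, 1)], T)    # applied before day T
```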
Step 2: perform data preprocessing and merging on the user loan information to obtain a feature set. In one embodiment, each piece of data in the user loan information has a corresponding primary key. This step may specifically include steps 2.1 to 2.4:
Step 2.1: perform a data exploration operation on the user loan information. The main purpose of the data exploration operation, namely Exploratory Data Analysis (EDA), is to understand the general condition of the data so that it can be preprocessed more accurately. The general condition of the data may include, for example, the missing values, abnormal values, mean, median, maximum, minimum and distribution of each field.
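A minimal sketch of such a per-field summary, using only the Python standard library, might look as follows. The field values are invented for illustration and missing entries are assumed to be represented as None.

```python
import statistics

def summarize(values):
    """Basic exploration statistics for one numeric field (None = missing)."""
    present = [v for v in values if v is not None]
    return {
        "missing_rate": 1 - len(present) / len(values),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }

# Hypothetical field with one missing entry.
summary = summarize([1, None, 3, 8])
```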
Step 2.2: perform a data cleaning operation on the user loan information after the data exploration operation, so as to handle dirty data, missing values and abnormal values in the original data. For missing values, for example, variable columns whose missing rate exceeds a given threshold are deleted; for columns whose missing rate is below the threshold, the missing values can be treated as values to be predicted and filled with the predictions of a random forest, or they can be filled directly. Abnormal values can be treated as a separate state and filled with a special identifier, or they can be removed directly.
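The following sketch illustrates one possible version of this cleaning step. It uses the simpler options named above (direct filling and a special identifier for abnormal values) rather than random forest imputation, and the thresholds, fill values, column names and valid ranges are all assumptions.

```python
def missing_rate(rows, column):
    """Fraction of rows in which the column is missing (None)."""
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def clean(rows, columns, max_missing_rate=0.5, fill_value=-1,
          valid_ranges=None, outlier_flag=-999):
    """Drop sparse columns, fill remaining gaps, flag out-of-range values."""
    valid_ranges = valid_ranges or {}
    # Delete columns whose missing rate exceeds the threshold.
    kept = [c for c in columns if missing_rate(rows, c) <= max_missing_rate]
    cleaned = []
    for r in rows:
        out = {}
        for c in kept:
            v = r.get(c)
            if v is None:
                v = fill_value            # direct filling of a missing value
            elif c in valid_ranges:
                low, high = valid_ranges[c]
                if not (low <= v <= high):
                    v = outlier_flag      # abnormal value kept as its own state
            out[c] = v
        cleaned.append(out)
    return kept, cleaned

rows = [
    {"age": 30, "income": None},
    {"age": 250, "income": None},
    {"age": None, "income": 5000},
    {"age": 41, "income": None},
]
kept, cleaned = clean(rows, ["age", "income"], valid_ranges={"age": (0, 120)})
```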
Step 2.3: merge the user loan information after the data cleaning operation based on the primary key corresponding to each piece of data, so as to obtain the merged user loan information. The merged user loan information still includes the user personal information and the loan-related data; merging by primary key combines the various data of each user into a single record for that user.
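A merge of this kind can be sketched as below. The primary key name `user_id`, the source names and the field names are hypothetical, and the sketch assumes each source holds at most one row per user.

```python
def merge_by_primary_key(sources, key="user_id"):
    """Combine rows from several data sources into one record per user."""
    merged = {}
    for name, rows in sources.items():
        for row in rows:
            record = merged.setdefault(row[key], {key: row[key]})
            for field, value in row.items():
                if field != key:
                    # Prefix with the source name to avoid field collisions.
                    record[f"{name}.{field}"] = value
    return merged

sources = {
    "profile": [{"user_id": 2, "gender": "F"}],
    "loans": [{"user_id": 2, "past_applications": 0}],
}
merged = merge_by_primary_key(sources)
```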
Step 2.4: perform feature construction on the merged user loan information to obtain a feature set. This step may further include step 2.4.1 and step 2.4.2:
Step 2.4.1: collect click operation information of the user after logging in to the mobile banking app. The click operation information includes the service modules clicked by the user in the mobile banking app and the corresponding click times. The service modules may or may not include a loan service module, depending on what the user actually clicked.
Step 2.4.2: perform a feature construction operation on the click operation information to obtain a feature set. The feature construction operation is the feature engineering of the data, which mainly comprises a feature derivation operation and a feature selection operation. The derived features mainly fall into the following three types: (1) basic statistical features: the total number of clicks of the user, the number of days on which the user clicked, the average number of clicks of the user per day, the count of each type of behavior of the user, the ratio of each type of behavior to the total, and the like; (2) discrete features: the number of clicks of the user on each day of the week, the number of clicks of the user in each hour, and the like; (3) time-series features: the time interval between the user's clicks, the maximum number of consecutive clicks of the user, the number of days from the user's last click to the current day, and the like. After the above features are acquired, further processing may be performed by the following steps 2.4.2.1 to 2.4.2.2:
Step 2.4.2.1: perform feature extraction on the click operation information based on a preselected natural language processing model to obtain the derived word vector features. Because the mobile banking app login data is text data, Natural Language Processing (NLP) techniques are needed to extract features from it. Text features are commonly extracted with a bag-of-words model such as TF-IDF (Term Frequency-Inverse Document Frequency), a weighting technique widely used in information retrieval and data mining, where TF is the term frequency and IDF is the inverse document frequency. However, TF-IDF cannot take the order of behaviors into account, the dimensionality of the features extracted by a bag-of-words model is tied to the size of the dictionary, and the resulting features are sparse (most values are 0 and only a few are nonzero). The preselected natural language processing model in this embodiment may therefore be a word2vec (word to vector) model, and feature extraction is performed through word2vec. In the specific feature extraction, a window function is used to collect the login behavior context of each user over a past period of time to obtain a behavior sequence: each context is treated as a word, and all operation behaviors of one user form that user's behavior sequence. See Table 1 for an example:
TABLE 1 Schematic table of contextual behavior sequence construction
USER ID   TIME              CONTEXTID     Converted word
2         2020/12/1 23:26   338-115-119   Word1
2         2020/12/2 23:26   520-908-987   Word2
2         2020/12/3 23:26   338-115-119   Word1
2         2020/12/4 23:26   520-908-987   Word2
2         2020/12/5 23:26   338-115-119   Word1
2         2020/12/6 23:26   556-098-123   Word3
2         2020/12/7 23:26   520-908-987   Word2
2         2020/12/8 23:26   338-115-119   Word1
The table above illustrates an example: user 2 logs in through the mobile banking app at 2020/12/1 23:26 and clicks the loan query module, whose CONTEXTID is 338-115-119. The user then continues to operate in mobile banking and clicks the modules listed in the remaining rows of Table 1 in order. The click sequence of user 2 is therefore [338-115-119, 520-908-987, 338-115-119, 520-908-987, 338-115-119, 556-098-123, 520-908-987, 338-115-119]. In this sequence user 2 clicked 3 different modules in total, so 338-115-119 is renamed Word1, 520-908-987 is renamed Word2, and 556-098-123 is renamed Word3. After this conversion to words for word2vec, the click behavior sequence of user 2 becomes [Word1, Word2, Word1, Word2, Word1, Word3, Word2, Word1].
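The conversion from CONTEXTIDs to words in Table 1 can be sketched as follows. The function name is hypothetical, and words are assumed to be numbered in order of first appearance, which reproduces the Word1 to Word3 naming used in the table.

```python
def to_word_sequence(context_ids):
    """Map each distinct CONTEXTID to WordN in order of first appearance."""
    vocab = {}
    words = []
    for cid in context_ids:
        if cid not in vocab:
            vocab[cid] = f"Word{len(vocab) + 1}"
        words.append(vocab[cid])
    return vocab, words

# Click sequence of user 2, taken from the CONTEXTID column of Table 1.
clicks_user2 = [
    "338-115-119", "520-908-987", "338-115-119", "520-908-987",
    "338-115-119", "556-098-123", "520-908-987", "338-115-119",
]
vocab, words = to_word_sequence(clicks_user2)
```

The resulting word sequences, one per user, are the kind of "sentences" a word2vec implementation expects as training input; which library and hyperparameters to use is not specified in the source.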
Step 2.4.2.2: perform feature selection on the derived word vector features based on a preset feature information value threshold to obtain the derived feature set.
In one embodiment, feature selection may be performed by first training a LightGBM model to obtain the importance score of each feature and filtering out the features whose importance score is 0. The feature information value, i.e. the IV (Information Value), is then calculated, and the features whose IV is smaller than the preset feature information value threshold (such as 0.02) are filtered out. The IV of a bin is calculated as shown in the following formula (2):
$$IV_i = (p_{i1} - p_{i0}) \times WOE_i \quad (2)$$
where $WOE_i$ is given by the following formula (3):
$$WOE_i = \ln\frac{p_{i1}}{p_{i0}} = \ln\frac{\#B_i / \#B_T}{\#G_i / \#G_T} \quad (3)$$
The IV of a variable is the sum of the IV values of its bins, as shown in the following formula (4):
$$IV = \sum_{i} IV_i \quad (4)$$
where:
$p_{i1}$ is the proportion of the positive samples in the i-th bin to all positive samples;
$p_{i0}$ is the proportion of the negative samples in the i-th bin to all negative samples;
$\#B_i$ is the number of positive samples in the i-th bin;
$\#G_i$ is the number of negative samples in the i-th bin;
$\#B_T$ is the number of all positive samples;
$\#G_T$ is the number of all negative samples.
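As an illustration, the IV of one binned feature can be computed as in the sketch below, assuming the standard definitions WOE_i = ln(p_i1 / p_i0) and IV_i = (p_i1 - p_i0) · WOE_i, which agree with the symbol definitions above. The bin counts and feature names are invented, and the sketch assumes every bin contains at least one positive and one negative sample so that the logarithm is defined.

```python
import math

def information_value(bins):
    """bins: list of (positives_in_bin, negatives_in_bin) for one feature."""
    total_pos = sum(pos for pos, _ in bins)
    total_neg = sum(neg for _, neg in bins)
    iv = 0.0
    for pos, neg in bins:
        p1 = pos / total_pos          # share of all positives in this bin
        p0 = neg / total_neg          # share of all negatives in this bin
        woe = math.log(p1 / p0)
        iv += (p1 - p0) * woe
    return iv

def select_features(iv_by_feature, threshold=0.02):
    """Keep features whose IV is not below the threshold."""
    return [name for name, iv in iv_by_feature.items() if iv >= threshold]

iv = information_value([(10, 40), (30, 60)])
selected = select_features({"clicks_loan_module": iv, "uninformative": 0.0})
```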
So far, a final feature set is obtained through feature derivation and feature selection operations.
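To make the derived features of step 2.4.2 more concrete, the following sketch computes several of the basic statistical, discrete and time-series features from a list of click timestamps. The feature names are hypothetical, the average is taken per active day, and "maximum number of consecutive clicks" is interpreted here as the longest run of consecutive days with at least one click; both are assumptions rather than definitions from the source.

```python
from collections import Counter
from datetime import date, datetime

def click_features(click_times, today):
    """Derive simple features from a non-empty list of click datetimes."""
    times = sorted(click_times)
    days = sorted({t.date() for t in times})
    # Longest run of consecutive calendar days with at least one click.
    longest = streak = 1
    for prev, cur in zip(days, days[1:]):
        streak = streak + 1 if (cur - prev).days == 1 else 1
        longest = max(longest, streak)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "total_clicks": len(times),
        "active_days": len(days),
        "avg_clicks_per_active_day": len(times) / len(days),
        "clicks_by_weekday": Counter(t.weekday() for t in times),  # 0 = Monday
        "clicks_by_hour": Counter(t.hour for t in times),
        "mean_gap_seconds": sum(gaps) / len(gaps) if gaps else None,
        "max_consecutive_days": longest,
        "days_since_last_click": (today - days[-1]).days,
    }

clicks = [
    datetime(2020, 12, 1, 23, 26),
    datetime(2020, 12, 2, 9, 0),
    datetime(2020, 12, 2, 10, 0),
    datetime(2020, 12, 4, 8, 0),
]
features = click_features(clicks, today=date(2020, 12, 8))
```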
Step 3: train the preselected loan prediction model based on the feature set until the model performance of the loan prediction model is verified to meet the preset index, thereby obtaining the pre-trained loan prediction model. In one embodiment, the preselected loan prediction model comprises an extreme gradient boosting model, and the preset index comprises an AUC index, i.e., the area under the ROC curve (AUC). When the extreme gradient boosting model is trained and verified, the feature set is first divided into training samples and verification samples. The training samples are then fitted with the extreme gradient boosting model (i.e., the XGBoost model) until the objective function of the XGBoost model is minimized, giving a fitted XGBoost model. The verification samples are then predicted with the fitted XGBoost model until the preset index, i.e. the AUC index, meets the preset model effect. The preset model effect can be set according to actual requirements; for example, to improve the prediction accuracy of the model, the preset model effect may be that the confidence of the model's prediction results is as high as possible. In one embodiment, model training is terminated if the AUC index meets expectations; otherwise the model is retrained or feature derivation is performed again until model training meets expectations.
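The split and the AUC check can be sketched without any modeling library, as below. The AUC is computed from its pairwise definition (the fraction of positive-negative pairs in which the positive sample receives the higher score, ties counting one half); the split fraction, the seed and the example scores are assumptions, and the scores stand in for the predictions a fitted XGBoost model would produce on the verification samples.

```python
import random

def split(samples, validation_fraction=0.3, seed=0):
    """Shuffle and divide the feature set into training and verification samples."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * validation_fraction)
    cut = len(shuffled) - n_valid
    return shuffled[:cut], shuffled[cut:]

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

train, valid = split(list(range(10)))
perfect = auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
partial = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3])
```

In a training loop, the AUC on the verification samples would be compared with the preset threshold, and the model retrained or the features re-derived when it falls short.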
Further, after the pre-trained loan prediction model is obtained, it is used to predict the user samples to be predicted, yielding for each sample the probability that the user applies for a loan within a specified time period after logging in to the mobile banking APP (for example, the next 15 days; this can be adjusted to match the observation performance period used during model training). The samples with the highest probabilities can then be selected as the list of samples for subsequent business actions.
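Selecting the highest-probability samples can be sketched in a few lines. The function name and the dictionary layout (user identifier mapped to predicted probability) are assumptions made for illustration.

```python
import heapq


def select_target_users(probabilities, top_k):
    """Return the top_k (user_id, probability) pairs, highest probability first."""
    return heapq.nlargest(top_k, probabilities.items(), key=lambda item: item[1])
```

For example, `select_target_users({"u1": 0.2, "u2": 0.9, "u3": 0.5}, 2)` returns the entries for `u2` and `u3`, in that order.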
The invention also provides a loan prediction device. Referring to the schematic structural diagram of the loan prediction device shown in Fig. 3, the device comprises the following parts:
a sample obtaining module 302, configured to obtain a user sample to be predicted;
an input module 304, configured to input the user sample to be predicted into the pre-trained loan prediction model;
a model prediction module 306, configured to predict the user samples to be predicted through the pre-trained loan prediction model and obtain, for each user sample to be predicted, the probability of applying for a loan, so as to predict the loan tendency of the user.
In the loan prediction device provided by the embodiment of the invention, pre-training the loan prediction model improves the prediction capability of the model. The obtained user samples to be predicted are input into the pre-trained loan prediction model, which predicts them and outputs the probability that each user sample applies for a loan, thereby predicting the loan tendency of the user and improving the accuracy of the prediction result.
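One possible way to arrange the three modules in software is sketched below. The class name, field names, and the representation of the model as a plain callable are assumptions made for illustration; the patent does not prescribe this structure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Features = Sequence[float]


@dataclass
class LoanPredictionDevice:
    # Sample obtaining module (302): returns {user_id: feature vector}.
    obtain_samples: Callable[[], Dict[str, Features]]
    # Pre-trained loan prediction model: maps features to a probability.
    model: Callable[[Features], float]

    def feed(self, samples):
        # Input module (304): hand each sample's features to the model.
        return ((user_id, self.model(features))
                for user_id, features in samples.items())

    def predict(self) -> Dict[str, float]:
        # Model prediction module (306): one loan probability per user sample.
        return dict(self.feed(self.obtain_samples()))
```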
In one embodiment, the apparatus further comprises a model training module configured to: collect user loan information, wherein the user loan information comprises static data and dynamic data, the static data comprises user personal information, and the dynamic data comprises user operation data generated when the user logs in to the mobile banking APP; perform data preprocessing and merging on the user loan information to obtain a feature set; and train the preselected loan prediction model on the feature set until verification shows that the model performance of the loan prediction model meets the preset index, thereby obtaining the pre-trained loan prediction model.
In one embodiment, the user operation data includes at least login data and historical loan application data. The model training module is further configured to: determine an observation point and an observation performance period; determine a sample observation interval based on the observation point and the observation performance period; within the sample observation interval, determine loan sample information based on the login data and the historical loan application data; and, if it is determined that the user applied for a loan within the sample observation interval, determine the loan sample information to be a positive sample, and otherwise determine it to be a negative sample.
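The labeling rule above can be sketched as follows. This sketch assumes that the sample observation interval starts at the observation point and spans the observation performance period, and that only users who logged in during that interval become samples; this section of the text does not fix those details, so both are assumptions, as are the function and parameter names.

```python
from datetime import date, timedelta


def label_samples(logins, applications, observation_point, performance_days=15):
    """Label users as positive (1) or negative (0) samples.

    logins / applications: {user_id: [date, ...]}.
    Assumed interval: [observation_point, observation_point + performance_days].
    """
    start = observation_point
    end = observation_point + timedelta(days=performance_days)
    labels = {}
    for user_id, login_dates in logins.items():
        # Assumption: a user is a sample only if they logged in within the interval.
        if not any(start <= d <= end for d in login_dates):
            continue
        applied = any(start <= d <= end for d in applications.get(user_id, []))
        labels[user_id] = 1 if applied else 0
    return labels
```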
In one embodiment, each data item in the user loan information has a corresponding primary key. The model training module is further configured to: perform a data exploration operation on the user loan information; perform a data cleaning operation on the explored user loan information so as to handle dirty data, missing values and outliers; merge the cleaned user loan information based on the primary key of each data item to obtain merged user loan information; and perform feature construction on the merged user loan information to obtain the feature set.
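A minimal sketch of cleaning followed by a merge on a primary key is shown below. The key name `user_id`, the inner-join behavior, and the use of default values for missing fields are assumptions; outlier handling is omitted for brevity.

```python
def clean_records(records, key="user_id"):
    """Drop rows without a primary key and strip missing (None) values."""
    cleaned = {}
    for row in records:
        if row.get(key) is None:
            continue  # treat rows without a primary key as dirty data
        cleaned[row[key]] = {k: v for k, v in row.items() if v is not None}
    return cleaned


def merge_on_key(static_rows, dynamic_rows, key="user_id", defaults=None):
    """Merge static and dynamic data for keys present in both (inner join)."""
    static = clean_records(static_rows, key)
    dynamic = clean_records(dynamic_rows, key)
    merged = {}
    for primary_key in static.keys() & dynamic.keys():
        # Defaults fill in any field that was missing after cleaning.
        merged[primary_key] = {**(defaults or {}),
                               **static[primary_key],
                               **dynamic[primary_key]}
    return merged
```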
In an embodiment, the model training module is further configured to: collect statistics on the click operation information of a user after the user logs in to the mobile banking APP; and perform a feature construction operation on the click operation information to obtain the feature set, wherein the feature construction operation comprises a feature derivation operation and a feature selection operation.
In an embodiment, the model training module is further configured to: perform feature extraction on the click operation information based on a preselected natural language processing model to obtain derived word-vector features, wherein the click operation information comprises the service modules clicked by the user after opening the mobile banking APP and the corresponding click times; and perform feature selection on the derived word-vector features based on a preset feature information value threshold to obtain the derived feature set.
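Before a natural language processing model can produce word vectors, the click records have to be arranged as token sequences. The sketch below shows one plausible preparation step, in which each user's clicked service modules are ordered by click time to form a "sentence"; the tuple layout and function name are assumptions, and the embedding step itself is not shown.

```python
from collections import defaultdict


def build_click_sentences(click_events):
    """Group (user_id, module_name, click_time) events into per-user sequences.

    Each sequence lists module names in click order and could serve as one
    input "sentence" for a word-embedding model.
    """
    per_user = defaultdict(list)
    for user_id, module_name, click_time in click_events:
        per_user[user_id].append((click_time, module_name))
    return {user_id: [module for _, module in sorted(events, key=lambda e: e[0])]
            for user_id, events in per_user.items()}
```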
In one embodiment, the preselected loan prediction model comprises an extreme gradient boosting model. The model training module is further configured to: divide the feature set to obtain training samples and verification samples; fit the extreme gradient boosting model to the training samples to obtain a fitted extreme gradient boosting model; and predict the verification samples with the fitted extreme gradient boosting model until the preset index meets the preset model effect, wherein the preset index comprises an AUC index.
The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the foregoing method embodiments. For brevity, where the device embodiments do not mention a detail, reference may be made to the corresponding content in the method embodiments.
An embodiment of the invention provides an electronic device comprising a processor and a storage device; the storage device stores a computer program which, when executed by the processor, performs the method of any of the above embodiments. The electronic device may be, for example, a smart phone, a PC, a banking mobile terminal, or the like.
Fig. 4 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present invention, where the electronic device 100 includes: a processor 40, a memory 41, a bus 42 and a communication interface 43, wherein the processor 40, the communication interface 43 and the memory 41 are connected through the bus 42; the processor 40 is arranged to execute executable modules, such as computer programs, stored in the memory 41.
The memory 41 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 43 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, etc.
The bus 42 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
The memory 41 is used for storing a program, and the processor 40 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 40.
The processor 40 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 40 or by instructions in the form of software. The processor 40 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 41; the processor 40 reads the information in the memory 41 and completes the steps of the above method in combination with its hardware.
The computer program product of the loan prediction method, device, electronic device and computer-readable storage medium provided by the embodiments of the present invention comprises a computer-readable storage medium storing non-volatile program code executable by a processor. The computer-readable storage medium stores a computer program which, when executed by the processor, performs the method described in the foregoing method embodiments.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above may refer to the corresponding process in the foregoing embodiments, and is not described herein again.
The computer program product of the readable storage medium provided in the embodiment of the present invention includes a computer readable storage medium storing a program code, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A loan prediction method, the method comprising:
obtaining a user sample to be predicted;
inputting the user sample to be predicted into a pre-trained loan prediction model;
and predicting the user samples to be predicted through the pre-trained loan prediction model to obtain the probability that each user sample to be predicted applies for a loan, so as to predict the loan tendency of the user.
2. The loan prediction method of claim 1, wherein the training step of the pre-trained loan prediction model comprises:
collecting user loan information; the user loan information comprises static data and dynamic data; the static data comprises user personal information; the dynamic data comprises user operation data of a user logging in to a mobile banking APP;
carrying out data preprocessing and merging processing on the user loan information to obtain a feature set;
and training the pre-selected loan prediction model based on the feature set until the model performance of the loan prediction model is verified to meet the preset index, and obtaining the pre-trained loan prediction model.
3. The loan prediction method of claim 2, wherein the user operation data comprises at least login data and historical loan application data; the method further comprises the following steps:
determining an observation point and an observation performance period;
determining a sample observation interval based on the observation point and the observation performance period;
in the sample observation interval, determining loan sample information based on the login data and the historical loan application data;
and if it is determined that the user applied for a loan in the sample observation interval, determining the loan sample information as a positive sample, and otherwise determining the loan sample information as a negative sample.
4. The loan prediction method of claim 2, wherein each data item in the user loan information has a corresponding primary key; the step of carrying out data preprocessing and merging processing on the user loan information to obtain a feature set comprises the following steps:
performing a data exploration operation on the user loan information;
performing a data cleaning operation on the user loan information after the data exploration operation so as to handle dirty data, missing values and outliers;
merging the user loan information after the data cleaning operation based on the primary key of each data item in the user loan information so as to obtain merged user loan information;
and performing feature construction on the merged user loan information to obtain the feature set.
5. The loan prediction method of claim 4, wherein the step of performing feature construction on the merged user loan information to obtain the feature set comprises:
collecting statistics on click operation information of a user after logging in to a mobile banking APP;
performing feature construction operation on the click operation information to obtain the feature set; wherein the feature construction operation comprises a feature derivation operation and a feature selection operation.
6. The loan prediction method according to claim 5, wherein the step of performing a feature construction operation on the click operation information to obtain the feature set includes:
performing feature extraction on the click operation information based on a preselected natural language processing model to obtain derived word-vector features; the click operation information comprises the service modules clicked by a user after opening the mobile banking APP and the corresponding click times;
and performing feature selection on the derived word-vector features based on a preset feature information value threshold to obtain the derived feature set.
7. The loan prediction method of claim 6, wherein the preselected loan prediction model comprises an extreme gradient boosting model; the step of training the preselected loan prediction model based on the feature set until the model performance of the loan prediction model is verified to meet the preset index comprises the following steps:
dividing the feature set to obtain training samples and verification samples;
fitting the extreme gradient boosting model to the training samples to obtain the fitted extreme gradient boosting model;
predicting the verification samples based on the fitted extreme gradient boosting model until the preset index meets the preset model effect; wherein the preset index comprises an AUC index.
8. A loan prediction apparatus, characterized in that the apparatus comprises:
the sample acquisition module is used for acquiring a user sample to be predicted;
the input module is used for inputting the user sample to be predicted into a pre-trained loan prediction model;
and the model prediction module is used for predicting the user samples to be predicted through the pre-trained loan prediction model to obtain the probability of loan application of each user sample to be predicted so as to predict the loan tendency of the user.
9. An electronic device comprising a processor and a memory;
the memory stores computer-executable instructions executable by the processor to perform the steps of the loan prediction method of any of claims 1 to 7.
10. A computer-readable storage medium, having a computer program stored thereon, which, when executed by a processor, performs the steps of the loan prediction method of any of the claims 1 to 7.
CN202110273787.9A 2021-03-12 2021-03-12 Loan prediction method, loan prediction device, electronic device, and computer-readable storage medium Pending CN112785095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110273787.9A CN112785095A (en) 2021-03-12 2021-03-12 Loan prediction method, loan prediction device, electronic device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN112785095A true CN112785095A (en) 2021-05-11

Family

ID=75762630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110273787.9A Pending CN112785095A (en) 2021-03-12 2021-03-12 Loan prediction method, loan prediction device, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN112785095A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269355A (en) * 2021-05-12 2021-08-17 广州市全民钱包科技有限公司 User loan prediction method, device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108861A (en) * 2018-03-06 2018-06-01 中国银行股份有限公司 The Forecasting Methodology and device of a kind of potential customers
CN109255506A (en) * 2018-11-22 2019-01-22 重庆邮电大学 A kind of internet finance user's overdue loan prediction technique based on big data
CN109509033A (en) * 2018-12-14 2019-03-22 重庆邮电大学 A kind of user buying behavior big data prediction technique under consumer finance scene
CN110930038A (en) * 2019-11-28 2020-03-27 中国建设银行股份有限公司 Loan demand identification method, loan demand identification device, loan demand identification terminal and loan demand identification storage medium
CN111861698A (en) * 2020-07-02 2020-10-30 北京睿知图远科技有限公司 Pre-loan approval early warning method and system based on loan multi-head data
CN112328869A (en) * 2020-09-28 2021-02-05 苏宁金融科技(南京)有限公司 User loan willingness prediction method and device and computer system
CN113269355A (en) * 2021-05-12 2021-08-17 广州市全民钱包科技有限公司 User loan prediction method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination