CN110060144B

CN110060144B - Credit limit model training method, credit limit evaluation method, device, equipment and medium

Info

Publication number: CN110060144B
Application number: CN201910203514.XA
Authority: CN
Inventors: 王睿之
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-03-18
Filing date: 2019-03-18
Publication date: 2024-01-30
Anticipated expiration: 2039-03-18
Also published as: CN110060144A

Abstract

The invention discloses a credit limit model training method, a credit limit evaluation method, a device, equipment and a medium. The credit limit model training method comprises: screening original user data to obtain basic user data that meets the training standard, wherein the basic user data comprises a credit limit, basic information data, basic asset data and basic consumption data; preprocessing the basic information data to obtain a corresponding life stage category; preprocessing the basic asset data to obtain a repayment capability level corresponding to the basic user data; preprocessing the basic consumption data to obtain a consumption capability level corresponding to the basic user data; labeling the basic user data with the credit limit, the life stage category, the repayment capability level and the consumption capability level to obtain training data; and training on the training data with a GBDT algorithm to obtain a pre-credit limit model, thereby solving the problem that the pre-credit limit obtained from a pre-credit limit model is inaccurate.

Description

Credit limit model training method, credit limit evaluation method, device, equipment and medium

Technical Field

The invention relates to the field of intelligent decision making, in particular to a method for training a credit model, a method for evaluating credit, a device, equipment and a medium.

Background

Currently, major banks evaluate a user's personal information to obtain a pre-credit limit (namely, a maximum credit limit) and feed it back to the user, who can then apply for services according to that pre-credit limit. In the industry, the pre-credit limit for a user's small loan is obtained by evaluating personal information with a pre-credit limit model. Such a model is usually built from the user's basic information and asset status. However, most asset information is difficult to acquire, or the acquired asset information is one-sided, so the user data is incomplete and the pre-credit limit obtained from the pre-credit limit model is inaccurate.

Disclosure of Invention

The embodiment of the invention provides a method, a device, equipment and a medium for training a credit model, so as to solve the problem of inaccurate pre-credit limit.

A method of training a credit model, comprising:

acquiring original user data in a database and screening the original user data to obtain basic user data that meets the training standard, wherein the basic user data comprises a credit limit, basic information data, basic asset data and basic consumption data;

preprocessing the basic information data to obtain a life stage category corresponding to the basic user data;

preprocessing the basic asset data to obtain a repayment capability level corresponding to the basic user data;

preprocessing the basic consumption data to obtain a consumption capability level corresponding to the basic user data;

labeling the basic user data with the credit limit, the life stage category, the repayment capability level and the consumption capability level, and acquiring training data formed from each piece of basic user data;

and training on the training data with a GBDT algorithm to obtain a pre-credit limit model.

A method of credit assessment, comprising:

acquiring a pre-credit limit viewing request, wherein the pre-credit limit viewing request comprises user attribute data and target user data, and the user attribute data comprises a user age, a region where the user is located and a user identifier;

checking the user age, the region where the user is located and the user identifier against preset pre-credit evaluation conditions to determine whether the user attribute data is pre-credit data;

if the user attribute data is pre-credit data, inputting the target user data into the pre-credit limit model to acquire an initial pre-credit limit corresponding to the target user data;

and querying a comparison table according to the initial pre-credit limit to acquire a target pre-credit limit corresponding to the initial pre-credit limit.

A credit model training device, comprising:

the basic user data acquisition module is used for acquiring original user data in a database, screening the original user data and acquiring basic user data which accords with training standards, wherein the basic user data comprises credit line, basic information data, basic asset data and basic consumption data;

the life stage category acquisition module is used for preprocessing the basic information data to acquire life stage categories corresponding to the basic user data;

the repayment capability grade acquisition module is used for preprocessing the basic asset data and acquiring repayment capability grade corresponding to the basic user data;

the consumption capability grade acquisition module is used for preprocessing the basic consumption data and acquiring the consumption capability grade corresponding to the basic user data;

The training data forming module is used for marking the credit limit, the life stage category, the repayment capability level and the consumption capability level of the basic user data and obtaining training data formed based on each basic user data;

and the pre-credit limit model acquisition module is used for training each training data by adopting the GBDT algorithm to acquire the pre-credit limit model.

A credit limit evaluation device, comprising:

the request acquisition module is used for acquiring a pre-credit line viewing request, wherein the pre-credit line viewing request comprises user attribute data and target user data, and the user attribute data comprises user age, a region where a user is located and a user identifier;

the pre-credit data determining module is used for checking the user age, the region where the user is located and the user identifier against preset pre-credit evaluation conditions and determining whether the user attribute data is pre-credit data;

the initial pre-credit limit acquisition module is used for inputting the target user data into the pre-credit limit model to acquire the initial pre-credit limit corresponding to the target user data if the user attribute data is pre-credit data;

and the target pre-credit limit acquisition module is used for querying the comparison table according to the initial pre-credit limit to acquire the target pre-credit limit corresponding to the initial pre-credit limit.

A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the above-mentioned credit model training method when executing the computer program; alternatively, the processor may implement the above-mentioned credit assessment method when executing the computer program.

A computer readable storage medium storing a computer program which when executed by a processor implements the credit model training method described above; alternatively, the computer program, when executed by a processor, implements the credit assessment method described above.

According to the credit limit model training method, device, equipment and medium, the original user data is screened to obtain basic user data that meets the training standard, which improves the training speed of the model. The basic information data, basic asset data and basic consumption data are preprocessed to obtain the life stage category, repayment capability level and consumption capability level corresponding to the basic user data, and the basic user data is labeled with the credit limit, life stage category, repayment capability level and consumption capability level to form training data, so that the parameters of the pre-credit limit model can be adjusted according to the labels. Training on the training data with the GBDT algorithm yields a pre-credit limit model whose accuracy is improved by the added consumption data.

The credit limit evaluation method, device, equipment and medium check the user age, the region where the user is located and the user identifier against preset pre-credit evaluation conditions and determine whether the user attribute data is pre-credit data, so as to preliminarily determine whether the user is eligible for credit. If the user attribute data is pre-credit data, the target user data is input into the pre-credit limit model to acquire the initial pre-credit limit corresponding to the target user data; obtaining the initial pre-credit limit from the pre-credit limit model makes it more accurate. A comparison table is then queried according to the initial pre-credit limit to acquire the corresponding target pre-credit limit, so that the target pre-credit limit fed back to the user is a specified limit and the limit granted when the user later applies for a service does not differ greatly from the target pre-credit limit.
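The evaluation flow summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the age range, region set, blocked identifiers, the `model` callable and the bracket values of the comparison table are all hypothetical, since the text does not specify them.

```python
from bisect import bisect_right

# Hypothetical pre-credit evaluation conditions (not given in the text).
MIN_AGE, MAX_AGE = 22, 55
ALLOWED_REGIONS = {"Shenzhen", "Shanghai"}
BLOCKED_USER_IDS = {"u-blocked"}

# Hypothetical comparison table: lower bound of each bracket and the
# specified limit that is fed back for initial limits in that bracket.
BRACKET_BOUNDS = [0, 10_000, 50_000, 100_000]
BRACKET_LIMITS = [5_000, 10_000, 50_000, 100_000]


def is_pre_credit_data(age, region, user_id):
    """Return True if the user attribute data passes the preset conditions."""
    return (MIN_AGE <= age <= MAX_AGE
            and region in ALLOWED_REGIONS
            and user_id not in BLOCKED_USER_IDS)


def lookup_target_limit(initial_limit):
    """Map the model output to a specified limit via the comparison table."""
    idx = bisect_right(BRACKET_BOUNDS, initial_limit) - 1
    # Values below the first bound fall back to the lowest bracket.
    return BRACKET_LIMITS[max(idx, 0)]


def evaluate(request, model):
    """Return the target pre-credit limit, or None if the user is not eligible."""
    attrs = request["user_attribute_data"]
    if not is_pre_credit_data(attrs["age"], attrs["region"], attrs["user_id"]):
        return None
    initial_limit = model(request["target_user_data"])
    return lookup_target_limit(initial_limit)
```

Rounding the raw model output to a small set of specified limits is what keeps the limit shown to the user close to the limit granted at application time.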

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram illustrating an application environment of a credit model training method or a credit evaluation method according to an embodiment of the invention;

FIG. 2 is a flowchart of a method for training a credit model according to an embodiment of the invention;

FIG. 3 is a flowchart of a method for training a credit model according to an embodiment of the invention;

FIG. 4 is a flowchart of a method for training a credit model according to an embodiment of the invention;

FIG. 5 is a flowchart of a method for training a credit model according to an embodiment of the invention;

FIG. 6 is a flowchart of a method for training a credit model according to an embodiment of the invention;

FIG. 7 is a flowchart of a credit evaluation method according to an embodiment of the invention;

FIG. 8 is a schematic block diagram of a credit model training apparatus in accordance with an embodiment of the invention;

FIG. 9 is a schematic block diagram of a credit assessment device according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a computer device in accordance with an embodiment of the invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The credit limit model training method provided by the embodiment of the invention can be applied in the application environment shown in fig. 1. The method is applied to a server, which acquires the original user data of authorized users in a database, determines the life stage category, the repayment capability level and the consumption capability level from the original user data, and labels and trains on the data to obtain the pre-credit limit model. The pre-credit limit model can then quickly output a good prediction result, which improves the accuracy of the pre-credit limit. The server may be implemented as an independent server or as a server cluster formed by a plurality of servers.

In an embodiment, as shown in fig. 2, a method for training a credit model is provided, and the method is applied to the server in fig. 1 for illustration, and specifically includes the following steps:

S10: Acquire original user data in the database and screen the original user data to obtain basic user data that meets the training standard, wherein the basic user data comprises a credit limit, basic information data, basic asset data and basic consumption data.

The original user data refers to the user data stored in the database for each trusted user (a user who has already been granted a credit limit). The basic user data refers to the user data obtained after the original user data of the trusted users has been screened. The credit limit is the credit limit stored in the database for the user. The basic information data refers to the data of all specific fields from which the user's life stage can be determined, for example, sex, age, marital status, educational background, living status, work status, child status and spouse information. The basic asset data refers to the data of all specific fields from which the user's repayment capability can be determined, such as deposit amount, vehicle and house valuation, monthly average deposit and insurance policies. The basic consumption data refers to the data of all specific fields from which the user's consumption capability can be determined, for example, monthly average consumption amount, monthly average consumption frequency, high-value consumption data and electronic product data. It should be noted that a specific field is a preconfigured field from which the life stage category, the consumption capability level or the repayment capability level can be determined. A telephone number, for example, cannot determine any of these and is therefore not a specific field.

Specifically, the database stores the original user data corresponding to each trusted user, and the original user data is screened according to preset rules. For example, data that lengthens model training without affecting its accuracy is screened out, and if the original user data contains a missing value that affects training accuracy, the missing value is filled in by imputation. In this embodiment, the screening yields basic user data suitable for training, which comprises the credit limit, basic information data, basic asset data and basic consumption data of a trusted user. Screening the original user data into basic user data improves the model training speed, and training on the basic user data improves the accuracy of the pre-credit limit model.

S20: Preprocess the basic information data to obtain the life stage category corresponding to the basic user data.

The preprocessing in this embodiment refers to extracting the data of the specific fields from the basic information data in order to determine the user's life stage category. The life stage category is the stage the user is in, determined from the user's basic information data, and includes a learning stage, a striving stage, a maintenance stage, an elderly stage and the like.

Specifically, the server acquires the basic information data from the basic user data, preprocesses it to obtain the data used for determining the life stage, and determines the life stage category corresponding to the basic user data from that data. The basic user data comprises three blocks: basic information data, basic asset data and basic consumption data. The basic information data is first extracted from the basic user data and then preprocessed, for example by keeping only the data of the specific fields and filtering out data such as names and telephone numbers that do not belong to specific fields. The data of the specific fields is input into a pre-trained decision tree model, which classifies it to obtain the life stage category corresponding to the basic user data. The life stage category indicates the user's loan demand; for example, the loan demand in the striving stage is higher than in the elderly stage. It will be appreciated that the higher the loan demand, the greater the corresponding pre-credit limit.

S30: Preprocess the basic asset data to obtain the repayment capability level corresponding to the basic user data.

The preprocessing in this embodiment refers to extracting the data of the specific fields from the basic asset data and normalizing it in order to determine the user's repayment capability level. The repayment capability level is the level of the user's repayment capability, determined from the user's basic asset data.

Specifically, the server acquires the basic asset data from the basic user data, preprocesses it to obtain the data used for determining the repayment capability level, normalizes that data, and determines the user's repayment capability level from the normalized basic asset data. The basic user data comprises three blocks: basic information data, basic asset data and basic consumption data. The basic asset data is first extracted from the basic user data and then preprocessed, for example by keeping only the data of the specific fields. The data of the specific fields is then normalized and input into a pre-trained repayment capability prediction model, which identifies the basic asset data and outputs the repayment capability level corresponding to the basic user data. The repayment capability level determines the user's repayment capability: the larger the asset value in the user's basic asset data, the stronger the repayment capability; conversely, the smaller the asset value, the weaker the repayment capability. It can be appreciated that the higher the user's repayment capability level, the greater the pre-credit limit.
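The normalization step can be illustrated with a short sketch. The text does not say which normalization is used, so min-max scaling to [0, 1] is assumed here, and the field names and sample values are hypothetical. The pre-trained repayment capability prediction model itself is not shown.

```python
def min_max_normalize(records, fields):
    """Scale each listed field to [0, 1] across all records (assumed method)."""
    normalized = [dict(r) for r in records]
    for field in fields:
        values = [r[field] for r in records]
        lo, hi = min(values), max(values)
        span = hi - lo
        for row in normalized:
            # A constant column carries no information; map it to 0.0.
            row[field] = (row[field] - lo) / span if span else 0.0
    return normalized


# Hypothetical specific fields of the basic asset data.
ASSET_FIELDS = ["deposit_amount", "monthly_avg_deposit"]

users = [
    {"deposit_amount": 10_000, "monthly_avg_deposit": 2_000},
    {"deposit_amount": 60_000, "monthly_avg_deposit": 4_000},
    {"deposit_amount": 110_000, "monthly_avg_deposit": 6_000},
]
scaled = min_max_normalize(users, ASSET_FIELDS)
```

Scaling every asset field to the same range keeps a field with large absolute values, such as a house valuation, from dominating the input of the downstream model.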

S40: Preprocess the basic consumption data to obtain the consumption capability level corresponding to the basic user data.

The preprocessing in this embodiment refers to extracting the data of the specific fields from the basic consumption data and normalizing it in order to determine the user's consumption capability. The consumption capability level is the level of the user's consumption capability, determined from the user's basic consumption data.

Specifically, the server acquires the basic consumption data from the basic user data, preprocesses it to obtain the data used for determining the consumption capability, normalizes that data, and determines the user's consumption capability level from the normalized basic consumption data. The basic user data comprises three blocks: basic information data, basic asset data and basic consumption data. The basic consumption data is first extracted from the basic user data and then preprocessed, for example by keeping only the data of the specific fields, and the data of the specific fields is then normalized. The server stores a preconfigured consumption evaluation table, which records the correspondence between the target consumption value derived from the basic consumption data and the consumption capability level. In this embodiment, the consumption evaluation table is looked up with the target consumption value of the normalized basic consumption data to obtain the consumption capability level. The consumption capability level may include four levels: a trend-follower label, a high-value consumption label, an electronics label and a thrifty label. Determining the consumption capability level corresponding to the basic user data determines the user's consumption capability. It can be appreciated that the higher the consumption capability level, the greater the pre-credit limit.
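The table lookup can be sketched as below. This is only an illustration: the bracket boundaries, the numeric levels 1 to 4 and the assumption that the target consumption value is a single number normalized to [0, 1] are hypothetical, because the text does not describe the contents of the consumption evaluation table.

```python
# Hypothetical consumption evaluation table: (upper bound of the normalized
# target consumption value, consumption capability level).
CONSUMPTION_EVALUATION_TABLE = [
    (0.25, 1),
    (0.50, 2),
    (0.75, 3),
    (1.00, 4),
]


def consumption_level(target_value):
    """Look up the level whose bracket contains the target consumption value."""
    if not 0.0 <= target_value <= 1.0:
        raise ValueError("target consumption value must be normalized to [0, 1]")
    for upper_bound, level in CONSUMPTION_EVALUATION_TABLE:
        if target_value <= upper_bound:
            return level
    raise AssertionError("unreachable: the table covers [0, 1]")
```

Keeping the correspondence in a table rather than in code means the brackets can be reconfigured without retraining or redeploying the model.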

S50: Label the basic user data with the credit limit, the life stage category, the repayment capability level and the consumption capability level, and acquire the training data formed from each piece of basic user data.

Specifically, the server acquires the credit limit, life stage category, repayment capability level and consumption capability level corresponding to each piece of basic user data, labels each piece of basic user data with them, and obtains the training data formed by the labeled basic user data. The labels allow the weights to be adjusted when the pre-credit limit model is trained later, so that the predicted value output by the model approaches the labeled value, which improves the accuracy of the pre-credit limit model.
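One possible shape for a labeled training record is sketched below. The class name, the dictionary keys and the integer coding of the life stage categories are assumptions made for illustration; the text only states which four values each record carries.

```python
from dataclasses import dataclass

# Hypothetical integer coding of the life stage categories.
LIFE_STAGE_CODES = {"learning": 0, "striving": 1, "maintenance": 2, "elderly": 3}


@dataclass(frozen=True)
class TrainingSample:
    life_stage: int         # encoded life stage category (from S20)
    repayment_level: int    # repayment capability level (from S30)
    consumption_level: int  # consumption capability level (from S40)
    credit_limit: float     # label: limit already granted to the user

    @property
    def features(self):
        """Feature vector fed to the model; the credit limit is the target."""
        return [self.life_stage, self.repayment_level, self.consumption_level]


def build_sample(user):
    """Turn one piece of preprocessed basic user data into a training sample."""
    return TrainingSample(
        life_stage=LIFE_STAGE_CODES[user["life_stage"]],
        repayment_level=user["repayment_level"],
        consumption_level=user["consumption_level"],
        credit_limit=float(user["credit_limit"]),
    )
```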

S60: Train on the training data with a GBDT algorithm to obtain the pre-credit limit model.

GBDT (Gradient Boosting Decision Tree) is an iterative decision tree algorithm, also known as MART (Multiple Additive Regression Tree) or GBRT (Gradient Boosting Regression Tree). Jerome H. Friedman proposed the gradient boosting decision tree in 1999. It can be used for both classification and regression, and the GBDT algorithm can handle various types of data, including continuous and discrete values, and is relatively robust to outliers. In practice the GBDT algorithm generates a plurality of decision trees, here decision trees over the life stage category, the repayment capability level and the consumption capability level, and aggregates the results of all decision trees to obtain the final pre-credit limit.

Specifically, the server acquires the training data, which comprises, for each piece of basic user data, the granted credit limit, the life stage category corresponding to the basic information data, the repayment capability level corresponding to the basic asset data and the consumption capability level corresponding to the basic consumption data, and trains on the training data with the GBDT algorithm to obtain the pre-credit limit model. Training with the GBDT algorithm specifically comprises: (1) training an initial weak learner on the training data with initial weights, and comparing the true value (namely the labeled credit limit) with the predicted value (the value predicted by the initial weak learner) to obtain a learning error rate; updating the weights of the training data according to the learning error rate of the initial weak learner, so that training data on which the initial weak learner has a high error rate receives a higher weight and therefore more attention from the subsequent weak learners; (2) training a second weak learner on the re-weighted training data; (3) iterating in this way until the number of weak learners reaches a preset number T; and (4) finally combining the T weak learners through an aggregation strategy to obtain the final strong learner, namely the pre-credit limit model. Training with the GBDT algorithm performs feature selection and outlier handling effectively and automatically, and avoids model overfitting to a certain extent. Constructing the pre-credit limit model with the GBDT algorithm makes the prediction result (namely the initial pre-credit limit) subsequently obtained from the model more accurate.
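The iterative training in steps (1) to (4) can be illustrated with a small pure-Python sketch. It is an illustration under stated assumptions rather than the training procedure of the invention: it uses depth-1 regression trees (stumps) as weak learners and fits each new stump to the residuals of the current ensemble, which is the usual squared-error form of gradient boosting, instead of re-weighting the samples as described above. The function names, `n_trees` (standing in for the preset number T), `learning_rate` and the toy data are hypothetical.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    if best is None:
        # No valid split (all rows identical): predict the mean residual.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gbdt(X, y, n_trees=50, learning_rate=0.1):
    """Fit n_trees weak learners, each on the residuals of the current ensemble."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [target - pred for target, pred in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        predictions = [pred + learning_rate * predict_stump(stump, row)
                       for pred, row in zip(predictions, X)]
    return base, learning_rate, stumps


def predict_gbdt(model, row):
    """Aggregate the weak learners: base value plus the scaled stump outputs."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Toy data: [life stage code, repayment level, consumption level] -> credit limit.
X = [[1, 1, 1], [1, 2, 2], [2, 3, 3], [3, 4, 4]]
y = [10000.0, 20000.0, 40000.0, 80000.0]
model = fit_gbdt(X, y)
```

The small learning rate means each stump corrects only part of the remaining error, which is one common way to limit overfitting in boosted ensembles.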

In steps S10 to S60, the original user data is screened to obtain basic user data that meets the training standard, which improves the training speed of the model. The basic information data, basic asset data and basic consumption data are preprocessed to obtain the life stage category, repayment capability level and consumption capability level corresponding to the basic user data, and the basic user data is labeled with the credit limit, life stage category, repayment capability level and consumption capability level to form training data, so that the parameters of the pre-credit limit model can be adjusted according to the labels. Training on the training data with the GBDT algorithm yields a pre-credit limit model whose accuracy is improved by the added consumption data. It should be noted that the life stage category, the repayment capability level and the consumption capability level need not be acquired in a fixed order and may be acquired simultaneously, which improves the model training speed.

In one embodiment, as shown in fig. 3, in step S10, original user data in a database is acquired, and screening processing is performed on the original user data to acquire basic user data meeting training standards, which specifically includes the following steps:

S11: Acquire the original user data in the database and judge whether the original user data has a missing value.

A missing value means that the value of one or more fields in the original user data is absent or incomplete because information is lacking. For example, if in the original user data of a certain user the value of the age field is null or the value of the telephone number field is incomplete, the values of the age field and the telephone number field are missing values.

Specifically, the server checks each piece of acquired original user data to determine whether it is complete, namely whether it has a missing value. There are two possible results: either the original user data is complete and has no missing value, or the original user data is incomplete and has a missing value.

S12: If the original user data has no missing value, take the original user data as the basic user data.

Specifically, when the server determines that the original user data has no missing value, namely that the data of every field in the original user data is complete, it takes the original user data as the basic user data. Using original user data without missing values as the basic user data ensures that the basic user data is complete.

S13: If the original user data has a missing value, acquire the field corresponding to the missing value.

The field refers to the field to which a value in the original user data belongs; for example, if a field is age, its value is a specific number.

Specifically, when the server determines that the original user data has missing values, it acquires each missing value in the original user data and the field corresponding to it. The field corresponding to the missing value is determined in order to decide whether the missing value needs to be imputed.

S14: If the field is a specific field, impute the missing value to obtain the basic user data.

Specifically, a specific field table is stored in the database. The specific field table stores the fields of the data required for training the pre-credit limit model, and each field stored in it is a specific field. The specific field table is searched with the field corresponding to the missing value to determine whether the field is a specific field, namely whether the missing value belongs to data required for model training. If the field is a specific field, the missing value is imputed (interpolated); available methods include mean imputation, same-class mean imputation, maximum likelihood estimation and multiple imputation. In this embodiment, same-class mean imputation is used. Same-class mean imputation predicts the class of the record with the missing variable by means of a hierarchical clustering model and then fills in the mean value of that class. For example, if the field corresponding to a missing value is "age", the specific field table is searched with "age"; if the table indicates that "age" can be used to determine the life stage, "age" is a specific field, and the missing value is imputed so that the basic user data is more complete. Imputing only the missing values of specific fields, namely only the data required for model training, improves the model training speed.
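Same-class mean imputation can be sketched as follows. The sketch assumes that each record already carries a class label (the hypothetical key `cluster`), so the hierarchical clustering step that assigns the class is not shown, and that a missing value is represented as `None`.

```python
from collections import defaultdict


def impute_same_class_mean(records, field, class_key):
    """Fill missing `field` values with the mean of that field in the same class."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        if r.get(field) is not None:
            sums[r[class_key]] += r[field]
            counts[r[class_key]] += 1
    filled = []
    for r in records:
        row = dict(r)
        # Leave the value missing if the class has no observed values at all.
        if row.get(field) is None and counts[row[class_key]]:
            row[field] = sums[row[class_key]] / counts[row[class_key]]
        filled.append(row)
    return filled


records = [
    {"cluster": "a", "age": 30},
    {"cluster": "a", "age": 40},
    {"cluster": "a", "age": None},
    {"cluster": "b", "age": 60},
    {"cluster": "b", "age": None},
]
completed = impute_same_class_mean(records, "age", "cluster")
```

Compared with a global mean, the per-class mean keeps an imputed age close to that of similar users, which matters when the field drives the life stage category.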

S15: if the field is a non-specific field, the missing value is not processed.

Specifically, if the specific field table is searched according to the field, and the field is determined to be a non-specific field, that is, the missing value corresponding to the field is determined not to be the data required by model training, then the missing value does not need to be processed. For example, if a field corresponding to a missing value is a "name", a specific field table is searched by the "name", and if the specific field table indicates that the "name" is a non-specific field, the missing value does not need to be processed, so as to speed up the screening of the original user data.

In steps S11 to S15, whether the original user data has missing values is judged in order to ensure the completeness of the model training data. If the original user data has no missing value, it is complete and can be used directly as the basic user data for model training. If the original user data has a missing value, the field corresponding to the missing value is acquired. If the field is a specific field, the missing value is imputed so that the data used for model training is complete, which improves the accuracy of model training. If the field is a non-specific field, the missing value is left unprocessed, which increases the model training speed.
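The branching of steps S11 to S15 can be summarized in a short sketch. The contents of the specific field set, the use of `None` to represent a missing value and the `impute` callback are assumptions made for illustration.

```python
# Hypothetical contents of the specific field table.
SPECIFIC_FIELDS = {"age", "marital_status", "deposit_amount", "monthly_avg_consumption"}


def screen_record(record, impute):
    """Apply S11-S15 to one piece of original user data.

    `impute(record, field)` is a caller-supplied function that returns the
    value to fill in for a missing specific field.
    """
    # S11: detect missing values (represented here as None).
    missing = [field for field, value in record.items() if value is None]
    if not missing:
        # S12: complete data is used as basic user data unchanged.
        return dict(record)
    result = dict(record)
    # S13: look at the field behind each missing value.
    for field in missing:
        if field in SPECIFIC_FIELDS:
            # S14: only fields needed for training are imputed.
            result[field] = impute(record, field)
        # S15: non-specific fields are left as they are.
    return result
```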

In one embodiment, as shown in fig. 4, in step S20, the basic information data is preprocessed to obtain the life stage category corresponding to the basic user data, which specifically includes the following steps:

S21: Screen the basic information data using screening rules related to the life stage category to obtain target information data.

The screening rule is a preset rule and is used for extracting target information data of a specified field from the basic information data. The target information data is data required for model training obtained by filtering the basic information data.

Specifically, the server acquires screening rules related to the life stage, and screens basic information data in each basic user data through the screening rules to acquire target information data meeting standards such as pre-credit limit model training content and format. For example, telephone numbers, names, etc. may be purged to obtain target information data of a specified field.

S22: Identify the target information data using a decision tree model corresponding to the life stage category to obtain the life stage category corresponding to the basic user data.

The decision tree model is a model which is obtained by training historical user data in a database by adopting a decision tree algorithm in advance.

The decision tree algorithm may specifically be the ID3 algorithm, which is used to train on the historical user data in the database. ID3 (Iterative Dichotomiser 3) is an algorithm for constructing a decision tree that selects attributes according to information gain. In each iteration, the algorithm traverses the feature dimensions of the user data in the database that have not yet been used, calculates the entropy or information gain of each feature dimension, and selects the feature dimension with the minimum entropy or the maximum information gain as the current node of the decision tree (the root node in the first iteration). The data is then divided according to the attribute values of the selected feature dimension, and the ID3 algorithm is applied recursively to the resulting subsets, considering each time only the attributes that have not been selected before, until the decision tree is built. Completing this training process on the historical user data, that is, the growth process of the decision tree, yields the decision tree model corresponding to the life stage category.

Specifically, the server obtains a decision tree model corresponding to the life stage category, and identifies the target information data through the decision tree model to obtain the life stage category corresponding to the target information data, namely, obtain the life stage category corresponding to the basic user data. The life stage category corresponding to the basic user data is determined through the decision tree model, so that the obtained life stage category is more accurate.

In steps S21-S22, the basic information data is screened using screening rules related to the life stage category, so that the acquired target information data contains only the fields needed for training. The target information data is then identified using the decision tree model corresponding to the life stage category, so that the life stage category corresponding to the basic user data is obtained quickly and accurately.
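The attribute selection step of ID3 mentioned above can be illustrated with a short sketch. The feature names (`age_band`, `married`), the label `life_stage`, and the sample rows are hypothetical and serve only to show how entropy and information gain pick the splitting feature.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, label):
    """Entropy reduction obtained by splitting `rows` on `feature`."""
    base = entropy([r[label] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_feature(rows, features, label):
    """Pick the unused feature with the maximum information gain."""
    return max(features, key=lambda f: information_gain(rows, f, label))


rows = [
    {"age_band": "young", "married": "no", "life_stage": "single"},
    {"age_band": "young", "married": "yes", "life_stage": "family"},
    {"age_band": "middle", "married": "no", "life_stage": "single"},
    {"age_band": "middle", "married": "yes", "life_stage": "family"},
]
root = best_feature(rows, ["age_band", "married"], "life_stage")
```

A full ID3 implementation would split the rows on the chosen feature and repeat the selection on each subset with the remaining features.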

In one embodiment, as shown in fig. 5, in step S30, the basic asset data is preprocessed to obtain the repayment capability level corresponding to the basic user data, which specifically includes the following steps:

S31: Normalize the basic asset data to obtain normalized target asset data.

Specifically, the server obtains the basic asset data in the basic user data. The basic asset data includes the deposit amount, house and car valuation, monthly average deposit, insurance policies, and the like, from which the repayment capability of the user can be determined. The deposit amount, house and car valuation, monthly average deposit, insurance policies, and other data are normalized so that the basic asset data is mapped into the range from 0 to 1, that is, each item becomes a decimal in the interval (0, 1), and these decimals are used as the target asset data. Normalization brings data of different orders of magnitude to the same order of magnitude for the subsequent determination of the repayment capability level.
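The text does not name a specific normalization method. As one common possibility, a min-max normalization is sketched below; the function name and the sample deposit amounts are assumptions for illustration. Note that min-max scaling maps the smallest and largest values to exactly 0 and 1.

```python
def min_max_normalize(values):
    """Map a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical deposit amounts of three users.
deposits = [10000, 55000, 100000]
normalized_deposits = min_max_normalize(deposits)
```

Each asset field (deposit amount, valuation, and so on) would be normalized separately, so that fields measured on very different scales become comparable.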

S32: Predict on the target asset data using a pre-trained repayment capability prediction model to obtain the repayment capability level corresponding to the basic user data.

The repayment capability prediction model is a model which is obtained by training through a large amount of historical sample data in advance. Specifically, a large number of historical sample data corresponding to different repayment capability grades can be obtained first, repayment capability grade labeling is carried out on each historical sample data, and curve fitting is carried out by adopting a multiple logistic regression algorithm so as to obtain a repayment capability prediction model. The multiple logistic regression algorithm is used to estimate the probability of something, or determine the probability that a sample belongs to a certain class.

Further, the training of the repayment capability prediction model specifically includes: (1) Acquiring historical sample data, which includes the deposit amount, house and car valuation, monthly average deposit, and number of insurance policies corresponding to each historical user. (2) Performing curve fitting on the deposit amount, house and car valuation, monthly average deposit, and number of insurance policies corresponding to each historical user through a prediction function. The prediction function is $h_\theta(x) = g(\theta^T x)$; let $z = \theta^T x$, then $g(z) = \frac{1}{1 + e^{-z}}$, where $\theta^T x = \sum_{j=1}^{n} \theta_j x_j$. Here $x$ represents an input historical sample containing $n$-dimensional features, $x_i$ represents the $i$-th sample, and $\theta$ represents the model parameters. (3) Performing iterative optimization using optimization formulas to obtain the repayment capability prediction model. The optimization formulas include a loss function, a likelihood function, a gradient descent iteration function, and the like. In the iterative optimization, the loss function $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log h_\theta(x_i) + (1 - y_i)\log\left(1 - h_\theta(x_i)\right)\right]$ is used first, where $m$ is the number of samples and $y_i$ is the label of the $i$-th sample. The aim is to minimize $J(\theta)$ toward 0, because the smaller $J(\theta)$ is, the better the predicted values match the labeled values. This corresponds to maximum likelihood estimation with the likelihood function $L(\theta) = \prod_{i=1}^{m} h_\theta(x_i)^{y_i}\left(1 - h_\theta(x_i)\right)^{1 - y_i}$, whose logarithm is $\ell(\theta) = \sum_{i=1}^{m}\left[y_i \log h_\theta(x_i) + (1 - y_i)\log\left(1 - h_\theta(x_i)\right)\right]$. Iteration is performed through the gradient descent formula $\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x_i) - y_i\right)x_{i,j}$, where $\alpha$ is the learning rate and $x_{i,j}$ is the $j$-th feature of the $i$-th sample. When the model parameters have converged to a certain degree, the iteration stops, and the resulting $\theta$ is the final model parameter, which gives the repayment capability prediction model. It should be noted that, besides gradient descent, the iteration may also use the formulas corresponding to the conjugate gradient method, the quasi-Newton method, and the like.
In a specific embodiment, the optimal model parameters of the logistic regression model can be calculated through any one of these iterative algorithms, and the repayment capability prediction model containing the optimal model parameters is thereby trained.
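The training procedure described above can be sketched with plain batch gradient descent on a logistic regression model. This is a simplified illustration under stated assumptions: only two repayment capability levels (0 = low, 1 = high) are modeled, each sample has a constant bias feature followed by one normalized asset value, and the learning rate and epoch count are arbitrary choices rather than values from the text.

```python
from math import exp


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(theta, x):
    """Predicted probability that sample x belongs to class 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def train(samples, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    m, n = len(samples), len(samples[0])
    theta = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in zip(samples, labels):
            err = predict_proba(theta, x) - y  # prediction minus label
            for j in range(n):
                grad[j] += err * x[j]
        # Step against the averaged gradient.
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta


# Hypothetical samples: [bias, normalized asset value]; labels 0 = low, 1 = high.
samples = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.8], [1.0, 0.9]]
labels = [0, 0, 1, 1]
theta = train(samples, labels)
p_low = predict_proba(theta, [1.0, 0.1])
p_high = predict_proba(theta, [1.0, 0.9])
```

With more than two repayment capability levels, a multinomial (softmax) variant or a one-vs-rest scheme would be needed; the sketch keeps the binary case for brevity.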

Specifically, the server inputs the obtained target asset data into a pre-trained repayment capability prediction model, predicts the target asset data through the repayment capability prediction model, obtains the probability of repayment capability level corresponding to the target asset data, and determines the repayment capability level corresponding to the target asset data through the probability, namely obtains the repayment capability level corresponding to the basic user data.

In steps S31-S32, the basic asset data is normalized to obtain the target asset data, so that basic asset data of different orders of magnitude is brought to the same order of magnitude, which makes obtaining the repayment capability level quicker and more accurate. The target asset data is then predicted using the pre-trained repayment capability prediction model, so that the repayment capability level can be obtained quickly; because the repayment capability prediction model can be reused, its utilization rate is improved.

In one embodiment, the basic consumption data includes at least one consumption factor, where a consumption factor may be the monthly average consumption amount, the consumption frequency, large-amount consumption data, daily purchased products, and the like.

As shown in fig. 6, in step S40, the basic consumption data is preprocessed to obtain the consumption capability level corresponding to the basic user data, which specifically includes the following steps:

S41: Normalize the at least one consumption factor to obtain the corresponding normalization factor value.

Specifically, the server acquires basic consumption data in basic user data, wherein the basic consumption data comprises at least one consumption factor, normalizes the at least one consumption factor to change consumption data of different orders of magnitude into consumption data of the same order of magnitude, and acquires normalized factor values corresponding to the at least one consumption factor, specifically values between 0 and 1 after normalization.

S42: Weight the at least one normalization factor value to obtain the corresponding target consumption value.

Specifically, the database stores a preset weight for each consumption factor, determined according to the importance of that factor. For example, the weight corresponding to large-amount consumption data is larger, and among daily purchased products the preset weight for electronic products is larger than that for daily necessities. The preset weight and the normalization factor value corresponding to each consumption factor are obtained, and the consumption factors are weighted through a weighting formula to obtain the target consumption value. The weighting formula is $y = \sum_{i=1}^{n} w_i A_i$, where $y$ is the target consumption value, $n$ is the number of consumption factors, $A_i$ represents the normalization factor value corresponding to the $i$-th consumption factor, and $w_i$ represents the preset weight corresponding to the $i$-th consumption factor.

S43: Query the consumption evaluation table based on the target consumption value to obtain the consumption capability level corresponding to the basic user data.

Specifically, the corresponding relation between the target consumption value and the consumption capability level is stored in the database, and the consumption evaluation table in the database is searched according to the target consumption value to determine the consumption capability level corresponding to the target consumption value, namely, the consumption capability level corresponding to the basic user data is obtained.

In the steps S41-S43, normalization processing is performed on at least one consumption factor, and corresponding normalization factor values are obtained, so that the subsequent determination of the consumption capability level is facilitated. And weighting at least one normalization factor value to obtain a corresponding target consumption value, inquiring a consumption evaluation table according to the target consumption value, and obtaining a consumption capacity grade corresponding to the basic user data to realize the determination of the consumption capacity grade.
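Steps S42 and S43 can be sketched as a weighted sum followed by a table lookup. The factor names, the weights, and the contents of `CONSUMPTION_TABLE` are hypothetical; the text only states that weights and a consumption evaluation table are stored in the database.

```python
def target_consumption_value(factors, weights):
    """Weighted sum of normalized consumption factor values."""
    return sum(weights[name] * value for name, value in factors.items())


# Hypothetical consumption evaluation table: (lower bound, level), ascending.
CONSUMPTION_TABLE = [(0.0, "low"), (0.4, "medium"), (0.7, "high")]


def consumption_level(target_value):
    """Return the level whose lower bound is the largest one not above the value."""
    level = CONSUMPTION_TABLE[0][1]
    for lower, name in CONSUMPTION_TABLE:
        if target_value >= lower:
            level = name
    return level


# Normalized factor values in [0, 1] and preset weights (assumed to sum to 1).
factors = {"monthly_amount": 0.5, "frequency": 0.25, "large_purchases": 1.0}
weights = {"monthly_amount": 0.25, "frequency": 0.25, "large_purchases": 0.5}

y = target_consumption_value(factors, weights)
level = consumption_level(y)
```

Giving the large-purchase factor the largest weight mirrors the example in the text, where large-amount consumption data is treated as more important.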

In an embodiment, the credit assessment method provided by the embodiment of the invention can be applied to an application environment as shown in fig. 1, and the credit assessment method is applied to a server side, and the pre-credit check request sent by a user side is obtained, wherein the pre-credit check request comprises user attribute data and target user data, and the pre-credit is determined according to the user attribute data and the target user data. The user side may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server may be implemented by an independent server or a server cluster formed by a plurality of servers.

In an embodiment, as shown in fig. 7, a method for evaluating a credit is provided, which is illustrated by taking a server side in fig. 1 as an example, and specifically includes the following steps:

S101: Acquire a pre-credit line viewing request, wherein the pre-credit line viewing request comprises user attribute data and target user data, and the user attribute data comprises the user age, the region where the user is located, and a user identifier.

The user identification refers to an identification corresponding to a user, and a unique user is determined through the user identification, wherein the user identification can be specifically a user identity card.

Specifically, the server provides a data acquisition interface, and the interface is linked with the network of the user terminal. The user sends a pre-credit line checking request to the server based on the user terminal, and the server acquires the pre-credit line checking request according to the data acquisition interface, wherein the pre-credit line checking request comprises user attribute data and target user data, and the user attribute data comprises user age, a region where the user is located and a user identifier. Wherein the target user data comprises target information data, target asset data, and target consumption data.

Further, obtaining the target consumption data specifically includes the following steps: (1) The pre-credit line viewing request includes a user account, and historical consumption data corresponding to the user account within a preset period is obtained. The user account refers to an account corresponding to the user identifier; specifically, it may be an account on a third-party platform authorized by the user, through which the historical consumption data of the user can be obtained. The server is preconfigured with the preset period, which may be, for example, one year or one quarter, and obtains the historical consumption data corresponding to the user account within that period. (2) Duplicate records in the historical consumption data are removed by deduplication cleaning to obtain effective transaction data. Deduplication prevents repeated historical consumption data from being counted more than once, so that the effective transaction data, and therefore the target consumption data, is more accurate. (3) The effective transaction data is analyzed to obtain the monthly average consumption amount, the consumption frequency, the large-amount consumption data, and the daily purchased products, which are used as the target consumption data. Obtaining the target consumption data through the user account improves the acquisition speed.
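The deduplication and analysis steps above might look like the following sketch. The record layout (`tx_id`, `amount`), the large-purchase threshold, and the three-month period are assumptions; the extraction of daily purchased products is omitted.

```python
def deduplicate(transactions):
    """Keep the first occurrence of each transaction id."""
    seen = set()
    unique = []
    for tx in transactions:
        if tx["tx_id"] not in seen:
            seen.add(tx["tx_id"])
            unique.append(tx)
    return unique


def summarize(transactions, months, large_threshold):
    """Derive target consumption data from the deduplicated history."""
    valid = deduplicate(transactions)
    total = sum(tx["amount"] for tx in valid)
    return {
        "monthly_average_amount": total / months,
        "consumption_frequency": len(valid) / months,  # transactions per month
        "large_purchases": [tx for tx in valid if tx["amount"] >= large_threshold],
    }


# Hypothetical three-month history containing one duplicated record.
history = [
    {"tx_id": "t1", "amount": 120.0},
    {"tx_id": "t2", "amount": 3000.0},
    {"tx_id": "t1", "amount": 120.0},
    {"tx_id": "t3", "amount": 480.0},
]
summary = summarize(history, months=3, large_threshold=1000.0)
```

Without the deduplication step, the repeated record `t1` would inflate both the monthly average and the frequency.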

Further, the server is preset with fill-in areas for the user attribute data and the target user data, and the user can enter the corresponding data in these areas. The fill-in areas for the target user data may further include required items and optional items. Specifically, the weights of the fill-in areas corresponding to the basic information data, the basic asset data, and the basic consumption data are predetermined; fill-in areas with larger weights are treated as required items, and fill-in areas with smaller weights are treated as optional items.

S102: Check the user age, the region where the user is located, and the user identifier against preset pre-trust evaluation conditions to determine whether the user attribute data is pre-trust data.

The pre-trust evaluation conditions are preset evaluation rules, for example rules on age (at least 18 years old with full capacity for civil conduct), region (whether the region is a trusted area), and blacklists (the blacklists of the major banks, and poor credit records of the borrower or direct relatives).

Specifically, the server checks the acquired user age, region, and user identifier against the preset pre-trust evaluation conditions: it judges whether the user is at least 18 years old and has full capacity for civil conduct, judges whether the region where the user is located is a trusted area, and searches the blacklist of each major bank according to the user identifier to judge whether the user, the borrower, or the direct relatives corresponding to the user identifier are free of bad records, so as to determine whether the user attribute data is pre-trust data.
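The pre-trust check in step S102 amounts to a conjunction of simple rules, as in the sketch below. The function name, parameters, region names, and identifiers are hypothetical, and the check on the records of direct relatives is left out for brevity.

```python
def is_pre_trust(age, region, user_id, trusted_regions, blacklist,
                 has_full_civil_capacity=True):
    """Return True only if every pre-trust evaluation condition is met."""
    if age < 18 or not has_full_civil_capacity:
        return False  # age condition failed
    if region not in trusted_regions:
        return False  # region is not a trusted area
    return user_id not in blacklist  # no bad record for this identifier


# Hypothetical rule data.
trusted_regions = {"region_a", "region_b"}
blacklist = {"id_0002"}

ok = is_pre_trust(30, "region_a", "id_0001", trusted_regions, blacklist)
too_young = is_pre_trust(17, "region_a", "id_0001", trusted_regions, blacklist)
listed = is_pre_trust(30, "region_a", "id_0002", trusted_regions, blacklist)
```

Only requests that pass this check would be forwarded to the pre-credit limit model; the others receive feedback information instead.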

S103: if the user attribute data is pre-trust data, inputting the target user data into a pre-trust line model, and obtaining an initial pre-trust line corresponding to the target user data.

The initial pre-credit limit refers to credit limit corresponding to target user data output through a pre-credit limit model.

Specifically, if the user attribute data meets the pre-trust evaluation condition, the user attribute data is pre-trust data, the target user data is input into a pre-trust line model, and the target user data is analyzed and processed through the pre-trust line model to obtain an initial pre-trust line corresponding to the target user data. It should be noted that the initial pre-credit limit is a specific value.

Further, if the user attribute data is not pre-trust data, feedback information is generated and fed back to the user side. The feedback information reminds the user that pre-trust cannot be granted.

S104: Query a comparison table according to the initial pre-credit limit to obtain the target pre-credit limit corresponding to the initial pre-credit limit.

The target pre-credit limit refers to a credit limit determined according to the initial pre-credit limit, and the target pre-credit limit is a preset integer value.

Specifically, the correspondence between the initial pre-credit limit and the target pre-credit limit is stored in the comparison table. The initial pre-credit limit obtained through the pre-credit limit model is a specific value, and the server searches the comparison table according to it to obtain the corresponding target pre-credit limit, which is used as the target pre-credit limit corresponding to the target user data. For example, if the initial pre-credit limit obtained through the pre-credit limit model is 2 thousand, the comparison table is searched with this value, and if the table specifies that the corresponding target pre-credit limit is twenty thousand, then twenty thousand is used as the target pre-credit limit corresponding to the target user data.

In steps S101-S104, the user age, the region where the user is located, and the user identifier are checked against the preset pre-trust evaluation conditions to determine whether the user attribute data is pre-trust data, which gives a preliminary decision on whether credit can be extended to the user. If the user attribute data is pre-trust data, the target user data is input into the pre-credit limit model to obtain the corresponding initial pre-credit limit; the initial pre-credit limit output by the model is more accurate. The comparison table is then queried according to the initial pre-credit limit to obtain the corresponding target pre-credit limit, so that the target pre-credit limit fed back to the user is one of the specified credit limits, and the credit limit granted when the user later applies for the service does not differ greatly from the target pre-credit limit.
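The comparison table lookup in step S104 could be realized as a tiered lookup, as in the following sketch. The tier values and the rule of rounding down to the nearest tier are assumptions; the text only states that the table maps an initial pre-credit limit to a preset integer target value.

```python
from bisect import bisect_right

# Hypothetical comparison table: ascending preset target limits.
TARGET_TIERS = [0, 10_000, 20_000, 50_000]


def target_pre_credit_limit(initial_limit):
    """Map a model output to the largest preset tier not exceeding it."""
    idx = bisect_right(TARGET_TIERS, initial_limit) - 1
    return TARGET_TIERS[max(idx, 0)]  # clamp values below the first tier


target = target_pre_credit_limit(21_350.7)
```

Rounding down is a conservative choice in this sketch, since it never reports a target limit higher than the value produced by the model.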

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention.

In an embodiment, a device for training a credit model is provided, where the device for training a credit model corresponds to the method for training a credit model in the above embodiment one by one. As shown in fig. 8, the credit line model training apparatus includes a basic user data obtaining module 10, a life stage category obtaining module 20, a repayment capability level obtaining module 30, a consumption capability level obtaining module 40, a training data forming module 50 and a pre-credit line model obtaining module 60. The functional modules are described in detail as follows:

The basic user data acquisition module 10 is configured to acquire original user data in the database, perform filtering processing on the original user data, and acquire basic user data that meets training standards, where the basic user data includes credit line, basic information data, basic asset data, and basic consumption data.

The life stage category obtaining module 20 is configured to pre-process the basic information data, and obtain a life stage category corresponding to the basic user data.

The repayment capability level acquisition module 30 is configured to preprocess the basic asset data and acquire the repayment capability level corresponding to the basic user data.

The consumption capability level obtaining module 40 is configured to pre-process the basic consumption data, and obtain a consumption capability level corresponding to the basic user data.

The training data forming module 50 is configured to mark the credit line, the life stage category, the repayment capability level and the consumption capability level of the basic user data, and obtain training data formed based on each basic user data.

The pre-credit line model obtaining module 60 is configured to use GBDT algorithm to train each training data, and obtain a pre-credit line model.

In an embodiment, the basic user data acquisition module 10 includes a missing value judgment unit, a first determination unit, a second determination unit, a missing value processing unit, and a third determination unit.

The missing value judging unit is configured to acquire the original user data in the database and judge whether the original user data has missing values.

The first determining unit is configured to take the original user data as the basic user data if the original user data has no missing value.

The second determining unit is configured to acquire the field corresponding to the missing value if the original user data has a missing value.

The missing value processing unit is configured to perform interpolation on the missing value if the field is a specific field, so as to acquire the basic user data.

The third determining unit is configured to leave the missing value unprocessed if the field is a non-specific field.

In one embodiment, the life stage category obtaining module 20 includes a target information data obtaining unit and a life stage category obtaining unit.

The target information data acquisition unit is configured to screen the basic information data using screening rules related to the life stage category to acquire target information data.

The life stage category obtaining unit is configured to identify the target information data using a decision tree model corresponding to the life stage category to obtain the life stage category corresponding to the basic user data.

In one embodiment, the repayment capability level obtaining module 30 includes: a target asset data acquisition unit and a repayment capability level acquisition unit.

The target asset data acquisition unit is configured to normalize the basic asset data and acquire the normalized target asset data.

The repayment capability level obtaining unit is configured to predict on the target asset data using a pre-trained repayment capability prediction model to obtain the repayment capability level corresponding to the basic user data.

In one embodiment, the base consumption data includes at least one consumption factor.

The consumption capability level acquisition module 40 includes a normalization factor value acquisition unit, a target consumption value acquisition unit, and a consumption capability level acquisition unit.

The normalization factor value acquisition unit is used for carrying out normalization processing on at least one consumption factor to acquire a corresponding normalization factor value.

The target consumption value acquisition unit is used for carrying out weighting processing on at least one normalization factor value to acquire a corresponding target consumption value.

The consumption capability level acquisition unit is configured to query the consumption evaluation table based on the target consumption value and acquire the consumption capability level corresponding to the basic user data.

For specific limitations of the credit model training apparatus, reference may be made to the above limitations of the credit model training method, which are not repeated here. The modules in the credit model training apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.

In an embodiment, a credit assessment device is provided, and the credit assessment device corresponds to the credit assessment method in the above embodiment one by one. As shown in fig. 9, the credit evaluation device includes a request acquisition module 101, a pre-credit data determination module 102, an initial pre-credit acquisition module 103, and a target pre-credit acquisition module 104. The functional modules are described in detail as follows:

The request obtaining module 101 is configured to obtain a pre-credit line viewing request, where the pre-credit line viewing request includes user attribute data and target user data, and the user attribute data includes a user age, a region where the user is located, and a user identifier.

The pre-trust data determining module 102 is configured to identify the age of the user, the region where the user is located, and the user identifier according to pre-trust evaluation conditions set in advance, and determine whether the user attribute data is pre-trust data.

The initial pre-credit line obtaining module 103 is configured to input the target user data into the pre-credit line model if the user attribute data is pre-credit data, and obtain an initial pre-credit line corresponding to the target user data.

The target pre-credit line obtaining module 104 is configured to query the comparison table according to the initial pre-credit line and obtain the target pre-credit line corresponding to the initial pre-credit line.

In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing original user data, a specific field table, a consumption evaluation table and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by the processor implements a credit model training method; alternatively, the computer program is executed by the processor to implement a credit assessment method.

In an embodiment, a computer device is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method for training a credit model in the foregoing embodiment, for example, steps S10 to S60 shown in fig. 2, or steps shown in fig. 3 to 6. The processor, when executing the computer program, implements the functions of each module in the credit model training apparatus in the above embodiment, for example, the functions of the modules 10 to 60 shown in fig. 8. To avoid repetition, no further description is provided here.

In an embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the computer program to implement the steps of the method for rating a credit in the above embodiment, for example, steps S101 to S104 shown in fig. 7. The processor, when executing the computer program, implements the functions of each module in the credit evaluation device in the above embodiment, for example, the functions of the modules 101 to 104 shown in fig. 9. To avoid repetition, no further description is provided here.

In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the method for training the credit model in the above method embodiment, for example, step S10 to step S60 shown in fig. 2; or the steps shown in fig. 3 to 6. The computer program, when executed by the processor, implements the functions of the modules in the credit model training apparatus of the above embodiment, for example, the functions of the modules 10 to 60 shown in fig. 8. To avoid repetition, no further description is provided here.

In an embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the credit evaluation method in the above method embodiment, for example, step S101 to step S104 shown in fig. 7. Alternatively, the computer program, when executed by the processor, implements the functions of each module in the credit assessment device in the above embodiment, for example, the functions of the modules 101 to 104 shown in fig. 9. To avoid repetition, no further description is provided here.

Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program, which may be stored on a non-transitory computer readable storage medium and which, when executed, may comprise the steps of the above-described embodiments of the methods. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division into functional units and modules is given as an example. In practical applications, the above-described functions may be assigned to different functional units and modules as needed, i.e. the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the above-described functions.

The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims

1. A method for training a credit model, comprising:

acquiring original user data in a database, screening the original user data, judging whether the original user data has a missing value, acquiring a field corresponding to the missing value if the original user data has the missing value, and performing interpolation missing value processing on the missing value if the field is a preconfigured field capable of determining a life stage class, a consumption capability class and a repayment capability class, so as to acquire basic user data which accords with a training standard, wherein the basic user data comprises credit line, basic information data, basic asset data and basic consumption data;

2. The method for training a credit model according to claim 1, wherein the acquiring original user data in the database, screening the original user data, judging whether the original user data has a missing value, acquiring a field corresponding to the missing value if the original user data has a missing value, and performing interpolation missing value processing on the missing value if the field is a preconfigured field capable of determining a life stage category, a consumption capability level and a repayment capability level, so as to acquire basic user data which accords with a training standard, includes:

if the original user data does not have a missing value, the original user data is used as the basic user data;

if the original user data has a missing value, acquiring a field corresponding to the missing value;

if the field is not a preconfigured field capable of determining the life stage category, the consumption capability level and the repayment capability level, the missing value is not processed.
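The screening logic of claims 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the field names in `KEY_FIELDS`, the use of `None` to mark a missing value, and mean imputation as the "interpolation missing value processing" are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical field names; the claims only say such fields are "preconfigured".
KEY_FIELDS = {"age", "monthly_income", "monthly_spend"}


def screen_records(records: list[dict]) -> list[dict]:
    """Return copies of the records with missing key-field values imputed.

    A missing value is assumed to be stored as None.
    """
    # Mean of the observed (non-missing) values of each key field.
    fill = {}
    for field in KEY_FIELDS:
        observed = [r[field] for r in records if r.get(field) is not None]
        fill[field] = mean(observed) if observed else None

    cleaned = []
    for record in records:
        out = dict(record)  # do not mutate the original user data
        for field, value in record.items():
            if value is None and field in KEY_FIELDS:
                out[field] = fill[field]  # impute only the preconfigured fields
            # Missing values in any other field are left unprocessed.
        cleaned.append(out)
    return cleaned
```

A record without missing values passes through unchanged, which corresponds to the first branch of claim 2.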

3. The method for training a credit model according to claim 1, wherein preprocessing the basic information data to obtain a life stage category corresponding to the basic user data includes:

screening the basic information data by adopting screening rules related to the life stage category to acquire target information data;

and identifying the target information data by adopting a decision tree model corresponding to the life stage class, and acquiring the life stage class corresponding to the basic user data.
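As a rough illustration of claim 3, the sketch below first applies a screening rule and then walks a small decision tree. The field names, the category labels, and the age threshold are hypothetical, and the hand-written tree merely stands in for whatever decision tree model an actual embodiment would use.

```python
# Hypothetical screening rule: only these fields are assumed relevant to life stage.
LIFE_STAGE_FIELDS = ("age", "married", "children")


def screen_info(basic_info: dict) -> dict:
    """Keep only the fields relevant to the life stage category."""
    return {k: basic_info[k] for k in LIFE_STAGE_FIELDS if k in basic_info}


def life_stage(target_info: dict) -> str:
    """Walk a small hand-written decision tree; labels and thresholds are illustrative."""
    if target_info.get("children", 0) > 0:
        return "family_with_children"
    if target_info.get("married", False):
        return "married_no_children"
    if target_info.get("age", 0) < 30:
        return "young_single"
    return "mature_single"
```

For example, `life_stage(screen_info(user))` ignores any field of `user` that the screening rule does not list.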

4. The method for training a credit model according to claim 1, wherein preprocessing the basic asset data to obtain a repayment capability level corresponding to the basic user data includes:

normalizing the basic asset data to obtain normalized target asset data;

and predicting the target asset data by adopting a pre-trained repayment capability prediction model, and obtaining repayment capability level corresponding to the basic user data.

5. The method for training a credit model according to claim 1, wherein the basic consumption data includes at least one consumption factor;

the preprocessing of the basic consumption data to obtain the consumption capability level corresponding to the basic user data comprises the following steps:

normalizing at least one consumption factor to obtain a corresponding normalized factor value;

weighting at least one normalization factor value to obtain a corresponding target consumption value;

and inquiring a consumption evaluation table based on the target consumption value, and acquiring the consumption capability grade corresponding to the basic user data.
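The three steps of claim 5 (normalize, weight, look up) can be sketched as below. The consumption factors, their value ranges, the weights, and the evaluation table are all hypothetical values chosen for illustration, and min-max scaling is only one possible choice of normalization.

```python
from bisect import bisect_right

# Hypothetical consumption factors with assumed (min, max) ranges and weights.
FACTOR_RANGES = {"monthly_spend": (0.0, 20000.0), "purchase_count": (0.0, 100.0)}
FACTOR_WEIGHTS = {"monthly_spend": 0.7, "purchase_count": 0.3}

# Hypothetical consumption evaluation table: upper score boundaries and grades.
GRADE_BOUNDS = [0.25, 0.5, 0.75]
GRADES = ["low", "medium", "high", "very_high"]


def normalize(value: float, low: float, high: float) -> float:
    """Min-max normalize into [0, 1], clamping out-of-range values."""
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def consumption_grade(factors: dict) -> str:
    """Normalize each factor, take the weighted sum, and look up the grade."""
    target = sum(
        FACTOR_WEIGHTS[name] * normalize(factors[name], *FACTOR_RANGES[name])
        for name in FACTOR_WEIGHTS
    )
    # bisect_right finds the table row whose score interval contains the target value.
    return GRADES[bisect_right(GRADE_BOUNDS, target)]
```

Because the weights sum to 1 and each normalized factor lies in [0, 1], the target consumption value also lies in [0, 1], so a fixed table of boundaries is sufficient for the lookup.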

6. A method for credit assessment, comprising:

if the user attribute data is pre-credit data, inputting the target user data into the pre-credit line model according to any one of claims 1 to 5, and obtaining an initial pre-credit line corresponding to the target user data;

7. A credit model training device, comprising:

the basic user data acquisition module is used for acquiring original user data in a database, screening the original user data, judging whether the original user data has a missing value, acquiring a field corresponding to the missing value if the original user data has the missing value, and carrying out interpolation missing value processing on the missing value if the field is a preconfigured field capable of determining a life stage category, a consumption capability level and a repayment capability level, so as to acquire basic user data which accords with a training standard, wherein the basic user data comprises credit line, basic information data, basic asset data and basic consumption data;

8. A credit evaluation device, characterized by comprising:

an initial pre-credit line obtaining module, configured to input the target user data into the pre-credit line model according to any one of claims 1 to 5 if the user attribute data is pre-credit data, and obtain an initial pre-credit line corresponding to the target user data;

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the credit model training method as claimed in any one of claims 1 to 5 when the computer program is executed; alternatively, the processor, when executing the computer program, implements the steps of the credit assessment method as defined in claim 6.

10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the credit model training method as claimed in any one of claims 1 to 5; alternatively, the computer program when executed by a processor implements the steps of the credit assessment method as claimed in claim 6.