CN114240598A - Credit line model generation method, credit line determination method and device - Google Patents

Publication number
CN114240598A
Authority
CN
China
Prior art keywords
model
credit
target object
module
modeling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111384470.9A
Other languages
Chinese (zh)
Inventor
邓强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202111384470.9A
Publication of CN114240598A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Educational Administration (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The disclosure relates to a credit line model generation method, a credit line determination method, and corresponding devices. The generation method comprises: performing feature derivation processing according to the income and expenditure flow data of a target object and the loan business of the target object to determine modeling variables; training a linear regression model established by the ridge regression method according to the modeling variables in a training set and a test set of the target object, both obtained after object sample screening, to obtain a credit line model; and evaluating and verifying the credit line model according to preset model evaluation conditions and, after the evaluation and verification pass, adjusting the credit line model according to preset credit line adjustment conditions to obtain the adjusted credit line model. With this method, the credit line can be determined accurately and the credit risk can be reduced.

Description

Credit line model generation method, credit line determination method and device
Technical Field
The disclosure relates to the technical field of data analysis, in particular to a credit line model generation method, a credit line determination method and a credit line determination device.
Background
With the development of internet technology, credit line assessment technology has emerged. At present, financial institutions usually assess enterprise credit lines by building an enterprise credit line model from expert experience or through big data modeling, typically a Tobit model, a linear regression model, or the like, and then determining the enterprise credit line with that model.
In the traditional credit line assessment technology used by financial institutions, the data used to build the enterprise credit line model is usually the enterprise's credit line at that one financial institution. However, an enterprise may have different credit conditions at different financial institutions, and its credit demand may change over time. If the model is built from the data of a single financial institution and a single dimension, and the credit line is then determined from it, the credit line may be inaccurate, which exposes the credit line to risk.
Disclosure of Invention
In view of the above, it is desirable to provide a credit line model generation method, a credit line determination method, and a credit line determination device that can accurately determine a credit line and reduce a credit risk.
A credit line model generation method comprises the following steps:
performing feature derivation processing according to the income and expenditure flow data of the target object and the loan business of the target object, and determining modeling variables;
training a linear regression model established by the ridge regression method according to the modeling variables in a training set of the target object and a test set of the target object, both obtained after object sample screening, to obtain a credit line model;
and evaluating and verifying the credit line model according to preset model evaluation conditions and, after the evaluation and verification pass, adjusting the credit line model according to preset credit line adjustment conditions to obtain the adjusted credit line model.
In one embodiment, performing feature derivation processing according to the income and expenditure flow data of the target object and the loan business of the target object, and determining modeling variables, comprises:
performing feature derivation according to the income and expenditure flow data of the target object and the loan business of the target object to obtain derived feature variables, wherein the derived feature variables comprise: the monthly average deposit balance of the target object, the tax payment amount of the target object, the operating income amount of the target object, the transaction payment amount of the target object, and the number of years since the target object was established;
and performing feature screening on the derived feature variables to obtain the modeling variables.
In one embodiment, performing feature screening on the derived feature variables to obtain the modeling variables comprises:
determining loan data information of the target object according to the income and expenditure flow data;
performing a first screening on the derived feature variables according to the missing rate of each derived feature variable and the correlation coefficient between each derived feature variable and the loan data information;
and performing a second screening, according to business meaning, on the derived feature variables that remain after the first screening, to obtain the modeling variables.
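The first screening described above can be illustrated with a minimal sketch. The thresholds `max_missing` and `min_abs_corr`, the feature names, and the use of the Pearson coefficient are assumptions made for illustration; the text does not specify them. The second screening by business meaning is a manual review step and is not modeled here.

```python
import math

def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def pearson(xs, ys):
    """Pearson correlation over the pairs where the feature value is present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)

def first_screening(features, loan_amounts, max_missing=0.3, min_abs_corr=0.1):
    """Keep features with a low missing rate and a usable correlation (hypothetical thresholds)."""
    kept = []
    for name, values in features.items():
        if missing_rate(values) > max_missing:
            continue
        if abs(pearson(values, loan_amounts)) < min_abs_corr:
            continue
        kept.append(name)
    return kept

# Hypothetical derived feature variables for four object samples.
features = {
    "avg_monthly_deposit": [10.0, 20.0, 30.0, 40.0],
    "tax_paid": [None, None, 3.0, 4.0],      # 50% missing, dropped
    "constant_flag": [1.0, 1.0, 1.0, 1.0],   # zero variance, dropped
}
loan_amounts = [5.0, 11.0, 14.0, 21.0]
selected = first_screening(features, loan_amounts)
```

Only `avg_monthly_deposit` survives in this toy data; a reviewer would then apply the business-meaning check to whatever remains.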
In one embodiment, after the feature screening is performed on the derived feature variables to obtain the modeling variables, the method further includes:
when a modeling variable is larger than a preset limit threshold for derived feature variables, adjusting the modeling variable according to that preset limit threshold.
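One possible reading of this adjustment is a simple cap: a value above the threshold is replaced by the threshold. The text does not state the exact adjustment rule, so the function below and its name `cap_modeling_variable` are assumptions for illustration only.

```python
def cap_modeling_variable(value, limit):
    """Return the value unchanged if it is within the limit, otherwise the limit itself."""
    return min(value, limit)

# Hypothetical threshold of 1,000,000 for one modeling variable.
capped = cap_modeling_variable(2_500_000.0, 1_000_000.0)
unchanged = cap_modeling_variable(400_000.0, 1_000_000.0)
```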
In one embodiment, the object sample screening comprises:
screening object samples according to object sample screening conditions to obtain target object samples, wherein the object sample screening conditions comprise: the loan date of the object, the scale of the object, the loan amount of the object, the loan data information of the object, abnormal values in the loan data information, and bad credit;
dividing the target object samples, according to a preset time point, into a cross-time verification set and a combined set from which the training set and the test set are drawn;
and dividing the combined set into the training set of the target object and the test set of the target object according to a preset proportion.
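The two-stage split above can be sketched as follows. The cutoff date, the 0.75 training proportion, the random shuffle, and the `loan_date` field name are all assumptions chosen for illustration; the text only says "preset time point" and "preset proportion".

```python
import random
from datetime import date

def split_samples(samples, cutoff, train_ratio=0.75, seed=0):
    """Split by time first, then split the earlier samples by proportion.

    samples: list of dicts with a 'loan_date' key holding a datetime.date.
    Returns (train, test, cross_time).
    """
    in_time = [s for s in samples if s["loan_date"] < cutoff]
    cross_time = [s for s in samples if s["loan_date"] >= cutoff]
    shuffled = in_time[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:], cross_time

# 24 hypothetical samples, two per calendar month of 2021.
samples = [{"id": i, "loan_date": date(2021, 1 + i % 12, 1)} for i in range(24)]
train, test, cross_time = split_samples(samples, cutoff=date(2021, 11, 1))
```

Samples dated on or after the cutoff never enter training, so the cross-time verification set measures how the model behaves on a later period.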
In one embodiment, training the linear regression model established by the ridge regression method according to the modeling variables in the training set of the target object and the test set of the target object, both obtained after object sample screening, to obtain the credit line model comprises:
training the linear regression model established by the ridge regression method according to the modeling variables in the training set of the target object and the test set of the target object, to obtain the test result value and the significance level of each modeling variable and the regression coefficient of each modeling variable in the linear regression model;
and, when the test result value and the significance level meet the conditions, taking the trained linear regression model as the credit line model.
In one embodiment, the ridge regression method comprises:
adding a regularization term to the loss function of a linear regression model, wherein the linear regression model is:
$y(x, w) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$
wherein x is a modeling variable, and w is the regression coefficient of the modeling variable in the linear regression model;
the loss function with the regularization term added is:
[Equation image BDA0003363868120000031 not reproduced]
wherein n is the number of modeling variables in the training set,
[Equation image BDA0003363868120000032 not reproduced]
is the regularization term, in which λ weights the sum of the squares of the regression coefficients of the modeling variables, $x_i$ is the i-th modeling variable in the training set, $y_i$ is the loan data information of the i-th sample in the training set, and w is the regression coefficient of the modeling variable in the linear regression model.
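The two equation images are not available in this text. For reference only, the textbook ridge regression objective that fits the surrounding definitions can be written as below. The sample count m, the double index $x_{ij}$ (value of the j-th modeling variable for the i-th sample), and the absence of any normalizing factor are assumptions; the form in the original figures may differ.

```latex
L(w) = \sum_{i=1}^{m} \Bigl( y_i - \sum_{j=1}^{n} w_j x_{ij} \Bigr)^{2} + \lambda \sum_{j=1}^{n} w_j^{2}
```

The first sum is the ordinary least-squares loss of the linear model $y(x, w)$, and the second is the regularization term, which grows with the squared size of the coefficients and is scaled by λ.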
In one embodiment, evaluating and verifying the credit line model according to preset model evaluation conditions comprises:
evaluating and verifying the credit line model according to the model evaluation indexes, the cross-time verification set, and the weight ratios of the modeling variables, respectively.
In one embodiment, evaluating and verifying the credit line model according to the model evaluation indexes, the cross-time verification set, and the weight ratios of the modeling variables comprises:
calculating the coefficient of determination, the mean absolute error, and the median absolute difference for the training set, the test set, and the cross-time verification set, comparing them across the three sets, and evaluating and verifying the stability of the credit line model according to the comparison result;
comparing the predicted credit lines that the credit line model calculates for the training set with the cross-time verification set, and evaluating and verifying the prediction capability of the credit line model according to the comparison result;
and calculating the weight ratio of each modeling variable, analyzing the weight ratios against a preset weight threshold, and evaluating and verifying the rationality of the credit line model according to the analysis result.
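The three evaluation indexes can be computed with a few lines of standard-library Python. This is a sketch under assumptions: the third index is read here as the median absolute error between actual and predicted values, and the stability rule (`max_gap` on the coefficient of determination across data sets) is a hypothetical criterion, since the text does not give one.

```python
import statistics

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def median_absolute_error(y_true, y_pred):
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))

def metrics_are_stable(r2_by_set, max_gap=0.05):
    """Hypothetical stability rule: R^2 may differ by at most max_gap across data sets."""
    values = list(r2_by_set.values())
    return max(values) - min(values) <= max_gap

# Hypothetical actual and predicted credit lines for four samples.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 320.0, 400.0]
r2 = r_squared(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
med = median_absolute_error(y_true, y_pred)
```

In use, the same three functions would be applied to the training set, the test set, and the cross-time verification set, and the resulting values compared side by side.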
A credit line determining method comprises the following steps:
acquiring income and expenditure flow data of a credit object;
inputting the income and expenditure flow data into the generated credit line model for calculation to obtain an initial credit line;
and adjusting the initial credit line according to the asset coverage rate of the credit object and the type of the credit object, and determining the credit line of the credit object according to the adjusted initial credit line.
A credit line model generation device, comprising:
a modeling variable determination module, configured to perform feature derivation processing according to the income and expenditure flow data of the target object and the loan business of the target object to determine modeling variables;
a credit line model training module, configured to train a linear regression model established by the ridge regression method according to the modeling variables in a training set of the target object and a test set of the target object, both obtained after object sample screening, to obtain a credit line model;
a model evaluation and verification module, configured to evaluate and verify the credit line model according to preset model evaluation conditions;
and a model adjustment module, configured to adjust the credit line model according to preset credit line adjustment conditions after the credit line model passes the evaluation and verification, to obtain the adjusted credit line model.
In one embodiment of the apparatus, the modeling variable determination module comprises a feature derivation module and a feature screening module;
the feature derivation module is configured to perform feature derivation according to the income and expenditure flow data of the target object and the loan business of the target object to obtain derived feature variables, wherein the derived feature variables comprise: the monthly average deposit balance of the target object, the tax payment amount of the target object, the operating income amount of the target object, the transaction payment amount of the target object, and the number of years since the target object was established;
the feature screening module is configured to perform feature screening on the derived feature variables to obtain the modeling variables.
In one embodiment of the apparatus, the feature screening module comprises a loan data determination module, a first screening module, and a second screening module;
the loan data determination module is configured to determine the loan data information of the target object according to the income and expenditure flow data;
the first screening module is configured to perform a first screening on the derived feature variables according to the missing rate of each derived feature variable and the correlation coefficient between each derived feature variable and the loan data information;
the second screening module is configured to perform a second screening, according to business meaning, on the derived feature variables that remain after the first screening, to obtain the modeling variables.
In an embodiment of the apparatus, the modeling variable determination module further comprises a modeling variable adjustment module, configured to adjust a modeling variable according to the preset limit threshold for derived feature variables when the modeling variable is larger than that threshold.
In one embodiment of the apparatus, the credit line model training module comprises an object sample screening module, configured to screen object samples according to object sample screening conditions to obtain target object samples, wherein the object sample screening conditions comprise: the loan date of the object, the scale of the object, the loan amount of the object, the loan data information of the object, abnormal values in the loan data information, and bad credit;
the object sample screening module is further configured to divide the target object samples, according to a preset time point, into a cross-time verification set and a combined set from which the training set and the test set are drawn;
and the object sample screening module is further configured to divide the combined set into the training set of the target object and the test set of the target object according to a preset proportion.
In one embodiment of the apparatus, the credit line model training module further comprises a training calculation module and a model determination module;
the training calculation module is configured to train the linear regression model established by the ridge regression method according to the modeling variables in the training set and the test set of the target object, to obtain the test result value and the significance level of each modeling variable and the regression coefficient of each modeling variable in the linear regression model;
the model determination module is configured to take the trained linear regression model as the credit line model when the test result value and the significance level meet the conditions.
In one embodiment of the apparatus, the training calculation module comprises a ridge regression module, configured to add a regularization term to the loss function of a linear regression model, the linear regression model being:
$y(x, w) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$
wherein x is a modeling variable, and w is the regression coefficient of the modeling variable in the linear regression model;
the loss function with the regularization term added is:
[Equation image BDA0003363868120000051 not reproduced]
wherein n is the number of modeling variables in the training set,
[Equation image BDA0003363868120000052 not reproduced]
is the regularization term, in which λ weights the sum of the squares of the regression coefficients of the modeling variables, $x_i$ is the i-th modeling variable in the training set, $y_i$ is the loan data information of the i-th sample in the training set, and w is the regression coefficient of the modeling variable in the linear regression model.
In an embodiment of the apparatus, the model evaluation and verification module is further configured to evaluate and verify the credit line model according to the model evaluation indexes, the cross-time verification set, and the weight ratios of the modeling variables, respectively.
In one embodiment of the apparatus, the model evaluation and verification module includes: an evaluation index calculation and comparison module, a stability verification module, a comparison module, a prediction capability verification module, an analysis module, and a rationality verification module;
the evaluation index calculation and comparison module is configured to calculate the coefficient of determination, the mean absolute error, and the median absolute difference for the training set, the test set, and the cross-time verification set, and to compare them across the three sets;
the stability verification module is configured to evaluate and verify the stability of the credit line model according to the comparison result;
the comparison module is configured to compare the predicted credit lines that the credit line model calculates for the training set with the cross-time verification set;
the prediction capability verification module is configured to evaluate and verify the prediction capability of the credit line model according to the comparison result;
the analysis module is configured to calculate the weight ratio of each modeling variable and analyze the weight ratios against a preset weight threshold;
and the rationality verification module is configured to evaluate and verify the rationality of the credit line model according to the analysis result.
A credit limit determination device includes:
a flow data acquisition module, configured to acquire the income and expenditure flow data of a credit object;
an initial credit line calculation module, configured to input the income and expenditure flow data into the generated credit line model for calculation to obtain an initial credit line;
and a credit line determination module, configured to adjust the initial credit line according to the asset coverage rate of the credit object and the type of the credit object, and to determine the credit line of the credit object according to the adjusted initial credit line.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method as claimed above when the computer program is executed by the processor.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
A computer program product comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method.
According to the credit line model generation method, the credit line determination method, and the devices described above, the modeling variables are determined by performing feature derivation processing according to the income and expenditure flow data and the loan business. Compared with invoices, taxes, social security records, and the like, the income and expenditure flow data reflects the actual situation of the target object more objectively and truthfully. Feature derivation integrates features from multiple dimensions and extracts useful variables from the income and expenditure flow data for training, so that the trained model can calculate the credit line more accurately. The linear regression model is established by the ridge regression method and trained on the modeling variables in the screened training set and test set. Because ridge regression keeps the coefficients from concentrating excessively on any one variable, the credit line model takes the information of every modeling variable into account, which further improves the accuracy of the credit line and reduces the credit risk. Finally, the model is evaluated against the model evaluation conditions and adjusted according to the credit line adjustment conditions to obtain the final credit line model, whose stability and rationality meet expectations. When the credit line is calculated with the credit line model, the initial credit line can be adjusted according to the asset coverage rate and the type of the credit object, so that the adjusted credit line better matches the current actual economic condition of the credit object and the expected business result, and the credit risk is further reduced.
Drawings
FIG. 1 is a schematic flow chart illustrating a credit line model generation method according to an embodiment;
FIG. 2 is a flowchart illustrating the step S102 according to an embodiment;
FIG. 3 is a flowchart illustrating step S204 according to an embodiment;
FIG. 4 is a schematic flow chart showing the screening step of the object sample in one embodiment;
FIG. 5 is a flowchart illustrating the step S104 according to an embodiment;
FIG. 6 is a schematic flow chart diagram illustrating the model evaluation validation step in one embodiment;
FIG. 7 is a diagram illustrating a comparison of initial credits and cross-time validation sets in one embodiment;
FIG. 8 is a flowchart illustrating a credit line model generation method according to another embodiment;
FIG. 9 is a block diagram schematically illustrating the structure of a credit model generating apparatus according to an embodiment of the present invention;
FIG. 10 is a diagram showing an internal configuration of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more clearly understood, the present disclosure is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the disclosure and are not intended to limit the disclosure.
It should be noted that the terms "first," "second," and the like in the description and claims herein and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments herein described are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or device.
At present, when granting credit to small and micro enterprises, a traditional financial institution mainly determines the credit line from a single dimension in the form "credit reference variable × expert coefficient". For example, the credit line is set directly as the enterprise's daily average deposit over the last 12 months × 0.6, or as the enterprise's total credit-side inflow over the last 12 months × 0.3. Although this approach is intuitive and easy to understand, it does little to distinguish the actual operating conditions of small and micro enterprises, and determining the credit line from a single dimension of data can make the credit line inaccurate and the credit risk relatively high.
Therefore, in order to solve the above problem, as shown in fig. 1, the present disclosure provides a credit line model generation method, which is applied to a terminal for example in the embodiment, and it can be understood that the method can also be applied to a server, and can also be applied to a system including the terminal and the server, and is implemented through interaction between the terminal and the server. The terminal may be, but is not limited to, various personal computers, notebook computers, smart phones, and tablet computers. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
In this embodiment, the method includes the steps of:
S102: performing feature derivation processing according to the income and expenditure flow data of the target object and the loan business of the target object, and determining modeling variables.
The target object may be an enterprise, a financial institution, a real estate developer, or another company with borrowing and repayment capability. The income and expenditure flow data comprehensively and truthfully reflects the operating income, financial assets, and expenditure of an enterprise. It is of great value for enriching the enterprise profile, accurately evaluating the enterprise's actual operations, and improving the sources used for credit granting; it can serve as one basis on which a financial institution grants credit to the target object, and it can be obtained through the People's Bank of China, the banking regulator, or UnionPay. Although the income and expenditure flow data is valuable, it is raw data, so how to use it needs careful analysis. The data has the following problems: (1) the data of individual financial institutions is difficult to unify and standardize, so it needs to be acquired uniformly from the People's Bank of China, the banking regulator, or UnionPay; (2) the quality of the data submitted by individual financial institutions may vary, so the key data fields need a quality check. The loan business is typically the target object's demand for a loan. The modeling variables are the variables obtained after feature derivation processing and can be used as samples for training the model.
Specifically, the income and expenditure flow data of the target object is obtained through the People's Bank of China, the banking regulator, or UnionPay; feature derivation processing is performed according to the loan business of the target object to determine which modeling data is needed; and the data corresponding to that modeling data is then taken from the income and expenditure flow data to obtain the final modeling data.
In some embodiments, for example, feature derivation processing identifies modeling data such as the monthly average deposit balance of the target object over the last 12 months, the tax payment amount over the last 12 months, the operating income amount over the last 12 months, and the number of years since the target object was established; the actual values corresponding to this modeling data are then taken from the income and expenditure flow data to obtain the final modeling data.
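The derivation of these example variables from raw flow records can be sketched as follows. The record layout (`date`, `amount`, `category`), the category labels, the use of month-end balances, and the function name `derive_features` are assumptions for illustration; the text does not describe the data schema.

```python
from datetime import date

def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def derive_features(transactions, month_end_balances, founded, as_of):
    """Derive example modeling variables over the 12 months before as_of.

    transactions: dicts with 'date', 'amount' (positive = inflow), 'category'.
    month_end_balances: deposit balances in chronological order, one per month.
    """
    recent = [t for t in transactions if 0 <= months_between(t["date"], as_of) < 12]
    balances = month_end_balances[-12:]
    return {
        "avg_monthly_deposit_12m": sum(balances) / len(balances),
        "tax_paid_12m": sum(-t["amount"] for t in recent if t["category"] == "tax"),
        "operating_income_12m": sum(
            t["amount"] for t in recent if t["category"] == "operating_income"
        ),
        "years_established": months_between(founded, as_of) // 12,
    }

# Hypothetical flow records for one target object.
transactions = [
    {"date": date(2021, 3, 15), "amount": 50000.0, "category": "operating_income"},
    {"date": date(2021, 6, 10), "amount": -4000.0, "category": "tax"},
    {"date": date(2021, 9, 1), "amount": 70000.0, "category": "operating_income"},
    {"date": date(2019, 5, 1), "amount": 99999.0, "category": "operating_income"},  # outside window
]
month_end_balances = [100000.0] * 6 + [140000.0] * 6
derived = derive_features(
    transactions, month_end_balances, founded=date(2015, 4, 1), as_of=date(2021, 11, 1)
)
```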
S104: training a linear regression model established by the ridge regression method according to the modeling variables in the training set of the target object and the test set of the target object, both obtained after object sample screening, to obtain a credit line model.
Object sample screening generally refers to removing object samples that do not meet the training requirements. Ridge regression is one of the most commonly used regularization methods for the regression analysis of ill-posed problems.
Specifically, the object samples stored in the financial institution are screened so that the remaining samples can be used to train the model. The screening yields target object samples that meet the training conditions; these samples can contain various information about the object's loans, such as the loan start date and the loan balance, and financial information about the object, such as the registered capital of an enterprise. The target object samples are divided into a training set of the target object and a test set of the target object. A linear regression model is established by the ridge regression method and trained on the modeling data in the training set and the test set, and the credit line model is obtained after training.
S106, evaluating and verifying the credit line model according to preset model evaluation conditions, adjusting the credit line model after evaluation and verification according to preset credit line adjusting conditions after evaluation and verification, and obtaining the credit line model after adjustment.
The model evaluation condition may be a method for evaluating the trained credit model, and the model evaluation condition may be capable of evaluating stability, rationality, and the like of the credit model. The credit line adjustment condition is generally a way to adjust the credit line of a single variable under abnormal conditions, and prevent the client from obtaining an excessively high credit line by increasing the value of the single variable.
Specifically, in an embodiment, the credit model is evaluated and verified according to the preset model evaluation conditions. If the credit model meets the conditions, the evaluation and verification pass. If it does not, this indicates that the credit model cannot calculate the credit accurately enough, and the step in S104 of training the linear regression model established by the ridge regression method is performed again until the evaluation and verification pass. After they pass, the credit model is adjusted according to the preset credit line adjustment conditions, yielding a credit line model that can calculate the credit line accurately.
In another embodiment, the credit model may be evaluated and verified according to the preset model evaluation conditions to obtain an evaluation verification result. The rationality, stability, and similar properties of the credit model are determined from this result, which in turn indicates how accurate and reasonable the initial credit calculated by the credit model is, so the predictive ability of the model can be roughly established. The initial credit is then calculated using the range in which the credit model predicts well, and the evaluated and verified credit model is adjusted according to the preset credit adjustment conditions to obtain a credit line model that calculates the credit line more accurately.
In the credit line model generation method, the modeling variable is determined by performing characteristic derivation processing according to the collection and payment flow data and the loan service, and the collection and payment flow data can objectively and truly reflect the actual situation of the target object compared with invoices, taxes, social security and the like. After the characteristic derivation processing is carried out, the multidimensional characteristics can be integrated, and useful variables can be extracted from the receiving and paying running water data to train the model, so that the trained model can calculate the credit line more accurately. The linear regression model is established by the ridge regression method, so that the problem that the coefficients are excessively concentrated on a certain variable can be avoided, the limit model comprehensively considers the information of each modeling variable, the accuracy of the model for determining the credit limit can be further improved, the credit risk can be reduced, finally, the limit model is adjusted and evaluated through the model evaluation condition and the limit adjustment condition, the credit limit model is finally obtained, the stability and the reasonability of the credit limit model can be expected, the result of calculating the credit limit is more consistent with the expected result of the business, and the credit risk is further reduced.
In one embodiment, as shown in fig. 2, the determining the modeling variable according to the characteristic derivation process performed by the target object's balance and expenditure streamline data and the target object's loan service includes:
S202, performing feature derivation according to the income and expenditure flow data of the target object and the loan service of the target object to obtain feature derivative variables, wherein the feature derivative variables comprise: the monthly average deposit balance of the target object, the tax payment amount of the target object, the operating income of the target object, the transaction payment amount of the target object, and the number of years since the target object was established.
Specifically, feature derivation is performed according to the loan service of the target object to obtain feature derivative variables that are related to the income and expenditure flow data of the target object. The feature derivative variables may include: the monthly average deposit balance of the target object, the tax payment amount of the target object, the operating income of the target object, the transaction payment amount of the target object, the number of years since the target object was established, and the like. The data corresponding to the feature derivative variables are then matched from the income and expenditure flow data of the target object, finally yielding feature derivative variables populated with the data of the target object.
And S204, performing feature screening on the feature derivative variables to obtain the modeling variables after the feature screening.
Wherein, the feature screening can be an operation of screening the non-conforming feature derivative variables.
Specifically, feature screening is performed on the feature derivative variables according to the feature screening conditions, and the feature derivative variables meeting the conditions are obtained after the feature screening, namely the modeling variables.
In this embodiment, because the feature derivative variables obtained by feature derivation are related to the income and expenditure flow data, the income and expenditure flow data can be used comprehensively, so that the modeling variables cover data of multiple different dimensions and the model is less affected by any single modeling variable during training.
In one embodiment, as shown in fig. 3, the performing feature screening on the feature-derived variables to obtain the modeling variables includes:
S302, determining loan data information of the target object according to the income and expenditure flow data.
The loan data information usually serves as the target variable. The target variable is in essence the reasonable loan amount required by the target object, and a measurable proxy for it is used, such as the maximum loan balance of the target object within 12 months or the total loan repayment of the target object within 12 months. The maximum loan balance within 12 months represents the largest amount borrowed and indicates that the target object needs a loan of at least that amount. The total loan repayment within 12 months represents the repayment capacity of the client and measures the loan amount the client can bear.
Specifically, the loan data information of the target object, namely the reasonable loan amount, is determined from the maximum loan balance of the target object within 12 months, or from the total loan repayment of the target object within 12 months, in the income and expenditure flow data.
S304, performing first screening on the feature derivative variables according to the feature missing rate of the feature derivative variables and the correlation coefficient between the feature derivative variables and loan data information.
The feature missing rate may be expressed as the ratio of the number of samples for which the feature value is missing to the total number of samples; for example, if the years since establishment of a certain sample are missing, no value is obtained for that feature of that sample. The correlation coefficient is a quantity that measures the degree of linear correlation between variables; in this embodiment, it is the correlation coefficient between a feature in the feature derivative variables and the loan data information, and the larger its absolute value, the stronger the linear association between that feature and the loan data information.
Specifically, the feature missing rate of a feature derivative variable is calculated by dividing the number of samples in which that variable is empty by the total number of samples. The correlation coefficient between each feature derivative variable and the loan data information is calculated. The feature derivative variables whose feature missing rate is less than or equal to the feature missing threshold and whose correlation coefficient has an absolute value greater than or equal to the correlation coefficient threshold are kept.
In some preferred embodiments, the feature missing threshold may be 0.3 and the correlation coefficient threshold may be 0.2. It should be noted that, a person skilled in the art may also select and set the feature missing threshold and the correlation coefficient threshold according to actual situations, which is not limited in this embodiment.
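The first screening described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation of this disclosure: the function names, the toy data, and the choice of the Pearson correlation coefficient are hypothetical, while the thresholds 0.3 and 0.2 are the example values of this embodiment. Missing values are represented by `None`.

```python
import math

def missing_rate(values):
    """Share of samples whose value for this variable is missing (None)."""
    return sum(v is None for v in values) / len(values)

def pearson(xs, ys):
    """Pearson correlation, computed over the samples where x is present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def first_screening(features, target, max_missing=0.3, min_abs_corr=0.2):
    """Keep variables with missing rate <= 0.3 and |correlation| >= 0.2."""
    kept = []
    for name, values in features.items():
        if missing_rate(values) > max_missing:
            continue
        if abs(pearson(values, target)) < min_abs_corr:
            continue
        kept.append(name)
    return kept

# Hypothetical toy data: target is the loan data information of five samples.
target = [10.0, 20.0, 30.0, 40.0, 50.0]
features = {
    "avg_deposit_balance": [1.0, 2.0, 3.0, 4.0, 5.0],  # strongly correlated
    "established_years": [None, None, 3.0, 4.0, 5.0],  # 40% missing
    "noise": [1.0, -1.0, 0.0, -1.0, 1.0],              # uncorrelated
}
selected = first_screening(features, target)
```

In this toy case only `avg_deposit_balance` survives: `established_years` fails the missing-rate condition and `noise` fails the correlation condition.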
S306, performing second screening on the feature derivative variables obtained after the first screening according to the business meaning to obtain the modeling variables.
Wherein the business meaning generally refers to the meaning of loan business.
Specifically, when a feature derivative variable has no meaning for the loan service, for example a variable describing the working hours of the target object, it has little association with the loan service, and the variable is deleted.
In the present embodiment, the first feature screening and the second feature screening are performed to determine the modeling variables, and the modeling variables include variables useful for training the model.
In one embodiment, the feature screening is performed on the feature derivative variables, and after the feature screening, modeling variables are obtained, the method further includes:
in order to prevent abnormal values in some modeling variables from distorting the limit calculation, an upper limit is imposed on the modeling variable: when a modeling variable is larger than a preset feature derivative variable limit threshold, the modeling variable is adjusted according to that threshold.
In some embodiments, if the value of a modeling variable is greater than the 95th percentile of that variable's values sorted in ascending order, the 95th percentile is used as the value of the modeling variable.
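A minimal sketch of this upper-limit capping is given below. The source does not say which percentile definition is used, so the nearest-rank definition here is an assumption, and the function names and sample values are hypothetical.

```python
import math

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of values sorted in ascending order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def cap_at_percentile(values, pct=95):
    """Replace every value above the pct-th percentile by that percentile."""
    cap = nearest_rank_percentile(values, pct)
    return [min(v, cap) for v in values]

# Hypothetical tax payment amounts: nineteen ordinary values and one extreme.
tax_paid = list(range(1, 20)) + [10_000]
capped = cap_at_percentile(tax_paid)
```

With twenty values, the nearest-rank 95th percentile is the 19th smallest value, so the single extreme value is pulled down to 19 and all other values are unchanged.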
In one embodiment, as shown in fig. 4, the object sample screening includes:
S402, screening the object samples according to object sample screening conditions to obtain target object samples, wherein the object sample screening conditions comprise: the loan date of the object, the scale of the object, the loan amount of the object, the loan data information of the object, abnormal values of the loan data information, and bad credit.
Specifically, the object samples are screened according to the loan date of the object, and the object samples that do not meet the loan date condition are deleted. In some embodiments, the performance period of an object needs to be at least 12 months, so the loan start date and due date of the object are restricted such that more than 12 months have passed since the loan start date.
The object samples are screened according to the scale of the object, and the object samples that match the required scale are kept. In some embodiments, the object samples that have a credit-type operating loan within 12 months are selected; the scale of the object is judged along three dimensions, namely total assets, sales revenue, and number of employees; the object samples whose scale is that of a large enterprise are excluded; and the object samples with registered capital of more than 1 million yuan are excluded. In this embodiment, mainly the object samples of small and micro enterprises are retained, and the quota model is mainly suitable for small and micro enterprises.
The object samples are screened according to the loan amount of the object, and the object samples with a loan amount of more than 10 million are excluded, because the loan amount of a small or micro enterprise is generally kept within 10 million; this further ensures that the object samples of small and micro enterprises are retained.
The object samples are screened according to the loan data information of the object, and clients whose loan data information is less than 10,000 yuan are excluded, because objects with too small a value may interfere with model training.
The object samples are screened according to the abnormal values of the loan data information, and the object samples whose loan data information exceeds the abnormal value threshold are excluded. The abnormal value threshold may be calculated as the mean of the loan data information plus 3 times its standard deviation, or as the value at the 95th percentile when the loan data information is sorted in ascending order.
The object samples are screened according to bad credit, and the object samples that were overdue for more than 30 days after the loan are excluded.
The target object samples are finally obtained.
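The screening conditions of S402 can be combined into a single filter, sketched below. The record fields, the helper names, the use of the population standard deviation, and the choice to compute the outlier cutoff before the other filters are all assumptions made for illustration; the registered-capital condition is left out for brevity.

```python
from datetime import date
from statistics import mean, pstdev

def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def screen_samples(samples, observation_date):
    """Keep only the samples that pass every screening condition."""
    targets = [s["loan_target"] for s in samples]
    # Outlier cutoff: mean + 3 * standard deviation of the target variable.
    cutoff = mean(targets) + 3 * pstdev(targets)
    kept = []
    for s in samples:
        if months_between(s["loan_start"], observation_date) < 12:
            continue  # performance period shorter than 12 months
        if s["scale"] == "large":
            continue  # large enterprises are excluded
        if s["loan_amount"] > 10_000_000:
            continue  # loan amount above 10 million
        if s["loan_target"] < 10_000 or s["loan_target"] > cutoff:
            continue  # target too small, or an outlier
        if s["max_overdue_days"] > 30:
            continue  # bad credit
        kept.append(s)
    return kept

def make_sample(sample_id, **overrides):
    """Hypothetical record that passes all conditions unless overridden."""
    record = {"id": sample_id, "loan_start": date(2020, 1, 15),
              "scale": "small", "loan_amount": 2_000_000,
              "loan_target": 500_000, "max_overdue_days": 0}
    record.update(overrides)
    return record

samples = [
    make_sample("A"),
    make_sample("B", scale="large"),
    make_sample("C", loan_amount=20_000_000),
    make_sample("D", loan_target=5_000),
    make_sample("E", max_overdue_days=45),
    make_sample("F", loan_start=date(2021, 3, 1)),
]
kept = screen_samples(samples, observation_date=date(2021, 6, 30))
kept_ids = [s["id"] for s in kept]
```

Each of the samples B to F violates exactly one condition, so only sample A remains.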
S404, determining a cross-time verification set of the target object samples and a combined set of the training set and the test set according to a preset time point.
Specifically, the target object samples are split at a preset time point: the target object samples before the preset time point may be used as the combined set of the training set and the test set, and the target object samples after the preset time point may be used as the cross-time verification set. Alternatively, the target object samples after the preset time point are used as the combined set of the training set and the test set, and the target object samples before it are used as the cross-time verification set.
S406, determining the training set of the target object and the test set of the target object within the combined set of the training set and the test set according to a preset proportion.
Specifically, according to a preset ratio, for example 7:3, 70% of the combined set is used as the training set and the remaining 30% as the test set. It should be noted that a person skilled in the art may select and set the ratio according to the actual situation; the ratio is not limited in this embodiment and may be 9:1, 8:2, 6:4, 5:5, and so on.
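Steps S404 and S406 can be sketched as a time-based split followed by a proportional split. The source does not state how samples are assigned within the 7:3 split, so the seeded random shuffle below is an assumption, as are the field name `loan_start` and the toy dates.

```python
import random
from datetime import date

def split_samples(samples, time_point, train_ratio=0.7, seed=0):
    """Return (training set, test set, cross-time verification set)."""
    before = [s for s in samples if s["loan_start"] < time_point]
    cross_time = [s for s in samples if s["loan_start"] >= time_point]
    # Shuffle a copy so the caller's list is left untouched.
    shuffled = before[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_ratio * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:], cross_time

# Hypothetical samples: ten loans started in 2020, four started in 2021.
samples = [{"id": i, "loan_start": date(2020, m, 1)}
           for i, m in enumerate(range(1, 11))]
samples += [{"id": 100 + m, "loan_start": date(2021, m, 1)}
            for m in range(1, 5)]

train, test, cross_time = split_samples(samples, time_point=date(2021, 1, 1))
```

Here the samples before the time point form the combined set; swapping the two comparisons gives the alternative arrangement mentioned above.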
In some embodiments, as shown in fig. 5, the training a linear regression model established by a ridge regression method according to the modeling variables in the training set of the target object and the testing set of the target object obtained after the screening of the object sample to obtain a credit model includes:
S502, training a linear regression model established by a ridge regression method according to the modeling variables in the training set and the test set of the target object to obtain a test result value and a significance level of the modeling variables and regression coefficients of the modeling variables in the linear regression model.
The test result value may be the t-value, an index for determining whether a modeling variable is significant. The significance level is usually referred to as the p-value. The t-value and the p-value are both indexes for measuring whether a modeling variable is effective. Both the test set of the target object and the training set of the target object contain the modeling variables.
Specifically, a linear regression model established by a ridge regression method is trained according to the modeling variables in the training set and the test set of the target object, and after training is completed, the test result value, the significance level and the regression coefficient of the modeling variables in the linear regression model are calculated and obtained.
S504, under the condition that the inspection result value and the significance level meet the conditions, obtaining a quota model through the trained linear regression model.
Specifically, in some embodiments, the test result value is converted into a significance level; when the resulting significance level is less than 0.05, the condition is satisfied, which shows that the modeling variable has a significant influence on the loan data information, and the credit model is obtained from the trained linear regression model.
In other embodiments, the p-value alone may be used as the index of whether a modeling variable is effective. When the significance level is less than 0.05, the condition is satisfied, which shows that the modeling variable has a significant influence on the loan data information.
The quota model calculates the quota as follows: each modeling variable is multiplied by its regression coefficient, and the products are summed to obtain the initial limit.
In some embodiments, the test result values, significance levels, and regression coefficients of the modeling variables in the linear regression model are calculated to obtain a table of modeling variables and regression coefficients, as shown in table 1;
TABLE 1 modeling variables and regression coefficient Table
Figure BDA0003363868120000151
The initial credit calculated by the credit model may be: x1 × a3 + x2 × a5 + x3 × a6 + x4 × a7.
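The weighted sum that produces the initial credit can be written in a few lines. The variable names and coefficient values below are hypothetical and are not the values of Table 1; they only illustrate the calculation.

```python
def initial_limit(variables, coefficients):
    """Sum of each modeling variable multiplied by its regression coefficient."""
    return sum(coefficients[name] * value for name, value in variables.items())

# Hypothetical regression coefficients for four modeling variables.
coefficients = {
    "avg_deposit_balance_12m": 0.5,
    "tax_paid_12m": 2.0,
    "operating_income_12m": 0.1,
    "established_years": 10_000.0,
}
# Hypothetical values of the same variables for one target object.
applicant = {
    "avg_deposit_balance_12m": 200_000,
    "tax_paid_12m": 50_000,
    "operating_income_12m": 1_000_000,
    "established_years": 5,
}
limit = initial_limit(applicant, coefficients)
```

For these made-up numbers the four products are 100,000, 100,000, 100,000 and 50,000, so the initial limit is 350,000.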
It should be noted that, in this embodiment, only the four modeling variables are exemplified, but in the practical application process, a person skilled in the art may selectively replace, add, or delete the modeling variables, and the content included in a specific modeling variable is not limited in this embodiment as long as the condition of the significance level can be met.
In one embodiment, the ridge regression method includes:
adding an L2 regular term to the loss function of the linear regression model, wherein the L2 regular term is a penalty term of the model that aims to reduce the variance of the linear regression model by introducing a small amount of bias, while keeping the loss function as small as possible.
The linear regression model is:
y(x, w) = w_1 x_1 + w_2 x_2 + … + w_n x_n
wherein x is a modeling variable, and w is a regression coefficient of the modeling variable in the linear regression model;
the loss function with the regular term added is:
Figure BDA0003363868120000161
wherein n is the number of modeling variables in the training set; the expression shown in Figure BDA0003363868120000162 is the sum of the squares of the regression coefficients of the modeling variables, namely the L2 regular term; λ is the regular term strength control coefficient, and the larger λ is, the more evenly the regression coefficients of the modeling variables are distributed; x_i is the ith modeling variable in the training set; y_i is the loan data information of the ith sample in the training set; and w is the regression coefficient of the modeling variable in the linear regression model.
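For orientation, the ridge regression objective in its standard textbook form is given below. The formula images of this publication are not reproduced in the text, so this is an assumption about their content and may differ from them by a constant scaling factor; here i is taken to index the training samples and j the modeling variables.

```latex
J(w) = \sum_{i} \bigl( y_i - y(x_i, w) \bigr)^2 + \lambda \lVert w \rVert_2^2,
\qquad
\lVert w \rVert_2^2 = \sum_{j} w_j^2
```

The first term is the ordinary squared error of the linear regression model, and the second term is the L2 regular term scaled by the strength control coefficient λ.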
In this embodiment, by introducing the L2 regular term, when solving the regression coefficient, the loss function needs to be made as small as possible, so that the coefficients of the modeling variables become average, and it is prevented that the loss function becomes large due to an excessively large regression coefficient of a certain modeling variable, and further the obtained credit model does not cause inaccurate credit model calculation due to an excessively large regression coefficient of a certain modeling variable, and it is ensured that the credit model comprehensively considers the information of each modeling variable.
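The evening-out effect of the L2 regular term can be seen in a small sketch that solves the ridge normal equations (XᵀX + λI)w = Xᵀy for two modeling variables. It is a simplified illustration under assumptions: there is no intercept and no standardization, and the function name and data are hypothetical. The two variables are deliberately identical, a case in which ordinary least squares has no unique solution.

```python
def ridge_fit_2d(xs, ys, lam):
    """Solve (X^T X + lam * I) w = X^T y for two variables, no intercept."""
    a11 = sum(x1 * x1 for x1, _ in xs) + lam
    a12 = sum(x1 * x2 for x1, x2 in xs)
    a22 = sum(x2 * x2 for _, x2 in xs) + lam
    b1 = sum(x1 * y for (x1, _), y in zip(xs, ys))
    b2 = sum(x2 * y for (_, x2), y in zip(xs, ys))
    # Cramer's rule for the symmetric 2x2 system.
    det = a11 * a22 - a12 * a12
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2

# Two perfectly collinear variables; the target equals 2 * x1.
xs = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
ys = [2.0, 4.0, 6.0]

w_small = ridge_fit_2d(xs, ys, lam=1.0)
w_large = ridge_fit_2d(xs, ys, lam=10.0)
```

For this data each coefficient equals 28 / (28 + λ), so the weight is shared equally between the two variables rather than concentrated on one, and a larger λ shrinks both coefficients further toward zero.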
In one embodiment, the evaluating and verifying the credit model according to a preset model evaluation condition includes:
and respectively evaluating and verifying the quota model according to quota weight ratios corresponding to model evaluation indexes, the cross-time verification set and the modeling variables.
The model evaluation index is an index for evaluating and verifying the obtained quota model and may include R² (the coefficient of determination), the mean absolute error, and the median absolute error; the larger the coefficient of determination and the smaller the mean absolute error and the median absolute error, the better the model fits. The modeling variable weight ratio is the share of the initial credit contributed by a modeling variable and can be used to verify the validity of the credit model.
Specifically, the credit model can be evaluated by computing R² (the coefficient of determination), the mean absolute error, and the median absolute error on the training set, the test set, and the cross-time verification set respectively.
The initial quota obtained by the time-crossing verification set and the training set through the quota calculation model can be compared and analyzed, so that the quota model can be evaluated.
The quota model can be evaluated according to the value of the weight ratio by calculating the weight ratio of each modeling variable in the quota model to the initial quota obtained by calculation.
In this embodiment, the credit model is evaluated in three ways, whether the initial credit distribution calculated by the credit model is reasonable and meets the business expectation or not can be determined according to the evaluation result, the stability of the credit model can be evaluated, and the calculation effect of the credit model can be obtained more accurately.
In some embodiments, as shown in fig. 6, the performing evaluation verification on the quota model according to quota weight ratios corresponding to model evaluation indexes, the cross-time verification set, and modeling variables respectively includes:
S602, calculating the coefficient of determination, the mean absolute error, and the median absolute error of the training set, the test set, and the cross-time verification set, comparing them respectively, and evaluating and verifying the stability of the credit model according to the comparison results.
Specifically, the coefficients of determination of the training set, the test set, and the cross-time verification set are calculated and compared to obtain a first comparison result;
the mean absolute errors of the training set, the test set, and the cross-time verification set are calculated and compared to obtain a second comparison result;
the median absolute errors of the training set, the test set, and the cross-time verification set are calculated and compared to obtain a third comparison result;
and when the first comparison result is less than or equal to a preset first result comparison threshold, the second comparison result is less than or equal to a preset second result comparison threshold, and the third comparison result is less than or equal to a preset third result comparison threshold, the stability of the quota model is shown to be good.
In some embodiments, after the coefficient of determination, the mean absolute error, and the median absolute error of the training set, the test set, and the cross-time verification set are calculated, the model evaluation index values are obtained, as shown in Table 2.
TABLE 2 Model evaluation index table
Evaluation index | Training set | Test set | Cross-time validation set
Coefficient of determination | 0.22 | 0.24 | 0.27
Mean absolute error | 586512.97 | 578563 | 569252
Median absolute error | 280387.4 | 288967 | 261817
When the values of the coefficient of determination, the mean absolute error, and the median absolute error differ little across the three data sets, that is, when the first comparison result is less than or equal to the preset first result comparison threshold, the second comparison result is less than or equal to the preset second result comparison threshold, and the third comparison result is less than or equal to the preset third result comparison threshold, the stability of the credit model is good.
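The three evaluation indexes and the stability check can be sketched as follows. The source does not define how a "comparison result" is computed or what the thresholds are, so taking the spread (maximum minus minimum) across the three data sets and the threshold values below are assumptions; the metric values are those of Table 2.

```python
from statistics import mean, median

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def mean_absolute_error(actual, predicted):
    return mean([abs(a - p) for a, p in zip(actual, predicted)])

def median_absolute_error(actual, predicted):
    return median([abs(a - p) for a, p in zip(actual, predicted)])

def spread(metric_by_set):
    """Assumed comparison result: largest minus smallest value across sets."""
    values = list(metric_by_set.values())
    return max(values) - min(values)

# Metric values taken from Table 2.
table_2 = {
    "r2": {"train": 0.22, "test": 0.24, "cross_time": 0.27},
    "mae": {"train": 586512.97, "test": 578563, "cross_time": 569252},
    "median_ae": {"train": 280387.4, "test": 288967, "cross_time": 261817},
}
# Hypothetical comparison thresholds, one per metric.
thresholds = {"r2": 0.1, "mae": 50_000, "median_ae": 50_000}

stable = all(spread(table_2[m]) <= thresholds[m] for m in table_2)
```

Under these assumed thresholds the Table 2 values pass all three comparisons, which matches the conclusion that the model is stable.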
S604, comparing the predicted quota calculated by the training set through the quota model with the cross-time verification set, and evaluating and verifying the prediction capability of the quota model according to the comparison result.
Specifically, the training set is input into the quota model, a predicted initial quota is obtained through quota model calculation, the initial quota is compared with the cross-time verification set to form a comparison graph, the cross-time verification set and the distribution of the initial quota are analyzed through the comparison graph, and therefore whether the predicted value distribution of the quota model is basically consistent with the actual service expected distribution or not is compared.
In some embodiments, the curve of the initial quota calculated for the training set is compared with the curve of the cross-time verification set, see fig. 7, where L1 is the curve of the cross-time verification set and L2 is the curve of the initial quota calculated for the training set. Comparing the distribution of the initial quota with that of the cross-time verification set shows that the distribution of the values predicted by the quota model is approximately consistent with the expected distribution of the real business. In terms of quantiles, however, the initial quota predicted by the model is lower than the cross-time verification set, and the gap between the two widens beyond the 70th quantile (the 70th quantile being the value at the 70% position when the initial quotas predicted by the quota model for the training set are sorted in ascending order), which shows that the quota model is weaker at calculating large initial quotas.
S606, calculating the modeling variable weight ratio, analyzing the modeling variable weight ratio according to a preset weight threshold, and evaluating and verifying the rationality of the credit model according to an analysis result.
Specifically, the weight ratio of a modeling variable is obtained by multiplying the value of the modeling variable by its coefficient and dividing the product by the calculated initial quota; when the weight ratio of a modeling variable is smaller than a preset weight ratio threshold, the model is shown to have poor rationality.
In some embodiments, after calculating the weight ratios of the modeling variables, a weight ratio table is obtained as shown in table 3,
TABLE 3 Weight ratio table
Figure BDA0003363868120000191
The weight percentage of 10%, 25%, 50%, 75% and 90% quantiles respectively represent the weight percentage of the corresponding initial credit line at the time of 10%, 25%, 50%, 75% and 90% quantiles.
The rationality of the model is assessed by checking the weight ratio of each modeling variable: when the weight ratio of a modeling variable is smaller than the preset weight ratio threshold, the model is shown to have poor rationality. For example, if the rationality of the credit model is judged by the median weight ratio of the monthly average deposit balance within 12 months, and that median is 0.27 while the preset weight ratio threshold is 0.1, the credit model has good rationality. It should be noted that a person skilled in the art may preset the weight ratio threshold according to experience or the actual situation. In this embodiment, the median is used in order to avoid the influence of extreme values. A person skilled in the art may also use other statistics for the judgment, such as the mean or the variance, and may also use other modeling variables, such as the years since establishment or the raw material transaction expenditure.
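The weight ratio check of S606 can be sketched as below. The variable names, coefficients, and sample values are hypothetical; only the 0.1 threshold and the use of the median follow this embodiment. The sketch assumes every initial limit is non-zero.

```python
from statistics import median

def weight_ratios(sample, coefficients):
    """Share of the initial limit contributed by each modeling variable."""
    contributions = {k: coefficients[k] * v for k, v in sample.items()}
    total = sum(contributions.values())  # the initial limit, assumed non-zero
    return {k: c / total for k, c in contributions.items()}

def median_weight_ratio(samples, coefficients, variable):
    """Median of one variable's weight ratio over all samples."""
    return median([weight_ratios(s, coefficients)[variable] for s in samples])

# Hypothetical coefficients and samples with two modeling variables.
coefficients = {"avg_deposit_balance": 0.5, "tax_paid": 2.0}
samples = [
    {"avg_deposit_balance": 100, "tax_paid": 25},  # shares 0.5 and 0.5
    {"avg_deposit_balance": 60, "tax_paid": 35},   # shares 0.3 and 0.7
    {"avg_deposit_balance": 20, "tax_paid": 45},   # shares 0.1 and 0.9
]

deposit_median = median_weight_ratio(samples, coefficients, "avg_deposit_balance")
is_reasonable = deposit_median >= 0.1  # preset weight ratio threshold
```

The median share of the deposit variable is 0.3 in this toy data, above the 0.1 threshold, so the check passes.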
In one embodiment, the preset quota adjustment conditions may include: for the tax payment amount within 12 months, the tax payment amount of a target object whose tax payments are abnormal, such as a target object with the tax payment amount larger than 60 and the tax payment amount larger than 2 million, may be set to 0 or set to the median value.
It should be noted that, here, only the tax payment amount is taken as an example, and a transaction payment amount may also be abnormal, or other modeling variables may be abnormal, and a person skilled in the art may select to set the amount adjustment condition according to specific situations, which is not limited in this embodiment.
After the credit line model is adjusted according to the preset line adjusting condition, the credit line model is obtained, evaluation verification can be carried out on the credit line model according to the preset model evaluating condition again, and then the calculating effect of the adjusted credit line model is determined.
The steps of evaluating and verifying the credit line model can be referred to the above embodiments, and are not described in detail herein.
In another embodiment, the present disclosure further provides a credit line model generating method, as shown in fig. 8, the method includes the following steps:
S801, determining loan data information of the target object according to the income and expenditure flow data.
S802, screening the object samples according to preset sample screening conditions to obtain target object samples, wherein the sample screening conditions comprise the loan date of the object, the scale of the object, the loan amount of the object, the loan data information of the object, abnormal values of the loan data information, bad credit, and the like.
S803, determining the training set of the target object and the test set of the target object within the combined set of the training set and the test set according to a preset proportion.
And S804, performing characteristic derivation processing according to the income and expenditure streamline data of the target object and the loan service of the target object, and determining a modeling variable.
S805, training a linear regression model established by a ridge regression method according to the training set of the target object obtained after object sample screening and the modeling variable in the test set of the target object, and obtaining a limit model.
And S806, evaluating and verifying the credit model according to preset model evaluation conditions.
S807, adjusting the credit line model after evaluation verification according to preset credit line adjustment conditions to obtain the credit line model after adjustment.
The embodiment of the credit limit model generation method in this embodiment may refer to the above-mentioned embodiments, and will not be described in detail here.
In another embodiment, the credit line model generated by any one of the above method embodiments of the present disclosure may be used to calculate the credit line of the credit object in the financial institution. Therefore, the present disclosure further provides a method for determining a credit line, including:
acquiring income and expenditure flow data of a credit object;
inputting the income and expenditure flow data into the credit line model generated by any of the above embodiments for calculation to obtain an initial limit;
and adjusting the initial limit according to the asset coverage rate of the credit object and the type of the credit object, and determining the credit line of the credit object according to the adjusted initial limit.
Specifically, the income and expenditure flow data of the credit object is obtained through the People's Bank of China, the banking regulatory authority, or UnionPay, and is input into the credit line model generated in any one of the above embodiments for calculation to obtain the initial limit.
After the initial limit is obtained, when the initial limit is larger than a preset initial limit threshold, the asset coverage rate of the credit object needs to be checked, and when the asset coverage rate is lower than a first percentage threshold, the initial limit is adjusted to the preset initial limit threshold.
In some embodiments, when the initial limit is greater than 1 million yuan and the asset coverage rate is lower than 10%, the initial limit is adjusted to 1 million yuan.
Since the credit object in the present disclosure is usually a small or micro enterprise, an upper limit is set on the initial limit. In some embodiments, the upper limit may be between 3 million and 5 million yuan. Those skilled in the art may set other upper limits according to different situations, which is not limited in this embodiment. Finally, the initial limit is adjusted according to the set upper limit to obtain the credit line of the credit object.
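The adjustment of the initial limit can be illustrated with a short sketch. It rests on one reading of the rule above, namely that an initial limit above the threshold is reduced to the threshold when the asset coverage rate is below the percentage threshold; the function name, parameter names, and default values (1 million yuan, 10%, and a 3 million yuan cap) are illustrative assumptions. The adjustment by the type of the credit object is not modeled, because the text does not detail it.

```python
def determine_credit_line(initial_limit, asset_coverage,
                          review_threshold=1_000_000,
                          min_coverage=0.10,
                          upper_limit=3_000_000):
    """Adjust a model-produced initial limit into a final credit line.

    Assumed interpretation: a limit above review_threshold is cut back to
    review_threshold when asset coverage is below min_coverage; the result
    is then capped at upper_limit.
    """
    limit = float(initial_limit)
    if limit > review_threshold and asset_coverage < min_coverage:
        limit = float(review_threshold)
    return min(limit, float(upper_limit))

# Hypothetical credit objects.
low_coverage = determine_credit_line(1_500_000, asset_coverage=0.05)
good_coverage = determine_credit_line(1_500_000, asset_coverage=0.30)
above_cap = determine_credit_line(4_000_000, asset_coverage=0.50)
small_limit = determine_credit_line(800_000, asset_coverage=0.02)
```

Keeping the thresholds as parameters reflects the statement that other limits may be chosen for different situations.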
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 9, there is provided a credit model generating apparatus 900, including: a modeling variable determination module 902, a credit model training module 904, a model evaluation verification module 906, and a model adjustment module 908, wherein:
the modeling variable determination module 902 is configured to perform feature derivation processing according to the income and expenditure flow data of the target object and the loan service of the target object, and determine modeling variables;
the credit model training module 904 is configured to train a linear regression model established by the ridge regression method according to the modeling variables in the training set and the test set of the target object obtained after object sample screening, to obtain a credit line model;
the model evaluation verification module 906 is configured to evaluate and verify the credit line model according to preset model evaluation conditions;
and the model adjustment module 908 is configured to, after the credit line model passes evaluation and verification, adjust the credit line model according to preset credit line adjustment conditions, to obtain the adjusted credit line model.
In one embodiment of the apparatus, the modeling variable determination module 902 comprises a feature derivation module and a feature screening module;
the feature derivation module is configured to perform feature derivation according to the income and expenditure flow data of the target object and the loan service of the target object to obtain feature-derived variables, wherein the feature-derived variables comprise: the monthly average deposit balance of the target object, the tax payment amount of the target object, the income amount of the operating activities of the target object, the transaction payment amount of the target object, and the establishment period of the target object.
The feature screening module is configured to perform feature screening on the feature-derived variables to obtain the modeling variables after feature screening.
In one embodiment of the apparatus, the feature screening module comprises a loan data determination module, a first screening module, and a second screening module;
the loan data determination module is configured to determine the loan data information of the target object according to the income and expenditure flow data.
The first screening module is configured to perform a first screening on the feature-derived variables according to the feature missing rate of the feature-derived variables and the correlation coefficient between the feature-derived variables and the loan data information.
The second screening module is configured to perform a second screening, according to business meaning, on the feature-derived variables obtained after the first screening, to obtain the modeling variables.
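A first screening by missing rate and correlation coefficient could look like the sketch below. The threshold values, the use of the Pearson correlation coefficient, and the dictionary-of-columns input are assumptions for illustration; the text does not fix them. The second screening by business meaning is a manual judgment and is not represented in code.

```python
import numpy as np

def first_screening(features, target, max_missing_rate=0.5, min_abs_corr=0.1):
    """Keep feature-derived variables with an acceptable missing rate and
    a sufficiently strong correlation with the loan data (target).

    features: dict mapping a variable name to a 1-D sequence (np.nan = missing).
    Thresholds are hypothetical defaults, not values from the disclosure.
    """
    target = np.asarray(target, dtype=float)
    kept = []
    for name, column in features.items():
        column = np.asarray(column, dtype=float)
        present = ~np.isnan(column)
        missing_rate = 1.0 - present.mean()
        if missing_rate > max_missing_rate:
            continue  # too many missing values
        x, y = column[present], target[present]
        if x.size < 2 or np.std(x) == 0 or np.std(y) == 0:
            continue  # correlation is undefined for constant columns
        corr = np.corrcoef(x, y)[0, 1]
        if abs(corr) >= min_abs_corr:
            kept.append(name)
    return kept

# Hypothetical loan amounts and candidate variables.
loan_amount = [100, 200, 300, 400, 500]
candidates = {
    "monthly_avg_deposit": [10, 21, 29, 41, 50],             # strongly correlated
    "mostly_missing": [np.nan, np.nan, np.nan, np.nan, 1.0],  # 80% missing
    "constant": [1, 1, 1, 1, 1],                              # no variation
    "uncorrelated": [3, 1, 2, 1, 3],                          # zero correlation
}
kept = first_screening(candidates, loan_amount)
```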
In an embodiment of the apparatus, the modeling variable determination module 902 further comprises a modeling variable adjustment module, configured to adjust a modeling variable according to a preset feature-derived variable limit threshold when the modeling variable is larger than that threshold.
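One simple way to realize this adjustment is to cap values at the threshold, as sketched below. Capping is an assumption; the text only states that the variable is adjusted according to the threshold, and the function name and sample values are hypothetical.

```python
import numpy as np

def cap_modeling_variable(values, limit_threshold):
    """Cap every value that exceeds limit_threshold at the threshold itself."""
    return np.minimum(np.asarray(values, dtype=float), float(limit_threshold))

# Hypothetical monthly average deposit balances with one extreme value.
capped = cap_modeling_variable([50_000, 900_000, 12_000_000], limit_threshold=5_000_000)
```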
In one embodiment of the apparatus, the credit model training module 904 comprises an object sample screening module, configured to screen the object samples according to object sample screening conditions to obtain target object samples, wherein the object sample screening conditions comprise: the loan date of the object, the scale of the object, the loan amount of the object, the loan data information of the object, abnormal values of the loan data information, and bad credit;
the object sample screening module is further configured to determine, according to a preset time point, a cross-time verification set of the target object samples and a combined set of the training set and the test set;
and the object sample screening module is further configured to determine the training set of the target object and the test set of the target object in the combined set according to a preset proportion.
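The two-stage split can be sketched as follows, assuming that samples on or after the preset time point form the cross-time verification set and that earlier samples are shuffled and divided by the preset proportion. The field name `loan_date`, the 70/30 ratio, and the fixed random seed are illustrative assumptions.

```python
import random
from datetime import date

def split_samples(samples, cutoff, train_ratio=0.7, seed=0):
    """Split samples into training, test, and cross-time verification sets.

    Assumption: samples dated on or after `cutoff` are held out as the
    cross-time set; the remainder is split randomly by `train_ratio`.
    """
    cross_time = [s for s in samples if s["loan_date"] >= cutoff]
    pool = [s for s in samples if s["loan_date"] < cutoff]
    rng = random.Random(seed)
    rng.shuffle(pool)
    n_train = int(round(len(pool) * train_ratio))
    return pool[:n_train], pool[n_train:], cross_time

# Ten hypothetical samples, one per month of 2020.
samples = [{"id": m, "loan_date": date(2020, m, 1)} for m in range(1, 11)]
train, test, cross_time = split_samples(samples, cutoff=date(2020, 9, 1))
```

Holding out the most recent period lets the later evaluation check whether the model remains stable on data from a time window it has not seen.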
In one embodiment of the apparatus, the credit model training module 904 further comprises a training calculation module and a model determination module;
the training calculation module is configured to train a linear regression model established by the ridge regression method according to the modeling variables in the training set and the test set of the target object, to obtain the test result value and the significance level of the modeling variables and the regression coefficients of the modeling variables in the linear regression model.
The model determination module is configured to obtain the credit line model from the trained linear regression model when the test result value and the significance level meet the conditions.
In one embodiment of the apparatus, the training calculation module comprises a ridge regression module for adding a regularization term to the loss function of the linear regression model, the linear regression model being:
$$y(x, w) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$
wherein x is a modeling variable, and w is the regression coefficient of the modeling variable in the linear regression model;
the loss function with the regularization term added is:
[Formula image BDA0003363868120000231: loss function with the regularization term]
wherein n is the number of modeling variables in the training set, the expression in formula image BDA0003363868120000232 is the regularization term, formed from the coefficient λ and the sum of the squares of the regression coefficients of the modeling variables, $x_i$ is the ith modeling variable in the training set, $y_i$ is the loan data information of the ith sample in the training set, and w is the regression coefficient of the modeling variable in the linear regression model.
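For orientation, the textbook ridge regression objective for a linear model without an intercept is shown below. It is given as the standard form, not as a transcription of the formula images of this disclosure; here $m$ denotes the number of samples, $\mathbf{x}_i$ the vector of modeling variables of the ith sample, and $\lambda \ge 0$ the regularization coefficient.

$$J(w) = \sum_{i=1}^{m} \left( y_i - w^{\top} \mathbf{x}_i \right)^2 + \lambda \sum_{j=1}^{n} w_j^2$$

Its minimizer has the closed form $w = (X^{\top} X + \lambda I)^{-1} X^{\top} y$, which the following sketch computes with NumPy. The function names and the toy data are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^(-1) X^T y.

    No intercept is fitted, matching y(x, w) = w_1 x_1 + ... + w_n x_n.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

def predict(X, w):
    """Evaluate the linear model for each row of X."""
    return np.asarray(X, dtype=float) @ w

# Toy data: three samples, two modeling variables.
X = [[1, 0], [0, 1], [1, 1]]
y = [1, 2, 3]
w_unregularized = fit_ridge(X, y, lam=0.0)  # ordinary least squares
w_ridge = fit_ridge(X, y, lam=1.0)          # coefficients shrink toward zero
```

Larger values of λ shrink the coefficients more strongly, which stabilizes the fit when modeling variables are correlated, at the cost of some bias.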
In an embodiment of the apparatus, the model evaluation verification module 906 is further configured to evaluate and verify the credit line model according to a model evaluation index, the cross-time verification set, and a modeling variable weight ratio, respectively.
In one embodiment of the apparatus, the model evaluation verification module includes: an evaluation index calculation and comparison module, a stability verification module, a comparison module, a prediction capability verification module, an analysis module, and a rationality verification module;
the evaluation index calculation and comparison module is configured to calculate and respectively compare the coefficient of determination, the mean absolute error, and the median absolute error of the training set, the test set, and the cross-time verification set;
the stability verification module is configured to evaluate and verify the stability of the credit line model according to the comparison result;
and the comparison module is configured to compare the predicted credit line calculated from the training set through the credit line model with the cross-time verification set.
The prediction capability verification module is configured to evaluate and verify the prediction capability of the credit line model according to the comparison result.
The analysis module is configured to calculate the modeling variable weight ratio and analyze the modeling variable weight ratio according to a preset weight threshold.
The rationality verification module is configured to evaluate and verify the rationality of the credit line model according to the analysis result.
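The metric comparison across the three data sets can be sketched as follows. The reading of the third metric as the median absolute error, the stability rule (a maximum allowed gap in the coefficient of determination), and the 0.1 default are assumptions for illustration; the weight ratio analysis is not covered because its formula is not given here.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return coefficient of determination, mean and median absolute error.

    Assumes y_true is not constant, otherwise r2 is undefined.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "mae": float(abs_err.mean()),
        "medae": float(np.median(abs_err)),
    }

def is_stable(metrics_by_set, max_r2_gap=0.1):
    """Hypothetical stability rule: r2 must not differ much between sets."""
    r2_values = [m["r2"] for m in metrics_by_set.values()]
    return max(r2_values) - min(r2_values) <= max_r2_gap

# Toy example for a single data set.
metrics = evaluate([1, 2, 3, 4], [1, 2, 3, 5])

# Toy comparison across training, test, and cross-time verification sets.
stable = is_stable({
    "train": {"r2": 0.80},
    "test": {"r2": 0.78},
    "cross_time": {"r2": 0.75},
})
```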
For the specific implementation of the credit line model generation device, reference may be made to the above embodiments of the credit line model generation method, which are not described herein again. All or part of the modules in the credit line model generation device may be implemented by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In another embodiment, there is provided a credit limit determination device, including:
the flow data acquisition module is configured to acquire the income and expenditure flow data of the credit object;
the initial limit calculation module is configured to input the income and expenditure flow data into the credit line model generated in the above embodiments for calculation to obtain an initial limit;
and the credit line determination module is configured to adjust the initial limit according to the asset coverage rate of the credit object and the type of the credit object, and determine the credit line of the credit object according to the adjusted initial limit.
For a specific implementation of the credit line determination device, refer to the above embodiments of the credit line determination method, which are not described herein again.
In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing the training sets, test sets, cross-time verification sets, income and expenditure flow data, and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a credit line model generation method.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory in which a computer program is stored and a processor, which when executing the computer program performs the steps of the above-described method embodiments.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of the above-described method embodiments.
It should be noted that the object sample and the balance and expenditure flow data referred to in the present disclosure are both information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in embodiments provided by the present disclosure may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases involved in embodiments provided by the present disclosure may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum computing based data processing logic devices, etc., without limitation.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present disclosure, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the present disclosure, and these changes and modifications all fall within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the appended claims.

Claims (23)

1. A credit line model generation method is characterized by comprising the following steps:
performing feature derivation processing according to the income and expenditure flow data of the target object and the loan service of the target object, and determining modeling variables;
training a linear regression model established by the ridge regression method according to the modeling variables in the training set of the target object and the test set of the target object obtained after object sample screening, to obtain a credit line model;
and evaluating and verifying the credit line model according to preset model evaluation conditions, and, after the evaluation and verification pass, adjusting the credit line model according to preset credit line adjustment conditions to obtain the adjusted credit line model.
2. The credit line model generation method of claim 1, wherein the determining the modeling variables by performing feature derivation processing according to the income and expenditure streamline data of the target object and the loan service of the target object comprises:
performing characteristic derivation according to the balance and expenditure streamline data of the target object and the loan service of the target object to obtain characteristic derivative variables, wherein the characteristic derivative variables comprise: the monthly average deposit balance of the target object, the tax payment amount of the target object, the income amount of the operation activity of the target object, the transaction payment amount of the target object and the establishment period of the target object;
and carrying out feature screening on the feature derivative variables to obtain the modeling variables after feature screening.
3. The credit line model generation method of claim 2, wherein the feature screening the feature derived variables to obtain the modeling variables after feature screening comprises:
determining loan data information of the target object according to the receiving and paying streamline data;
performing first screening on the feature derivative variables according to the feature missing rate of the feature derivative variables and the correlation coefficient of the feature derivative variables and the loan data information;
and performing second screening on the feature derivative variables obtained after the first screening according to the business meaning to obtain the modeling variables.
4. The credit line model generation method of claim 2, wherein the feature derived variables are subjected to feature screening to obtain modeling variables, and then the method further comprises:
and under the condition that the modeling variable is larger than a preset feature derivative variable limiting threshold, adjusting the modeling variable according to the preset feature derivative variable limiting threshold.
5. The credit limit model generation method of claim 3, wherein the object sample screening comprises:
screening the object sample according to object sample screening conditions to obtain a target object sample, wherein the object sample screening conditions comprise: loan date of the subject, scale of the subject, loan amount of the subject, the loan data information of the subject, abnormal value of the loan data information, bad credit;
determining a cross-time verification set of the target object sample and a collection of the training set and the testing set according to a preset time point;
and determining the training set of the target object and the testing set of the target object in the collection set according to a preset proportion.
6. The credit limit model generation method of claim 5, wherein the obtaining of the limit model from the linear regression model established by the ridge regression method according to the training set of the target object obtained by screening the object sample and the modeling variables in the test set of the target object comprises:
training a linear regression model established by a ridge regression method according to the modeling variables in the training set and the test set of the target object to obtain a test result value and a significance level of the modeling variables and regression coefficients of the modeling variables in the linear regression model;
and under the condition that the test result value and the significance level meet the conditions, obtaining a quota model through the trained linear regression model.
7. The credit line model generation method of claim 6, wherein the ridge regression method comprises:
adding a regularization term to the loss function of the linear regression model, wherein the linear regression model is:
$$y(x, w) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$
wherein x is a modeling variable, and w is a regression coefficient of the linear regression model;
the loss function with the regularization term added is:
[Formula image FDA0003363868110000021: loss function with the regularization term]
wherein n is the number of modeling variables in the training set, the expression in formula image FDA0003363868110000022 is the regularization term, formed from the coefficient λ and the sum of the squares of the regression coefficients of the modeling variables, $x_i$ is the ith modeling variable in the training set, $y_i$ is the loan data information of the ith sample in the training set, and w is the regression coefficient of the linear regression model.
8. The credit line model generation method of claim 5, wherein the evaluation and verification of the line model according to preset model evaluation conditions comprises:
and respectively evaluating and verifying the quota model according to model evaluation indexes, the cross-time verification set and the weight ratio of modeling variables.
9. The method of claim 8, wherein the evaluating and verifying the credit line model according to the model evaluation index, the cross-time verification set, and the line weight ratio corresponding to the modeling variable respectively comprises:
calculating and respectively comparing the coefficient of determination, the mean absolute error, and the median absolute error of the training set, the test set, and the cross-time verification set, and evaluating and verifying the stability of the credit line model according to the comparison result;
comparing the predicted credit line calculated from the training set through the credit line model with the cross-time verification set, and evaluating and verifying the prediction capability of the credit line model according to the comparison result;
and calculating the modeling variable weight ratio, analyzing the modeling variable weight ratio according to a preset weight threshold, and evaluating and verifying the rationality of the credit line model according to the analysis result.
10. A credit line determining method is characterized by comprising the following steps:
acquiring income and expenditure flow data of a credit object;
inputting the income and expenditure flow data into the credit line model generated by the method of any one of claims 1 to 9 for calculation to obtain an initial limit;
and adjusting the initial limit according to the asset coverage rate of the credit object and the type of the credit object, and determining the credit limit of the credit object according to the adjusted initial limit.
11. A credit line model generation device, comprising:
the modeling variable determination module is used for performing feature derivation processing according to the income and expenditure flow data of the target object and the loan service of the target object to determine modeling variables;
the credit model training module is used for training a linear regression model established by the ridge regression method according to the modeling variables in the training set of the target object and the test set of the target object obtained after object sample screening, to obtain a credit line model;
the model evaluation and verification module is used for evaluating and verifying the credit model according to preset model evaluation conditions;
and the model adjusting module is used for adjusting the credit line model after the evaluation and verification according to preset credit line adjusting conditions after the evaluation and verification of the credit line model passes, and obtaining the credit line model after the adjustment.
12. The credit limit model generation device of claim 11, wherein the modeling variable determination module comprises: a feature derivation module and a feature screening module;
the characteristic derivation module is used for performing characteristic derivation according to the income and expenditure streamline data of the target object and the loan service of the target object to obtain characteristic derivation variables, wherein the characteristic derivation variables comprise: the monthly average deposit balance of the target object, the tax payment amount of the target object, the income amount of the operation activity of the target object, the transaction payment amount of the target object and the establishment period of the target object;
and the characteristic screening module is used for carrying out characteristic screening on the characteristic derivative variables to obtain the modeling variables after the characteristic screening.
13. The credit line model generation device of claim 12, wherein the feature filtering module comprises: the loan data determining module, the first screening module and the second screening module;
the loan data determining module is used for determining loan data information of the target object according to the receiving and payment pipelining data;
the first screening module is used for carrying out first screening on the feature derivative variables according to the feature missing rate of the feature derivative variables and the correlation coefficient of the feature derivative variables and the loan data information;
and the second screening module is used for carrying out second screening on the feature derivative variables obtained after the first screening according to the business meaning to obtain the modeling variables.
14. The credit line model generation device of claim 12 or 13, wherein the modeling variable determination module further comprises: and the modeling variable adjusting module is used for adjusting the modeling variable according to a preset characteristic derived variable limit threshold value under the condition that the modeling variable is larger than the preset characteristic derived variable limit threshold value.
15. The credit model generation device of claim 11, wherein the credit model training module comprises: the object sample screening module is used for screening the object sample according to object sample screening conditions to obtain a target object sample, wherein the object sample screening conditions comprise: loan date of the subject, scale of the subject, loan amount of the subject, the loan data information of the subject, abnormal value of the loan data information, bad credit;
the object sample screening module is also used for determining a cross-time verification set of the target object sample and a collection set of the training set and the testing set according to a preset time point;
and the object sample screening module is also used for determining a training set of the target object and a testing set of the target object in the collection set according to a preset proportion.
16. The credit model generation device of claim 11 or 15, wherein the credit model training module further comprises: a training calculation module and a model determination module;
the training calculation module is used for training a linear regression model established by a ridge regression method according to the modeling variables in the training set and the testing set of the target object to obtain a test result value and a significance level of the modeling variables and regression coefficients of the modeling variables in the linear regression model;
and the model determining module is used for obtaining a quota model through the trained linear regression model under the condition that the inspection result value and the significance level meet the conditions.
17. The credit line model generation device of claim 16, wherein the training calculation module comprises: a ridge regression module, configured to add a regularization term to the loss function of the linear regression model, where the linear regression model is:
$$y(x, w) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$
wherein x is a modeling variable, and w is a regression coefficient of the linear regression model;
the loss function with the regularization term added is:
[Formula image FDA0003363868110000051: loss function with the regularization term]
wherein n is the number of modeling variables in the training set, the expression in formula image FDA0003363868110000052 is the regularization term, formed from the coefficient λ and the sum of the squares of the regression coefficients of the modeling variables, $x_i$ is the ith modeling variable in the training set, $y_i$ is the loan data information of the ith sample in the training set, and w is the regression coefficient of the linear regression model.
18. The credit line model generation device of claim 11, wherein the model evaluation verification module is further configured to evaluate and verify the line model according to a model evaluation index, the cross-time verification set, and a modeling variable weight ratio, respectively.
19. The credit line model generation device of claim 18, wherein the model evaluation verification module comprises: an evaluation index calculation and comparison module, a stability verification module, a comparison module, a prediction capability verification module, an analysis module, and a rationality verification module;
the evaluation index calculation and comparison module is used for calculating and respectively comparing the coefficient of determination, the mean absolute error, and the median absolute error of the training set, the test set, and the cross-time verification set;
the stability verification module is used for evaluating and verifying the stability of the credit model according to the comparison result;
the comparison module is used for comparing the prediction limit obtained by the training set through calculation of the credit limit model with the cross-time verification set;
the prediction capability verification module is used for evaluating and verifying the prediction capability of the quota model according to the comparison result;
the analysis module is used for calculating the weight proportion of the modeling variable and analyzing the weight proportion of the modeling variable according to a preset weight threshold;
and the rationality verification module is used for evaluating and verifying the rationality of the credit line model according to the analysis result.
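The evaluation steps of this claim can be sketched as follows, under stated assumptions. The claim does not define how the indices are compared or how a weight proportion is computed, so the sketch assumes that the "absolute median difference" is the median of the absolute prediction errors, that stability means the coefficient of determination varies by no more than a chosen gap across the data sets, and that a variable's weight proportion is its share of the summed absolute regression coefficients. All function names, the gap, and the threshold are hypothetical.

```python
from statistics import mean, median


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_abs_error(y_true, y_pred):
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def median_abs_error(y_true, y_pred):
    # Assumed reading of "absolute median difference".
    return median([abs(t - p) for t, p in zip(y_true, y_pred)])


def evaluate(y_true, y_pred):
    """Compute the three evaluation indices for one data set."""
    return {
        "r2": r2_score(y_true, y_pred),
        "mae": mean_abs_error(y_true, y_pred),
        "medae": median_abs_error(y_true, y_pred),
    }


def is_stable(metrics_by_set, max_r2_gap=0.05):
    """Hypothetical rule: stable if R^2 differs by at most max_r2_gap across sets."""
    r2_values = [m["r2"] for m in metrics_by_set.values()]
    return max(r2_values) - min(r2_values) <= max_r2_gap


def weight_proportions(coefficients):
    """Assumed definition: share of each |coefficient| in the sum of |coefficients|."""
    total = sum(abs(c) for c in coefficients.values())
    return {name: abs(c) / total for name, c in coefficients.items()}


def over_threshold(proportions, threshold):
    """Names of modeling variables whose weight proportion exceeds the threshold."""
    return [name for name, p in proportions.items() if p > threshold]


# Toy usage with made-up numbers.
metrics = evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
props = weight_proportions({"inflow": 3.0, "outflow": -1.0})
dominant = over_threshold(props, threshold=0.5)
```

In practice evaluate would be called once each for the training set, the test set, and the cross-time verification set, and the three result dictionaries passed to is_stable keyed by set name.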
20. An apparatus for determining an amount of credit, the apparatus comprising:
the flow data acquisition module is used for acquiring the receipt and payment flow data of the credit object;
an initial line calculation module, configured to input the receipt and payment flow data into the credit line model generated according to any one of claims 1 to 9 for calculation, so as to obtain an initial line;
and the credit line determining module is used for adjusting the initial line according to the asset coverage rate of the credit object and the type of the credit object, and determining the credit line of the credit object according to the adjusted initial line.
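As a rough illustration of the three modules in this claim, the sketch below computes an initial amount from a linear model and then adjusts it. The claim does not state how the asset coverage rate or the object type enters the adjustment, so the multiplicative factors, the TYPE_FACTORS table, the clamping of coverage to the range 0 to 1, and every name in the code are hypothetical choices made only for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CreditObject:
    flow_features: List[float]  # modeling variables derived from flow data
    asset_coverage: float       # asset coverage rate, e.g. 0.8 for 80%
    object_type: str            # key into TYPE_FACTORS (hypothetical categories)


# Hypothetical per-type multipliers; the patent claim does not list any values.
TYPE_FACTORS = {"standard": 1.0, "restricted": 0.5}


def initial_line(weights, features):
    """Linear model output: sum of regression coefficient * modeling variable."""
    return sum(w * x for w, x in zip(weights, features))


def coverage_factor(asset_coverage):
    """Hypothetical adjustment: clamp the coverage rate to the range [0, 1]."""
    return max(0.0, min(asset_coverage, 1.0))


def determine_credit_line(weights, obj):
    """Adjust the initial amount by coverage and type, never going below zero."""
    base = initial_line(weights, obj.flow_features)
    adjusted = base * coverage_factor(obj.asset_coverage) * TYPE_FACTORS[obj.object_type]
    return max(0.0, adjusted)


# Toy usage with made-up coefficients and features.
weights = [0.5, 0.25]
applicant = CreditObject(
    flow_features=[1000.0, 400.0],
    asset_coverage=0.8,
    object_type="restricted",
)
line = determine_credit_line(weights, applicant)
```

Keeping the model output and the adjustment in separate functions mirrors the split between the initial calculation module and the determining module, so either rule can be changed without touching the other.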
21. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 9 when executing the computer program.
22. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 9.
23. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 9 when executed by a processor.
CN202111384470.9A 2021-11-19 2021-11-19 Credit line model generation method, credit line determination method and device Pending CN114240598A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111384470.9A CN114240598A (en) 2021-11-19 2021-11-19 Credit line model generation method, credit line determination method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111384470.9A CN114240598A (en) 2021-11-19 2021-11-19 Credit line model generation method, credit line determination method and device

Publications (1)

Publication Number Publication Date
CN114240598A true CN114240598A (en) 2022-03-25

Family

ID=80750336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111384470.9A Pending CN114240598A (en) 2021-11-19 2021-11-19 Credit line model generation method, credit line determination method and device

Country Status (1)

Country Link
CN (1) CN114240598A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115018638A (en) * 2022-08-08 2022-09-06 平安银行股份有限公司 Service limit determining method and device

Similar Documents

Publication Publication Date Title
Zhu et al. Partial acquisitions in emerging markets: A test of the strategic market entry and corporate control hypotheses
Head et al. FDI as an outcome of the market for corporate control: Theory and evidence
EP1361526A1 (en) Electronic data processing system and method of using an electronic processing system for automatically determining a risk indicator value
Indriyanti The accuracy of financial distress prediction models: Empirical study on the world’s 25 biggest tech companies in 2015–2016 Forbes’s version
Overesch et al. The effects of company taxation in EU accession countries on German FDI 1
CN112598500A (en) Credit processing method and system for non-limit client
Da Silva et al. Selecting audit samples using Benford's Law
CN112446776A (en) Small and medium-sized enterprise credit evaluation system and method based on multi-source docking fusion data
Annan et al. Determinants of tax evasion in Ghana: 1970-2010
CN113919886A (en) Data characteristic combination pricing method and system based on Shapley value and electronic equipment
Reher et al. Automation and inequality in wealth management
Khalil Auditor choice and its impact on financial reporting quality: A case of banking industry of Pakistan
CN114240598A (en) Credit line model generation method, credit line determination method and device
Ashton et al. Valuation weights, linear dynamics and accounting conservatism: An empirical analysis
Gregory-Allen et al. The impact of portfolio holdings disclosure on fund returns
CN111292118A (en) Investor portrait construction method and device based on deep learning
CN116091200A (en) Scene credit granting system and method based on machine learning, electronic equipment and medium
Siripongvakin et al. Infrastructure project investment decision timing using a real options analysis framework with Rainbow option
Karam Measuring and managing operational risk in the insurance and banking sectors
CN117252677A (en) Credit line determination method and device, electronic equipment and storage medium
Chen et al. Quantifying impact factors of corporate financing: engineering consulting firms
Chiew et al. The predictive ability of the expected utility-entropy based fund rating approach: A comparison investigation with Morningstar ratings in US
CN114170000A (en) Credit card user risk category identification method, device, computer equipment and medium
Coyle et al. Potential social value from data: an application of discrete choice analysis
Kurniawan Examination of the variables affecting customers’ acceptance of online lending platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination