CN107633455A - Credit estimation method and device based on data model - Google Patents
- Publication number
- CN107633455A CN201710785997.XA
- Authority
- CN
- China
- Prior art keywords
- variable
- assessment models
- data
- completion
- failure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention belongs to the technical field of financial data processing and provides a credit estimation method and device based on a data model. The method includes: obtaining the characteristic variables required by an assessment model from the data to be assessed, and judging whether each characteristic variable of the data to be assessed is a failure variable. If so, the variable is completed using the completion variable corresponding to that failure variable and then input into the assessment model; if not, it is input into the assessment model directly. A failure variable is a characteristic variable whose information is missing or incomplete. The assessment model performs the assessment according to the input characteristic variables and outputs an assessment result. The credit estimation method and device based on a data model can perform credit assessment with a small data set when data are missing or incomplete, improving credit default prediction.
Description
Technical field
The present invention relates to the technical field of financial data processing, and in particular to a credit estimation method and device based on a data model.
Background
At present there are many personal lending applications on the market, and different applications target different user groups. To reduce risk, a user's repayment ability must be assessed; to target customers accurately, a user's propensity to borrow must also be assessed.
In practice, the big data held by bank lending platforms suits traditional consumer lending models. Internet platforms and the newest mobile platforms, by contrast, hold large volumes of data, much of which is inaccurate or incomplete. If some data variables in a credit scoring model are missing or invalid, the model's prediction of a borrower's credit level may be biased, or no prediction may be possible at all, which produces a biased estimate of the borrower. Moreover, when a lending platform is first launched, data are limited, and the finance company may not know which borrower characteristics matter in a credit scoring model. A credit scoring model taken from a large finance company will not necessarily predict local users accurately. Because customers differ from region to region, a local credit model cannot be built from data collected elsewhere; for example, the same wage income does not imply the same credit level in a first-tier city as in a third-tier city. User risk therefore cannot be assessed effectively. Consequently, in the early stage, when samples are few and user data are incomplete or missing, existing assessment models cannot perform an assessment. For example, if wage income is one of the variables of a repayment-ability assessment model and the user's wage income or number of family members cannot be obtained, the user's repayment ability cannot be assessed accurately.
How to perform credit assessment with a small data set when data are missing or incomplete, and thereby improve credit default prediction, is a problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of the defects in the prior art, the present invention provides a credit estimation method and device based on a data model that can perform credit assessment with a small data set when data are missing or incomplete, improving credit default prediction.
In a first aspect, the present invention provides a credit estimation method based on a data model. The method includes: obtaining, from the data to be assessed, the characteristic variables required by the assessment model, the assessment model being a model that has been trained and has passed verification;
judging whether each characteristic variable of the data to be assessed is a failure variable:
if so, completing it using the completion variable corresponding to the failure variable and inputting it into the assessment model;
if not, inputting it into the assessment model directly, a failure variable being a characteristic variable whose information is missing or incomplete;
the assessment model performing the assessment according to the input characteristic variables and outputting an assessment result.
Further, before the characteristic variables required by the assessment model are obtained from the data to be assessed, the method also includes:
classifying the sample data in the training set to obtain classification results;
performing logistic regression on the sample data in the training set according to the classification results to establish the assessment model;
obtaining test results from the assessment model and the sample data of the test set;
verifying the assessment model according to the test results.
Further, after the assessment model is verified, the method also includes:
randomly splitting the sample data into a training set and a test set according to a cross-validation method;
training the assessment model with the split sample data.
Further, before logistic regression is performed on the sample data in the training set according to the classification results, the method also includes: calculating the distances between the sample data in the training set to determine associated variables;
judging whether the distance value between any two associated variables is less than a distance threshold and, if so, merging the two associated variables.
Further, after the distances between the sample data in the training set are calculated, the method also includes:
detecting the distance values between a given variable and the other variables;
setting the variable with the smallest distance value to that variable as its completion variable.
Completing with the completion variable corresponding to the failure variable specifically includes:
completing the failure variable with the information of its corresponding completion variable.
Further, after the assessment model is established and before completion is performed with the completion variable corresponding to the failure variable, the method also includes: inputting a target variable into the assessment model;
checking, according to the information value of the existing characteristic variables of the assessment model, whether each existing characteristic variable is valid;
if a failed characteristic variable exists, setting the target variable as the completion variable of that failed characteristic variable.
Completing with the completion variable corresponding to the failure variable specifically includes:
completing the failure variable with the information of its corresponding completion variable.
Further, checking whether each existing characteristic variable is valid according to the information value of the existing characteristic variables of the assessment model specifically includes:
calculating the information value of each characteristic variable according to the distribution proportions of the sample data in the training set;
testing against a predetermined value threshold to judge whether each characteristic variable is valid.
Based on any of the above embodiments of the credit estimation method based on a data model, further, after a characteristic variable is judged to be a failure variable and before it is input into the assessment model, the method also includes: performing a statistical calculation on the failure variable to obtain its mean or median;
completing the failure variable whose information is missing with the mean or median of that failure variable.
In a second aspect, the present invention provides a credit evaluation device based on a data model. The device includes a characteristic variable acquisition module, a failure variable completion module and an evaluation module. The characteristic variable acquisition module is used to obtain, from the data to be assessed, the characteristic variables required by the assessment model, the assessment model being a model that has been trained and has passed verification. The failure variable completion module is used to judge whether each characteristic variable of the data to be assessed is a failure variable: if so, it is completed using the completion variable corresponding to the failure variable and input into the assessment model; if not, it is input into the assessment model directly. A failure variable is a characteristic variable whose information is missing or incomplete. The evaluation module is used to make the assessment model perform the assessment according to the input characteristic variables and output an assessment result.
Further, the credit evaluation device based on a data model of this embodiment also includes an assessment model establishing module, which is used to: classify the sample data in the training set to obtain classification results; perform logistic regression on the sample data in the training set according to the classification results to establish the assessment model; obtain test results from the assessment model and the sample data of the test set; and verify the assessment model according to the test results.
As can be seen from the above technical solution, the credit estimation method and device based on a data model provided by this embodiment process the user's data to be assessed with a pre-established assessment model. Even if there are failure variables whose information is missing or incomplete, the method can complete them using completion variables. This improves credit default prediction, allows credit assessment to be completed with a small data set, avoids the situation in which the assessment model cannot perform an assessment because too little data is available, saves credit analysis cost, provides information support for credit decisions, reduces potential default risk, and improves the efficiency and degree of automation of credit assessment.
Therefore, the credit estimation method and device based on a data model of this embodiment perform credit assessment with a small data set when data are missing or incomplete, improve credit default prediction, and effectively manage the credit risk of users.
Brief description of the drawings
To illustrate the specific embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. In all the drawings, similar elements or parts are generally identified by similar reference numerals. The elements or parts in the drawings are not necessarily drawn to scale.
Fig. 1 shows a flow chart of a credit estimation method based on a data model provided by the present invention;
Fig. 2 shows a structural block diagram of a credit evaluation device based on a data model provided by the present invention.
Detailed description of the embodiments
Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution of the invention more clearly; they are examples only and do not limit the scope of protection of the invention.
It should be noted that, unless otherwise stated, the technical or scientific terms used in this application have the ordinary meaning understood by a person of ordinary skill in the art to which the invention belongs.
In a first aspect, an embodiment of the present invention provides a credit estimation method based on a data model. With reference to Fig. 1, the method includes:
Step S1: The characteristic variables required by the assessment model are obtained from the data to be assessed, the assessment model being a model that has been trained and has passed verification. For example, to assess whether a user will repay on time, the assessment model can use characteristic variables such as monthly pay, annual pay, length of service, region of residence and education background to assess the user's credit and judge whether the user poses a default risk.
In practice, the assessment model is obtained in advance through training and verification.
Step S2: Whether each characteristic variable of the data to be assessed is a failure variable is judged:
if so, it is completed using the completion variable corresponding to the failure variable and input into the assessment model;
if not, it is input into the assessment model directly. A failure variable is a characteristic variable whose information is missing or incomplete.
For example, in practice, if the wage information that the assessment model obtains for a user is missing or incomplete, the characteristic variable "wage" is a failure variable, and it can be completed using information such as the user's property holdings, length of service and industry.
Step S3: The assessment model performs the assessment according to the input characteristic variables and outputs an assessment result.
As can be seen from the above technical solution, the credit estimation method based on a data model provided by this embodiment processes the user's data to be assessed with a pre-established assessment model. Even if there are failure variables whose information is missing or incomplete, the method can complete them using completion variables. This improves credit default prediction, allows credit assessment to be completed with a small data set, avoids the situation in which the assessment model cannot perform an assessment because too little data is available, saves credit analysis cost, provides information support for credit decisions, and reduces potential default risk.
Therefore, the credit estimation method based on a data model of this embodiment performs credit assessment with a small data set when data are missing or incomplete, improving credit default prediction.
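Steps S1 to S3 can be sketched as follows. This is only an illustrative sketch, not the implementation described by the patent: it assumes that a failure variable is represented by None, and the names assess, completers and toy_model, as well as the rule that derives a wage from years of service, are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]

def is_failure(value: Optional[float]) -> bool:
    # Assumption: a failure variable is one whose value is missing (None).
    return value is None

def assess(record: Record,
           required: List[str],
           completers: Dict[str, Callable[[Record], float]],
           model: Callable[[Dict[str, float]], str]) -> str:
    features: Dict[str, float] = {}
    for name in required:                      # Step S1: required variables
        value = record.get(name)
        if is_failure(value):                  # Step S2: detect failure variable
            value = completers[name](record)   # complete it from other information
        features[name] = value
    return model(features)                     # Step S3: assess and output

def toy_model(features: Dict[str, float]) -> str:
    # Hypothetical stand-in for the trained and verified assessment model.
    return "low risk" if features["wage"] >= 5000 else "high risk"

# Hypothetical completion rule: estimate wage from years of service.
completers = {"wage": lambda r: 3000 + 500 * r["years_of_service"]}
record = {"wage": None, "years_of_service": 6}
result = assess(record, ["wage", "years_of_service"], completers, toy_model)
```

In this sketch a failure variable without a registered completion rule raises a KeyError, which makes the gap visible instead of silently passing a missing value to the model.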
To further improve the accuracy of the credit estimation method based on a data model of this embodiment, with regard to building the assessment model, before the characteristic variables required by the assessment model are obtained from the data to be assessed, the method can also classify the sample data in the training set to obtain classification results. The variables are classified with respect to credit default, which is the dependent variable. For example, according to default status, the variable "age" is divided into groups, and each group then has a corresponding default rate; this improves the grouping of the variables used in the logistic regression.
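As an illustration of grouping a variable and attaching a default rate to each group, the sketch below bins "age" at fixed cut points. The cut points, the sample values and the function name default_rate_by_group are assumptions made for illustration; the method itself derives the groups from a decision tree rather than from fixed edges.

```python
from collections import defaultdict

def default_rate_by_group(ages, defaulted, edges):
    """Assign each age to a bin given by `edges` and return each bin's default rate."""
    counts = defaultdict(lambda: [0, 0])        # bin index -> [defaults, total]
    for age, bad in zip(ages, defaulted):
        idx = sum(age >= e for e in edges)      # number of edges at or below this age
        counts[idx][0] += int(bad)
        counts[idx][1] += 1
    return {idx: d / n for idx, (d, n) in sorted(counts.items())}

# Hypothetical sample: 1 means the borrower defaulted.
ages = [22, 24, 27, 31, 35, 42, 48, 55]
defaulted = [1, 1, 0, 0, 1, 0, 0, 0]
rates = default_rate_by_group(ages, defaulted, edges=[25, 40])
```

Bin 0 holds ages below 25, bin 1 ages from 25 to 39, and bin 2 ages of 40 and above.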
According to the classification results, logistic regression is performed on the sample data in the training set to establish the assessment model.
Logistic regression is mainly used to predict credit default. It does not require the data set to be normally distributed or to have equal variance. Logistic regression can divide borrowers into two groups: one that is more likely to repay on time and one that may default on the loan. With a binary outcome, modelling analysts can readily apply the model and verify the usefulness of the relevant variables.
Here, the credit estimation method based on a data model of this embodiment builds the assessment model using logistic regression. Compared with a multilayer perceptron neural network model, logistic regression has better predictive performance and can accurately reveal the characteristics of borrowers in the trustworthy group. The method is simple and easy to understand, and it can provide the relevant regulatory bodies with an intuitive, verifiable explanation.
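The binary repay-or-default model can be sketched with a small logistic regression fitted by batch gradient descent. This is a minimal sketch under stated assumptions: the single feature (a wage scaled to the range 0 to 1), the sample values, the learning rate and the function names are all illustrative, and a production system would more likely rely on an established statistics library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """Probability of default for feature vector x; w[0] is the bias."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit the weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi     # gradient of the log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Hypothetical training data: scaled wage, and 1 if the borrower defaulted.
X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
weights = fit_logistic(X, y)
low_wage_risk = predict_proba(weights, [0.1])
high_wage_risk = predict_proba(weights, [0.9])
```

The sign of each fitted weight shows the direction in which a variable moves the default probability, which is the kind of direct interpretation the paragraph above attributes to logistic regression.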
In practice, the assessment model is verified with the sample data of the test set; that is, after the assessment model is established, test results are obtained from the assessment model and the sample data of the test set.
The assessment model is then verified according to the test results to judge whether it suffers from overfitting. For example, the sample data are divided into a training set (70%) and a test set (30%). The first step of the two-step regression model applies the decision tree and clustering techniques to the training set, and the groupings obtained from the training set are then used to process the test set. After segmentation, the variables are fitted in the logistic regression. The ROC curves of the training set and the test set serve as the criterion for assessing predictive performance and for checking for overfitting.
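One way to compare the ROC curves of the training set and the test set is through the area under each curve (AUC). The sketch below is an assumption-laden illustration: the source names ROC curves but not AUC, and the labels, scores and function name auc are made up for the example.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores on the training set and on the test set.
train_auc = auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.3, 0.2, 0.7, 0.1])
test_auc = auc([1, 0, 1, 0], [0.6, 0.7, 0.8, 0.2])
gap = train_auc - test_auc   # a large gap is a warning sign of overfitting
```

How large a gap counts as overfitting is a judgment call that the source does not quantify.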
In addition, the credit estimation method based on a data model of this embodiment can train the assessment model repeatedly by means of cross-validation; that is, after the assessment model is verified, the method can also randomly split the sample data into a training set and a test set according to a cross-validation method.
The assessment model is trained with the split sample data.
For example, the sample data are randomly divided into n subsets of equal size. Each subset in turn is used as the validation set while the other subsets serve as the training set. A distinctive feature of cross-validation is that every subset is tested exactly once. Repeating the procedure n times makes effective use of all the information in the sample data. Finally, the average of the test results can be regarded as the accuracy of the model. Splitting the sample data randomly by cross-validation and training the assessment model several times addresses the problem of the assessment model overfitting.
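The n-fold split described above can be sketched as follows. The function name k_fold_indices and the fixed random seed are assumptions for illustration; the sketch only produces the index splits, and training and scoring the model on each split is left out.

```python
import random

def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (training, validation) index lists; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)              # random split
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

splits = list(k_fold_indices(n_samples=10, n_folds=5))
```

Averaging the score obtained on each validation fold then gives the overall accuracy estimate mentioned in the text.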
Specifically, with regard to classifying the sample data, when the sample data in the training set are classified to obtain classification results, the credit estimation method based on a data model of this embodiment proceeds as follows: if the sample data in the training set are numeric variables, the numeric variables are classified using a decision tree to determine the classification results; if the sample data in the training set are categorical variables, the categorical variables are classified using a clustering algorithm to determine the classification results.
In practice, the data are split into two parts according to the nature of the variables and analysed separately. One part consists of the numeric variables, and the other part consists of the categorical variables. For numeric variables, CHAID decision tree classification is applied to divide each variable into different categories. Categorical variables are combined by Ward minimum-variance clustering.
For the numeric variables, descriptive statistics give an overview of some characteristics of the borrowers. For example, the average age of borrowers is 28, so they are likely to have a stable wage after graduating, in most cases from university. The number of applications is at most 23, and borrowers can receive a loan quickly, within one day of submitting their personal information. The number of months for which borrowers have paid social insurance averages 35, slightly more than their length of service at their current company, which suggests that borrowers may have changed jobs. In general, the less often a borrower changes jobs, the less likely he or she is to default, because a stable wage makes repayment of the loan more reliable.
In the tree diagram built between default and the categories, a significance level of 95% or 99% is used as the stopping criterion for selecting the groups, and the resulting groups then form new categories. Categories with small samples are combined according to common knowledge; for example, under education background, small categories are combined into the new category "undergraduate level and above".
Ward minimum-variance hierarchical clustering is used to combine the small categories of the categorical variables. What distinguishes it from other clustering methods is that it clusters categories on the basis of analysis of variance rather than distance. Ward clustering minimises the total sum of squared differences within all clusters. As an agglomerative hierarchical method, it works from the bottom up: each category starts as its own cluster and is then gradually merged with others. The total variance after merging increases with each merge, and this increase is the weighted squared distance between the cluster centers. Dividing the sums of squares by the total sum of squares gives the proportion of variance, so the sums of squares are also easy to interpret.
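The "weighted squared distance between cluster centers" can be made concrete with the standard Ward merge cost, n_a * n_b / (n_a + n_b) times the squared distance between the two centroids. The sketch below is illustrative only: the function names are hypothetical, and each category is represented by a single made-up default rate.

```python
def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_merge_cost(a, b):
    """Increase in the total within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def cheapest_merge(clusters):
    """Return (cost, i, j) for the pair of clusters that is cheapest to merge."""
    return min((ward_merge_cost(clusters[i], clusters[j]), i, j)
               for i in range(len(clusters))
               for j in range(i + 1, len(clusters)))

# Hypothetical categories, each starting as its own cluster of one default rate.
clusters = [[[0.10]], [[0.12]], [[0.40]]]
cost, i, j = cheapest_merge(clusters)
```

Repeating the cheapest merge until a stopping rule is met gives the bottom-up procedure described above; here the two categories with similar default rates are merged first.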
A decision tree is a hierarchical supervised learning model that can handle different types of data, such as interval, nominal and ordinal data. Among decision tree algorithms, C4.5, classification and regression trees (CART) and chi-squared automatic interaction detection (CHAID) are the ones most widely applied in the credit scoring industry.
In most cases, the performance of logistic regression can be improved by segmentation, which divides the population into different homogeneous subgroups. For continuous variables, segmentation means discretising them into categorical variables. However, when the relationship between default prediction and borrower characteristics differs widely across borrower segments, a set of segmented models may be better suited than a single credit scoring model to analysing the whole data set. Therefore, a decision tree on each continuous variable is used as the segmentation model, to optimise the classification of borrower characteristics and to try to improve their suitability for logistic regression.
Clustering is an unsupervised learning classifier that groups data with similar characteristics into clusters. It can also associate samples that have homogeneous characteristics with a suitable target variable, reducing the effect of misclassification between the training and validation data sets. In addition, by separating heterogeneous borrowers, clustering the data set can improve predictive efficiency. Clustering is therefore applied to combine homogeneous data into groups suited to logistic regression, improving credit default prediction. Based on the clusters, the small-sample categories of a characteristic are combined into homogeneous groups according to minimum variance, which avoids the problem of a variable having too few samples for statistical calculation in the regression.
Here, the credit estimation method based on a data model of this embodiment can classify variables of different types. For numeric variables, the method classifies on the basis of a decision tree; compared with artificial neural networks and k-nearest neighbours, a decision tree has strong predictive ability and can calculate Euclidean distances, which optimises the classification of borrower characteristics and helps improve their suitability for logistic regression. For categorical variables, the method classifies on the basis of clustering, using Ward's minimum-variance method to combine data with similar characteristics into cluster groups suited to logistic regression, improving credit default prediction.
Specifically, the credit estimation method based on a data model of this embodiment can merge associated variables; that is, before logistic regression is performed on the sample data in the training set according to the classification results, the method can also calculate the distances between the sample data in the training set and determine the associated variables.
Whether the distance value between any two associated variables is less than a distance threshold is judged and, if so, the two associated variables are merged.
Logistic regression requires that no independent variable be correlated with any other independent variable. Intercorrelation not only violates the assumptions of logistic regression but may also make unimportant variables appear significant and reduce predictive ability.
Here, the credit estimation method based on a data model of this embodiment can merge variables that are correlated with one another, judging from the Euclidean distance between each pair of variables whether two associated variables should be merged. The distance threshold can be a value calculated from the sample data or an empirical value. Merging the associated variables reduces credit assessment risk; otherwise, the correlated variables would reduce the accuracy of the logistic regression results.
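The threshold test on pairs of variables can be sketched as below. The sketch treats each variable as a column of values already scaled to a common range and merges two close columns by averaging them; the averaging rule, the sample values, the threshold and the function names are assumptions, since the source does not state how merged variables are combined.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def merge_close_variables(columns, threshold):
    """Repeatedly merge the first pair of columns whose distance is below `threshold`."""
    cols = dict(columns)
    merged = True
    while merged:
        merged = False
        names = list(cols)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if euclidean(cols[a], cols[b]) < threshold:
                    # Assumption: a merged variable is the element-wise average.
                    avg = [(x + y) / 2 for x, y in zip(cols[a], cols[b])]
                    del cols[a], cols[b]
                    cols[a + "+" + b] = avg
                    merged = True
                    break
            if merged:
                break
    return cols

# Hypothetical scaled columns: monthly and annual pay carry nearly the same signal.
columns = {
    "monthly_pay": [0.20, 0.50, 0.90],
    "annual_pay": [0.21, 0.52, 0.88],
    "age": [0.90, 0.10, 0.40],
}
result = merge_close_variables(columns, threshold=0.1)
```

With these values only the two pay variables fall below the threshold, so the result contains "age" and one merged pay variable.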
Specifically, with regard to completion, the credit estimation method based on a data model of this embodiment can complete a failed characteristic variable with a mean or median, can determine the completion variable from the distance values between variables, and can also determine the completion variable from information value.
The process of completing a failed characteristic variable with a mean or median is as follows:
After a characteristic variable is judged to be a failure variable and before it is input into the assessment model, the method also includes: performing a statistical calculation on the failure variable to obtain its mean or median.
The failure variable whose information is missing is completed with the mean or median of that failure variable.
Here, the credit estimation method based on a data model of this embodiment can compute statistics for the failure variable, determine its median or mean, and fill in the information missing from the failure variable, so that credit assessment can be completed even when the variable's information is missing or incomplete.
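Mean or median completion can be sketched with Python's standard statistics module. The function name complete_missing, the use of None for missing entries and the wage figures are illustrative assumptions.

```python
from statistics import mean, median

def complete_missing(values, strategy="median"):
    """Replace None entries with the median (or mean) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if strategy == "median" else mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical wage column with two missing entries and one very high wage.
wages = [4000, None, 6000, 5000, None, 20000]
by_median = complete_missing(wages)
by_mean = complete_missing(wages, strategy="mean")
```

The median is less sensitive to the single very high wage in this sample than the mean, which is one reason to offer both options.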
The process of determining the completion variable from the Euclidean distance is as follows:
After the distances between the sample data in the training set are calculated, the method can also detect the distance values between a given variable and the other variables.
The variable with the smallest distance value to that variable is set as its completion variable.
When completion is performed with the completion variable corresponding to the failure variable, the specific process is:
The failure variable is completed with the information of its corresponding completion variable.
In practice, the Euclidean distance between different variables can be calculated using the decision tree. If, for variable A, the distance to variable B is the shortest, variable B is set as the completion variable of variable A. For example, if a user's characteristic variable "wage" is incomplete or missing, the user's "wage" is inferred from "length of service", and the "wage" information is then completed.
Here, the credit estimation method based on a data model of this embodiment can use the distances between the variables to judge the similarity between two variables and determine the completion variable of each variable, so that when a variable's information is missing or incomplete, the failure variable can be replaced using its completion variable and credit assessment can be completed.
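Choosing the completion variable as the nearest variable can be sketched as follows. The sketch assumes that each variable is a column of training values scaled to a common range; the column values and the function name nearest_variable are hypothetical, and how the failure variable's value is then inferred from the completion variable is not specified by the source.

```python
import math

def nearest_variable(target, columns):
    """Return the name of the column closest to `target` in Euclidean distance."""
    def dist(name):
        return math.sqrt(sum((a - b) ** 2
                             for a, b in zip(columns[target], columns[name])))
    return min((n for n in columns if n != target), key=dist)

# Hypothetical scaled training columns.
columns = {
    "wage": [0.20, 0.50, 0.90],
    "years_of_service": [0.25, 0.45, 0.85],
    "age": [0.90, 0.10, 0.40],
}
completion_variable = nearest_variable("wage", columns)
```

Here "years_of_service" is closest to "wage", which matches the example in the text where the wage is inferred from the length of service.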
The process of determining the completion variable from information value is as follows:
After the assessment model is established and before completion is performed with the completion variable corresponding to the failure variable, the method also includes:
A target variable is input into the assessment model.
Whether each existing characteristic variable is valid is checked according to the information value of the existing characteristic variables of the assessment model.
If a failed characteristic variable exists, the target variable is set as the completion variable of that failed characteristic variable. For example, in the borrower data set, only the variable (arri_sz_time) has missing values. Because it is highly correlated with another variable (arri_sz_yrs), the variable with missing values (arri_sz_time) is dropped from the analysis and only "arrival_sz_yrs" is retained. Therefore, no missing-value treatment is needed in the borrower data set.
When completion is performed with the completion variable corresponding to the failure variable, the specific process is:
The failure variable is completed with the information of its corresponding completion variable. For example, the newly introduced variable is "job position", and "job position" is the completion variable of "wage". If a user's characteristic variable "wage" is incomplete or missing, the user's "wage" is inferred from "job position", and the "wage" information is then completed.
Here, the credit estimation method based on a data model of this embodiment can also keep introducing new target variables and judge, from the information value of the characteristic variables, whether a target variable is the completion variable of another characteristic variable, so that when a characteristic variable fails it can be replaced using its completion variable and credit assessment can be completed.
When whether each existing characteristic variable is valid is checked according to the information value of the existing characteristic variables of the assessment model, the specific process is as follows:
The information value of each characteristic variable is calculated according to the distribution proportions of the sample data in the training set.
Each characteristic variable is tested against a predetermined value threshold to judge whether it is valid.
In practice, the weight of evidence (WOE) is the logarithm of the ratio of the proportion of "good" borrowers to the proportion of "bad" borrowers for a characteristic, and it is used to assess and compare the relative risk of different categories of a variable. The weight of evidence is calculated as follows:
WOE = ln(DistrGoods / DistrBads)
Here, WOE is the weight of evidence of a characteristic variable, DistrGoods is the distribution proportion of the "good" borrowers in the sample data for this characteristic variable, and DistrBads is the distribution proportion of the "bad" borrowers in the sample data for this characteristic variable.
The higher a positive WOE, the lower the credit default risk of the customer behaviour; the larger the magnitude of a negative WOE, the higher the credit default risk. WOE converts variables into a common form of rules and information, which allows variables of different types to be handled by the same method. Converting variables to WOE also preserves degrees of freedom more effectively in small-sample problems. WOE is therefore used for the different variables in smaller sample data sets.
Information value assesses the predictive ability of a characteristic variable. It is calculated as follows:
IV=(DistrGoods-DistrBads) * WOE
where IV is the information value of a given characteristic variable, DistrGoods is the distribution proportion of
"good" borrowers in the sample data for that characteristic variable, DistrBads is the distribution proportion of
"bad" borrowers in the sample data for that characteristic variable, and WOE is the evidence weight of that
characteristic variable.
If the information value IV of a characteristic variable is less than 0.02, the predictive ability of that
variable is very poor. If IV is between 0.02 and 0.1, the variable is considered to have weak predictive ability.
If IV is greater than 0.5, the variable is considered over-predictive. In general, the assessment model can use
characteristic variables whose IV is greater than 0.02 and less than 0.5.
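The validity check can be sketched as below. The sketch assumes that the per-category terms (DistrGoods-DistrBads) * WOE are summed over all categories of a variable, which is the usual scorecard convention but is not stated explicitly in this embodiment; the function names and sample counts are also illustrative.

```python
import math

def information_value(bins):
    """bins: list of (goods, bads) counts, one pair per category of the variable.

    Assumes every category holds at least one good and one bad borrower.
    """
    total_goods = sum(goods for goods, _ in bins)
    total_bads = sum(bads for _, bads in bins)
    iv = 0.0
    for goods, bads in bins:
        distr_goods = goods / total_goods
        distr_bads = bads / total_bads
        woe = math.log(distr_goods / distr_bads)
        iv += (distr_goods - distr_bads) * woe
    return iv

def is_usable(iv, low=0.02, high=0.5):
    """Apply the 0.02 and 0.5 bounds described above."""
    return low < iv < high

# Hypothetical variable with two categories.
iv_wage = information_value([(300, 100), (100, 100)])
usable = is_usable(iv_wage)
```

A variable that falls outside the bounds would be treated as failed, and a target variable could then be considered as its completion variable.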
In a second aspect, an embodiment of the present invention provides a credit evaluation device based on a data
model. With reference to Fig. 2, the device includes a characteristic variable acquisition module 1, a failure
variable completion module 2, and an evaluation module 3. The characteristic variable acquisition module 1 is used
to obtain, from the data to be assessed, the characteristic variables required by the assessment model, where the
assessment model is a model that has been trained and has passed inspection. The failure variable completion
module 2 is used to judge whether each characteristic variable of the data to be assessed is a failure variable:
if so, the completion variable corresponding to the failure variable is used for completion and the result is
input into the assessment model; if not, the characteristic variable is input into the assessment model directly.
A failure variable is a characteristic variable whose information is missing or incomplete. The evaluation module
3 is used to make the assessment model perform an assessment according to the input characteristic variables and
output an evaluation result.
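The cooperation of the three modules can be sketched as three functions. Everything here is a simplified assumption for illustration: a failure variable is represented only as `None` (incomplete information is not modeled), and `infer_from_position` and `toy_model` are hypothetical stand-ins for a real inference rule and a trained assessment model.

```python
def acquire_features(record, required):
    """Module 1: pick the characteristic variables the assessment model needs."""
    return {name: record.get(name) for name in required}

def complete_failed(features, record, completion_of, infer):
    """Module 2: fill each failure variable (None here) from its completion variable."""
    completed = dict(features)
    for name, value in features.items():
        if value is None and name in completion_of:
            source = completion_of[name]
            completed[name] = infer(name, record.get(source))
    return completed

def evaluate(features, model):
    """Module 3: run the assessment model on the completed variables."""
    return model(features)

# Hypothetical inference rule and model, for illustration only.
def infer_from_position(target, position):
    return {"engineer": 10000, "clerk": 5000}.get(position)

def toy_model(features):
    return "pass" if features["wage"] >= 8000 and features["age"] >= 18 else "review"

record = {"age": 30, "work_position": "engineer", "wage": None}
features = acquire_features(record, ["wage", "age"])
features = complete_failed(features, record, {"wage": "work_position"}, infer_from_position)
result = evaluate(features, toy_model)
```

Variables that are not failure variables pass through module 2 unchanged, which mirrors the "if not, input directly" branch.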
As can be seen from the above technical solution, the credit evaluation device based on a data model provided by
the present embodiment uses a pre-established assessment model to process the user data to be assessed. Even if
there are failure variables whose information is missing or incomplete, the device can complete them using the
corresponding completion variables, improving the credit violation correction effect. The device completes the
credit evaluation using a small data set, avoids the situation in which the assessment model cannot perform an
assessment because the amount of data to be processed is small, saves credit analysis cost, provides information
support for credit decisions, and reduces potential default risk.
Therefore, when data is missing or incomplete, the credit evaluation device based on a data model of the present
embodiment can carry out credit evaluation using a small data set, improving the credit violation correction effect.
To further improve the accuracy of the credit evaluation device based on a data model of the present embodiment,
specifically in terms of building the assessment model, the device also includes an assessment model establishing
module. The assessment model establishing module is used to: classify the sample data in the training set to
obtain classification results; perform logistic regression on the sample data in the training set according to
the classification results to establish the assessment model; obtain a test result according to the assessment
model and the sample data of the test set; and inspect the assessment model according to the test result, to
judge whether the assessment model has an overfitting problem.
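The build-and-inspect flow can be sketched with a small logistic regression trained by gradient descent. This is a simplified illustration under stated assumptions: the inputs are taken to be already classified and encoded as numbers (for example WOE values), the label 1 marks a default, and the accuracy-gap rule in `looks_overfitted` is a hypothetical inspection criterion that this embodiment does not specify.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(model, x):
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            err = predict_proba((weights, bias), x) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def accuracy(model, rows, labels):
    hits = sum((predict_proba(model, x) >= 0.5) == bool(y) for x, y in zip(rows, labels))
    return hits / len(rows)

def looks_overfitted(train_acc, test_acc, max_gap=0.1):
    """Hypothetical rule: flag the model when training accuracy far exceeds test accuracy."""
    return train_acc - test_acc > max_gap

# Hypothetical encoded samples; label 1 marks a default.
train_x, train_y = [[0.7], [0.5], [0.6], [-0.8], [-0.4], [-0.9]], [0, 0, 0, 1, 1, 1]
test_x, test_y = [[0.4], [-0.6]], [0, 1]
model = train_logistic(train_x, train_y)
train_acc, test_acc = accuracy(model, train_x, train_y), accuracy(model, test_x, test_y)
overfitted = looks_overfitted(train_acc, test_acc)
```

Comparing performance on the training set with performance on a held-out test set is one simple way to carry out the inspection step; other criteria could be substituted.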
Here, the credit evaluation device based on a data model of the present embodiment builds the assessment model
using logistic regression. According to this embodiment, logistic regression has better predictive performance
than a multilayer perceptron neural network model and can accurately reveal the features of borrowers in the
trustworthy group. The device is simple and easy to understand, which facilitates risk management.
Specifically, in terms of classifying the sample data, when the assessment model establishing module classifies
the sample data in the training set to obtain classification results, it is specifically used to: if the sample
data in the training set is a numerical variable, classify the numerical variable using a decision tree and
determine the classification results; if the sample data in the training set is a categorical variable, classify
the categorical variable using a clustering algorithm and determine the classification results.
Here, the credit evaluation device based on a data model of the present embodiment can classify variables of
different types. For numerical variables, the device classifies based on a decision tree. According to this
embodiment, a decision tree has stronger predictive ability than an artificial neural network or k-nearest
neighbors and can calculate Euclidean distances to optimize the classification of loan features, which helps the
variables fit logistic regression. For categorical variables, the device classifies based on a clustering
technique, using Ward's minimum variance method to combine data with similar features into clusters that suit
logistic regression, improving the credit violation correction effect.
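The decision-tree classification of a numerical variable can be sketched with a depth-one tree that picks the single split threshold with the lowest weighted Gini impurity. This simplification, the function names, and the sample values are assumptions for illustration; a real tree could produce more than two classes, and the Ward clustering used for categorical variables is not shown.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Return the threshold that minimizes the weighted Gini impurity of the two sides."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold

# Hypothetical wages with default labels (1 marks a default).
wages = [3000, 3500, 4000, 8000, 9000, 12000]
defaults = [1, 1, 1, 0, 0, 0]
threshold = best_split(wages, defaults)
classes = ["low" if w <= threshold else "high" for w in wages]
```

The resulting classes can then be encoded, for example as evidence weights, before logistic regression is applied.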
Specifically, the credit evaluation device based on a data model of the present embodiment can merge associated
variables. That is, the assessment model establishing module is also used to: calculate the distances of the
sample data in the training set and determine the associated variables; and judge whether the distance value
between any two associated variables is less than a distance threshold, and if so, merge the two associated variables.
Here, the credit evaluation device based on a data model of the present embodiment can merge variables that are
associated with each other. Specifically, it judges whether to merge two associated variables according to the
Euclidean distance between the variables, where the distance threshold can be a value calculated from the sample
data or an empirical value. Merging associated variables reduces credit evaluation risk; otherwise, the associated
variables would reduce the accuracy of the evaluation result of the logistic regression.
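The merge decision can be sketched as follows, treating each variable as a vector of its values over the training samples and comparing pairwise Euclidean distances with the threshold. The sketch assumes the variables have already been scaled to a comparable range; the variable names, values, and threshold are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long value vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairs_to_merge(columns, threshold):
    """Return every pair of variables whose distance is below the threshold."""
    names = sorted(columns)
    merged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if euclidean(columns[first], columns[second]) < threshold:
                merged.append((first, second))
    return merged

# Hypothetical variables, each scaled to the range 0 to 1 over four samples.
columns = {
    "monthly_income": [0.9, 0.1, 0.5, 0.7],
    "annual_income": [0.88, 0.12, 0.52, 0.69],
    "age": [0.2, 0.8, 0.4, 0.1],
}
result = pairs_to_merge(columns, threshold=0.1)
```

The same distances could also be used to pick, for a given variable, the closest other variable as its completion variable.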
In the description of this specification, a description referring to the terms "one embodiment", "some
embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure,
material, or characteristic described in connection with that embodiment or example is included in at least one
embodiment or example of the present invention. In this specification, schematic uses of these terms do not
necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or
characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In
addition, provided they do not conflict, those skilled in the art may combine the different embodiments or
examples described in this specification and the features of those embodiments or examples.
It should be noted that the flow charts and block diagrams in the accompanying drawings show the possible
architectures, functions, and operations of servers, methods, and computer program products according to multiple
embodiments of the present invention. In this regard, each block in a flow chart or block diagram may represent a
module, a program segment, or a portion of code, which contains one or more executable instructions for
implementing the specified logical function. It should also be noted that, in some alternative implementations,
the functions marked in the blocks may occur in an order different from that marked in the drawings. For example,
two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order,
depending on the functions involved. It should also be noted that each block in the block diagrams and/or flow
charts, and combinations of such blocks, can be implemented by a dedicated hardware-based server that performs the
specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The configuration device provided by the embodiments of the present invention may be a computer program product,
including a computer-readable storage medium storing program code. The instructions included in the program code
can be used to perform the method described in the foregoing method embodiments. For the specific implementation,
refer to the method embodiments; details are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific
working processes of the server, device, and units described above may refer to the corresponding processes in the
foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed server,
device, and method may be implemented in other ways. The device embodiments described above are only schematic.
For example, the division into units is only a division by logical function, and other divisions are possible in
an actual implementation. As another example, multiple units or components may be combined or integrated into
another server, or some features may be ignored or not performed. In addition, the mutual coupling, direct
coupling, or communication connection shown or discussed may be an indirect coupling or communication connection
through some communication interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as
units may or may not be physical units; that is, they may be located in one place or distributed across multiple
network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of
the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one
processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent
product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical
solution of the present invention in essence, or the part that contributes to the prior art, or a part of the
technical solution, may be embodied in the form of a software product. The computer software product is stored in
a storage medium and includes several instructions that cause a computer device (which may be a personal
computer, a server, a network device, or the like) to perform all or some of the steps of the methods described
in the embodiments of the present invention. The foregoing storage medium includes various media that can store
program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory
(RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of
the present invention, not to limit them. Although the present invention has been described in detail with
reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical
solutions described in the foregoing embodiments may still be modified, or some or all of their technical
features may be equivalently replaced. Such modifications or replacements do not cause the essence of the
corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the
present invention, and they shall all be covered by the scope of the claims and the specification of the present
invention.
Claims (10)
- 1. A credit estimation method based on a data model, characterized by comprising: obtaining, from data to be assessed, the characteristic variables required by an assessment model, the assessment model being a model that has been trained and has passed inspection; judging whether each characteristic variable of the data to be assessed is a failure variable: if so, completing it using the completion variable corresponding to the failure variable and inputting it into the assessment model; if not, inputting it into the assessment model, wherein a failure variable is a characteristic variable whose information is missing or incomplete; and performing, by the assessment model, an assessment according to the input characteristic variables and outputting an evaluation result.
- 2. The credit estimation method based on a data model according to claim 1, characterized in that, before the characteristic variables required by the assessment model are obtained from the data to be assessed, the method further comprises: classifying the sample data in a training set to obtain classification results; performing logistic regression on the sample data in the training set according to the classification results to establish the assessment model; obtaining a test result according to the assessment model and the sample data of a test set; and inspecting the assessment model according to the test result.
- 3. The credit estimation method based on a data model according to claim 2, characterized in that, after the assessment model is inspected, the method further comprises: randomly splitting the sample data of the training set and the test set according to a cross-validation method; and training the assessment model using the split sample data.
- 4. The credit estimation method based on a data model according to claim 2, characterized in that, before logistic regression is performed on the sample data in the training set according to the classification results, the method further comprises: calculating the distances of the sample data in the training set to determine associated variables; and judging whether the distance value between any two associated variables is less than a distance threshold, and if so, merging the two associated variables.
- 5. The credit estimation method based on a data model according to claim 4, characterized in that, after the distances of the sample data in the training set are calculated, the method further comprises: detecting the distance values between a given variable and the other variables; and setting the variable with the smallest distance value to that variable as the completion variable of that variable; wherein completing using the completion variable corresponding to the failure variable specifically comprises: completing the failure variable using the information of the completion variable corresponding to the failure variable.
- 6. The credit estimation method based on a data model according to claim 2, characterized in that, after the assessment model is established and before replacement using the completion variable corresponding to the failure variable, the method further comprises: inputting a target variable into the assessment model; checking, according to the information value of the existing characteristic variables of the assessment model, whether each existing characteristic variable is valid; and, if a failed characteristic variable exists, setting the target variable as the completion variable of the failed characteristic variable; wherein completing using the completion variable corresponding to the failure variable specifically comprises: completing the failure variable using the information of the completion variable corresponding to the failure variable.
- 7. The credit estimation method based on a data model according to claim 6, characterized in that checking, according to the information value of the existing characteristic variables of the assessment model, whether each existing characteristic variable is valid specifically comprises: calculating the information value of each characteristic variable according to the distribution proportion of the sample data in the training set; and testing against a predetermined value threshold to judge whether each characteristic variable is valid.
- 8. The credit estimation method based on a data model according to claim 1, characterized in that, after a characteristic variable is judged to be a failure variable and before input into the assessment model, the method further comprises: performing a statistical operation on the failure variable to obtain the mean or median of the failure variable; and completing the failure variable whose information is missing using the mean or median of the failure variable.
- 9. A credit evaluation device based on a data model, characterized by comprising: a characteristic variable acquisition module, used to obtain, from data to be assessed, the characteristic variables required by an assessment model, the assessment model being a model that has been trained and has passed inspection; a failure variable completion module, used to judge whether each characteristic variable of the data to be assessed is a failure variable: if so, completing it using the completion variable corresponding to the failure variable and inputting it into the assessment model; if not, inputting it into the assessment model, wherein a failure variable is a characteristic variable whose information is missing or incomplete; and an evaluation module, used to make the assessment model perform an assessment according to the input characteristic variables and output an evaluation result.
- 10. The credit evaluation device based on a data model according to claim 9, characterized in that the device further comprises an assessment model establishing module, used to: classify the sample data in a training set to obtain classification results; perform logistic regression on the sample data in the training set according to the classification results to establish the assessment model; obtain a test result according to the assessment model and the sample data of a test set; and inspect the assessment model according to the test result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710785997.XA CN107633455A (en) | 2017-09-04 | 2017-09-04 | Credit estimation method and device based on data model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107633455A true CN107633455A (en) | 2018-01-26 |
Family
ID=61101092
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710785997.XA Pending CN107633455A (en) | 2017-09-04 | 2017-09-04 | Credit estimation method and device based on data model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107633455A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101493913A (en) * | 2008-01-23 | 2009-07-29 | 阿里巴巴集团控股有限公司 | Method and system for assessing user credit in internet |
CN101996381A (en) * | 2009-08-14 | 2011-03-30 | 中国工商银行股份有限公司 | Method and system for calculating retail asset risk |
CN102376067A (en) * | 2010-08-20 | 2012-03-14 | 许威 | Scorecard system based on financial credit loan and realization method for scorecard system |
CN105469123A (en) * | 2015-12-30 | 2016-04-06 | 华东理工大学 | Missing data completion method based on k plane regression |
CN106021505A (en) * | 2016-05-23 | 2016-10-12 | 百度在线网络技术(北京)有限公司 | Processing method and apparatus of value parameters of big data factors |
CN106295855A (en) * | 2016-07-28 | 2017-01-04 | 上海财经大学 | The instruction flow method of prediction stock price index futures market anomalies fluctuation |
CN106600455A (en) * | 2016-11-25 | 2017-04-26 | 国网河南省电力公司电力科学研究院 | Electric charge sensitivity assessment method based on logistic regression |
CN106779457A (en) * | 2016-12-29 | 2017-05-31 | 深圳微众税银信息服务有限公司 | A kind of rating business credit method and system |
CN106779755A (en) * | 2016-12-31 | 2017-05-31 | 湖南文沥征信数据服务有限公司 | A kind of network electric business borrows or lends money methods of risk assessment and model |
CN107169534A (en) * | 2017-07-04 | 2017-09-15 | 北京京东尚科信息技术有限公司 | Model training method and device, storage medium, electronic equipment |
Non-Patent Citations (2)
Title |
---|
GUSTAVO E A P A. BATISTA 等: "An analysis of four missing data", 《APPLIED ARTIFICIAL INTELLIGENCE》 * |
宋焕林: "数据挖掘中的数据缺失处理", 《河套学院学报》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108256691A (en) * | 2018-02-08 | 2018-07-06 | 成都智宝大数据科技有限公司 | Refund Probabilistic Prediction Model construction method and device |
CN108960505A (en) * | 2018-05-31 | 2018-12-07 | 试金石信用服务有限公司 | Quantitative estimation method, device, system and the storage medium of personal finance credit |
CN110060144A (en) * | 2019-03-18 | 2019-07-26 | 平安科技(深圳)有限公司 | Amount model training method, amount appraisal procedure, device, equipment and medium |
CN110060144B (en) * | 2019-03-18 | 2024-01-30 | 平安科技(深圳)有限公司 | Method for training credit model, method, device, equipment and medium for evaluating credit |
CN110490419A (en) * | 2019-07-19 | 2019-11-22 | 珠海市岭南大数据研究院 | Processing method, device, computer equipment and the storage medium of Bus information data |
CN111192149A (en) * | 2019-11-25 | 2020-05-22 | 泰康保险集团股份有限公司 | Method and device for generating underwriting result data |
CN111210109A (en) * | 2019-12-20 | 2020-05-29 | 上海淇玥信息技术有限公司 | Method and device for predicting user risk based on associated user and electronic equipment |
CN111797994A (en) * | 2020-06-28 | 2020-10-20 | 北京百度网讯科技有限公司 | Risk assessment method, device, equipment and storage medium |
CN111797994B (en) * | 2020-06-28 | 2024-04-05 | 北京百度网讯科技有限公司 | Risk assessment method, apparatus, device and storage medium |
CN112906765A (en) * | 2021-02-01 | 2021-06-04 | 中国建设银行股份有限公司 | RBF neural network-based customer money laundering risk grading method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107633455A (en) | Credit estimation method and device based on data model | |
Huang et al. | Enterprise credit risk evaluation based on neural network algorithm | |
CN107633030A (en) | Credit estimation method and device based on data model | |
CN107633265A (en) | For optimizing the data processing method and device of credit evaluation model | |
Calderon et al. | A roadmap for future neural networks research in auditing and risk assessment | |
CN110246031A (en) | Appraisal procedure, system, equipment and the storage medium of business standing | |
CN107203774A (en) | The method and device that the belonging kinds of data are predicted | |
CN110415111A (en) | Merge the method for logistic regression credit examination & approval with expert features based on user data | |
Ereiz | Predicting default loans using machine learning (OptiML) | |
Dbouk et al. | Towards a machine learning approach for earnings manipulation detection | |
CN109345050A (en) | A kind of quantization transaction prediction technique, device and equipment | |
Ruyu et al. | A comparison of credit rating classification models based on spark-evidence from lending-club | |
CN107590735A (en) | Data digging method and device for credit evaluation | |
CN107766500A (en) | The auditing method of fixed assets card | |
Chimonaki et al. | Identification of financial statement fraud in Greece by using computational intelligence techniques | |
Chen et al. | Mixed credit scoring model of logistic regression and evidence weight in the background of big data | |
CN116911994B (en) | External trade risk early warning system | |
Hui et al. | The model and empirical research of application scoring based on data mining methods | |
Yang et al. | An evidential reasoning rule-based ensemble learning approach for evaluating credit risks with customer heterogeneity | |
CN114612239A (en) | Stock public opinion monitoring and wind control system based on algorithm, big data and artificial intelligence | |
Hiwase et al. | Review on application of data mining in life insurance | |
Kristjanpoller et al. | An empirical application of a hybrid ANFIS model to predict household over-indebtedness | |
Sun et al. | A new perspective of credit scoring for small and medium-sized enterprises based on invoice data | |
KR102334923B1 (en) | Loan expansion hypothesis testing system using artificial intelligence and method using the same | |
CN117994016A (en) | Method for constructing retail credit risk prediction model and consumer credit business Scorebeta model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | |
Application publication date: 20180126 |