CN107808246A - Intelligent evaluation method and system for credit-reference data - Google Patents

Intelligent evaluation method and system for credit-reference data

Info

Publication number
CN107808246A
CN107808246A CN201711015906.0A CN201711015906A CN107808246A CN 107808246 A CN107808246 A CN 107808246A CN 201711015906 A CN201711015906 A CN 201711015906A CN 107808246 A CN107808246 A CN 107808246A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711015906.0A
Other languages
Chinese (zh)
Inventor
金家芳
陈斌
匡文豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Weixin Hui Chi Financial Technologies Ltd
Original Assignee
Shanghai Weixin Hui Chi Financial Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Weixin Hui Chi Financial Technologies Ltd
Priority to CN201711015906.0A
Publication of CN107808246A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03: Credit; Loans; Processing thereof

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • Accounting & Taxation (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an intelligent evaluation method and system for credit-reference data. The method comprises the following steps: a sampling step, for sampling and acquiring data; a problem determination step, for determining which kind of model is currently needed to process the data; a sample partitioning step, for classifying the samples; a feature processing step, for analyzing and processing the feature data; a computation step, for performing computation on the feature data to generate evaluation result data and feedback data; a parameter-tuning step, for adjusting parameters according to the current model and the feedback data; and a model evaluation and verification step, for evaluating the current model. The intelligent evaluation method and system for credit-reference data provided by the invention offer high accuracy, strong stability and fast iteration, suit business scenarios that require rapid loan approval and disbursement, and achieve better credit discrimination.

Description

Intelligent evaluation method and system for credit-reference data
Technical field
The present invention relates to the field of computer and big data technology, and in particular to an intelligent evaluation method and system for credit-reference data.
Background
It is well known that assessing the credit standing and repayment ability of loan applicants is central to risk control in the credit industry. In the prior art, an applicant's credit standing is assessed using strong financial feature data, which are obtained mainly from the central bank's credit-reference data. However, the central bank's credit-reference data are incomplete: a significant proportion of loan applicants have no record in the central bank's credit-reference database, so no credit-reference data can be obtained for them. In the prior art, credit institutions handle applications from this group either by raising the loan interest rate to reduce their risk or by refusing to lend so that the risk does not arise. This severely limits the business development of financial credit institutions.
On the other hand, with the rapid development of the internet and big data technology, data such as e-commerce transaction data, social data and online behavior data have become available. A single item or a small amount of such data cannot directly reflect a person's credit standing and repayment ability, but a large volume of data accumulated over a long period, or a combination of data from many sources, can reflect them to some extent. We call these data weak financial feature data. A method that acquires, accumulates, aggregates and analyzes these weak financial data over a long period, and that continuously improves how it does so, can provide important technical support for assessing credit standing and repayment ability in the credit industry, and benefits the development of the financial credit industry.
Summary of the invention
The object of the invention is to provide an intelligent evaluation method and system for credit-reference data.
The intelligent evaluation method for credit-reference data provided by the invention comprises the following steps: a sampling step, for sampling and acquiring data; a problem determination step, for determining which kind of model is currently needed to process the data; a sample partitioning step, for classifying the samples; a feature processing step, for analyzing and processing the feature data; a computation step, for performing computation on the feature data to generate evaluation result data and feedback data; a parameter-tuning step, for adjusting parameters according to the current model and the feedback data; and a model evaluation and verification step, for evaluating the current model.
The sampling step includes an oversampling step, an undersampling step, a step of weighting group samples, or a step of simulating group sample data using a specific algorithm. The problem determination step judges, from a specific parameter of the sampled feature data, whether the problem solved by the current computation is a classification problem or a regression problem. When the problem determination step finds that the problem is a classification problem, it further judges whether the problem is a binary classification problem or a multi-class classification problem. The sample partitioning step includes a step of dividing the sampled data into a training set, a step of dividing the sampled data into a validation set, and a step of dividing the sampled data into a test set.
The feature processing step includes: a feature classification step, for determining whether feature data are continuous feature data or discrete feature data; a feature preprocessing step, for preprocessing the feature data; a feature engineering step, for performing modelling processing on the feature data; and a feature selection step, for performing selective computation on the feature data. The feature classification step includes classifying a sampled feature as continuous feature data when it is numeric data, and as discrete feature data when it is discrete data. The feature preprocessing step includes, when a sampled feature is continuous feature data, applying one or more of the following to it: missing-value preprocessing, outlier preprocessing, numeric feature discretization, data feature normalization, and numeric feature transformation. The feature preprocessing step includes, when a sampled feature is discrete feature data, applying one or more of the following to it: missing-value preprocessing, cardinality-reduction preprocessing, and one-hot encoding.
The feature engineering step includes a time-derived feature data generation step, for computing time-derived feature data from time-type feature data. The time-derived feature data include time difference parameter data. The feature engineering step includes a space-derived feature data generation step, for computing space-derived feature data from space-type feature data. The space-derived feature data include region value parameter data or region difference parameter data. The feature engineering step includes a feature data combination step, for performing combination operations on feature data. The feature data combination step includes a combined feature data computation step and a feature data crossing step. The feature selection step includes a dimensionality reduction step or a subset selection step. The subset selection step includes a filter-type selection step, a wrapper-type selection step, or an embedded selection step.
The intelligent evaluation system for credit-reference data provided by the invention comprises: a sampling module, for sampling and acquiring data; a problem determination module, for determining which kind of model is currently needed to process the data; a sample partitioning module, for classifying the samples; a feature processing module, for analyzing and processing the feature data; a computation module, for performing computation on the feature data to generate evaluation result data and feedback data; a parameter-tuning module, for adjusting parameters according to the current model and the feedback data; and a model evaluation and verification module, for evaluating the current model.
The sampling module includes an oversampling submodule, an undersampling submodule, a submodule that weights group samples, or a submodule that simulates group sample data using a specific algorithm. The problem determination module judges, from a specific parameter of the sampled feature data, whether the problem solved by the current computation is a classification problem or a regression problem. When the problem determination module finds that the problem is a classification problem, it further judges whether the problem is a binary classification problem or a multi-class classification problem. The sample partitioning module includes a submodule that divides the sampled data into a training set, a submodule that divides the sampled data into a validation set, and a submodule that divides the sampled data into a test set.
The feature processing module includes: a feature classification submodule, for determining whether feature data are continuous feature data or discrete feature data; a feature preprocessing submodule, for preprocessing the feature data; a feature engineering submodule, for performing modelling processing on the feature data; and a feature selection submodule, for performing selective computation on the feature data. The feature classification submodule includes a module unit that classifies a sampled feature as continuous feature data when it is numeric data, and a module unit that classifies a sampled feature as discrete feature data when it is discrete data. The feature preprocessing submodule, when a sampled feature is continuous feature data, applies one or more of the following to it: missing-value preprocessing, outlier preprocessing, numeric feature discretization, data feature normalization, and numeric feature transformation. The feature preprocessing submodule, when a sampled feature is discrete feature data, applies one or more of the following to it: missing-value preprocessing, cardinality-reduction preprocessing, and one-hot encoding.
The feature engineering submodule includes a time-derived feature data generation module unit, for computing time-derived feature data from time-type feature data. The time-derived feature data include time difference parameter data. The feature engineering submodule includes a space-derived feature data generation module unit, for computing space-derived feature data from space-type feature data. The space-derived feature data include region value parameter data and region difference parameter data. The feature engineering submodule includes a feature data combination module unit, for performing combination operations on feature data. The feature data combination module unit includes a combined feature data computation submodule unit and a feature data crossing submodule unit. The feature selection submodule includes a dimensionality reduction module unit or a subset selection module unit. The subset selection module unit includes a filter-type selection submodule unit, a wrapper-type selection submodule unit, or an embedded selection submodule unit.
The intelligent evaluation method and system for credit-reference data provided by the invention assess credit risk and support decisions using weak financial feature data, effectively complementing the prior-art approach of assessing credit risk and making decisions using strong financial feature data. The method and system obtain data from the internet, generate high-dimensional variables in real time, and judge an individual's willingness to repay, repayment ability and repayment potential more accurately, thereby supplementing credit attributes and providing fair, high-quality financial services for the customer base. The model is highly accurate; moreover, most internet data are timely and frequently updated, so they better reflect a customer's current credit situation as a whole. The method and system offer high accuracy, strong stability and fast iteration, suit business scenarios that require rapid loan approval and disbursement, and achieve better credit discrimination.
Brief description of the drawings
Fig. 1 is a schematic diagram of the intelligent evaluation system for credit-reference data described in Embodiment two of the invention.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
Embodiment one
This embodiment provides an intelligent evaluation method for credit-reference data, comprising the following steps:
A sampling step, for sampling and acquiring data. A person skilled in the art will understand that, in the business scenario of a credit business, the proportion of loan applicants who repay as agreed is far larger than the proportion who default; if the raw data were used directly, it would be difficult to capture the data and patterns of defaulting applicants adequately. The sampling step samples the raw data and weights the sampled data of defaulting applicants and of applicants who repay as agreed, so that the data and patterns of both groups are reflected more fully.
A problem determination step, for determining which kind of model is currently needed to process the data;
A sample partitioning step, for classifying the samples;
A feature processing step, for analyzing and processing the feature data;
A computation step, for performing computation on the feature data to generate evaluation result data and feedback data;
A parameter-tuning step, for adjusting parameters according to the current model and the feedback data;
A model evaluation and verification step, for evaluating the current model;
A person skilled in the art will understand that the parameter adjustment can be realized by adopting empirical values and performing a grid search in the neighborhood of those values, so as to produce a better computation model. Evaluating the current model means assessing its accuracy, validity and stability on the data sets according to index parameters. A person skilled in the art will understand that the model evaluation and verification step converts model predictions into scores and computes the model's evaluation indices. A person skilled in the art will understand that most machine learning methods, when predicting a sample, output a value between 0 and 1: a value close to 1 indicates a high default probability, and a value close to 0 a low one. This is not intuitive enough to be used directly in business. The machine learning engine is therefore designed with a general probability-to-score conversion function that can convert the output probability of almost any classification model into a FICO-style score (with reference to a normally distributed score of 0-1000) to support business decisions. In addition, the AUC, KS value, PSI value, score distribution and default-rate trend, and the cross matrix of two scoring models, help speed up the formulation of model policies.
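The score conversion and two of the evaluation indices named above can be illustrated with a minimal sketch. The patent does not give the conversion formula; the log-odds scaling below, together with the names probability_to_score, base_score, base_odds and pdo, is an assumption borrowed from common scorecard practice, and the KS and PSI functions follow their usual textbook definitions rather than anything stated in the source.

```python
import math


def probability_to_score(p_default, base_score=600.0, base_odds=20.0, pdo=50.0):
    """Map a default probability to a score clamped to 0-1000.

    Assumed scaling (not from the patent): a sample whose odds of repaying
    equal base_odds gets base_score, and every doubling of those odds adds
    pdo points.
    """
    eps = 1e-9
    p = min(max(p_default, eps), 1.0 - eps)
    odds_good = (1.0 - p) / p
    factor = pdo / math.log(2.0)
    offset = base_score - factor * math.log(base_odds)
    score = offset + factor * math.log(odds_good)
    return min(max(score, 0.0), 1000.0)


def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of
    defaulters (label 1) and non-defaulters (label 0)."""
    pairs = sorted(zip(scores, labels))
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all samples that share the same score before comparing.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return ks


def psi(expected_fractions, actual_fractions, eps=1e-6):
    """Population stability index over pre-binned score fractions."""
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected_fractions, actual_fractions)
    )
```

A higher score here means a lower default probability; a PSI near zero suggests that the score distribution has not shifted between the two periods being compared.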
Further, the sampling step includes an oversampling step.
Further, the sampling step includes an undersampling step.
Further, the sampling step includes a step of weighting group samples.
Further, the sampling step also includes a step of simulating group sample data using a specific algorithm.
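The first three sampling variants can be sketched as follows. This is only an illustration under assumptions: the patent does not specify how samples are duplicated, dropped or weighted, so random duplication, random removal and inverse-frequency weights are used here, and the function names are hypothetical. Simulated sample generation (the fourth variant) is not shown.

```python
import random
from collections import Counter


def _split_indices(labels, minority):
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return minority_idx, majority_idx


def oversample(rows, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    n_extra = max(len(majority_idx) - len(minority_idx), 0)
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    keep = majority_idx + minority_idx + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def undersample(rows, labels, minority=1, seed=0):
    """Randomly drop majority rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    n_keep = min(len(minority_idx), len(majority_idx))
    keep = rng.sample(majority_idx, n_keep) + minority_idx
    return [rows[i] for i in keep], [labels[i] for i in keep]


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}
```

With eight repaying applicants and two defaulters, class_weights gives the defaulters a weight of 2.5 and the repaying applicants 0.625, so each class contributes equally in total.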
Further, the problem determination step judges, from a specific parameter of the sampled feature data, whether the problem solved by the current computation is a classification problem or a regression problem.
Further, when the problem determination step finds that the problem solved by the current computation is a classification problem, it further judges whether the problem is a binary classification problem or a multi-class classification problem.
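One possible reading of this decision is sketched below. The patent does not say which "specific parameter" drives the judgment, so the sketch assumes that the target column is inspected; the function name determine_problem and the max_classes threshold are hypothetical.

```python
def determine_problem(target_values, max_classes=20):
    """Guess the problem type from the target column (an assumed heuristic).

    Targets with non-integral numeric values, or with more than max_classes
    distinct values, are treated as a regression problem; otherwise the
    number of distinct values decides between binary and multi-class.
    """
    distinct = set(target_values)
    if len(distinct) < 2:
        raise ValueError("the target needs at least two distinct values")
    has_fractional = any(
        isinstance(v, float) and not v.is_integer() for v in distinct
    )
    if has_fractional or len(distinct) > max_classes:
        return "regression"
    if len(distinct) == 2:
        return "binary_classification"
    return "multiclass_classification"
```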
Further, the sample partitioning step includes a step of dividing the sampled data into a training set;
Further, the sample partitioning step includes a step of dividing the sampled data into a validation set;
Further, the sample partitioning step includes a step of dividing the sampled data into a test set;
A person skilled in the art will understand that the training set and the validation set come from data of the same period and the same distribution, whereas the test set is cross-period data, i.e. data from a different time period, used to test the accuracy and stability of the model when the business changes.
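A minimal sketch of such a partition is given below, assuming each record carries an application date under the hypothetical key applied_on and that a single cutoff date separates the modelling period from the later test period. The function name, the key and the 20% validation fraction are all assumptions.

```python
import random
from datetime import date


def split_samples(records, cutoff, validation_fraction=0.2, seed=0):
    """Split records into (train, validation, test).

    Records dated before the cutoff are shuffled and divided into training
    and validation sets, so both come from the same period and distribution.
    Records dated on or after the cutoff form the cross-period test set.
    """
    in_period = [r for r in records if r["applied_on"] < cutoff]
    test = [r for r in records if r["applied_on"] >= cutoff]
    rng = random.Random(seed)
    rng.shuffle(in_period)
    n_validation = int(len(in_period) * validation_fraction)
    return in_period[n_validation:], in_period[:n_validation], test
```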
Further, the feature processing step includes:
A feature classification step, for determining whether feature data are continuous feature data or discrete feature data. A person skilled in the art will understand that feature data include continuous feature data and discrete feature data, which require different subsequent processing methods; the sampled feature data must therefore be classified by feature type.
A feature engineering step, for performing modelling processing on the feature data.
A feature selection step, for performing selective computation on the feature data.
Further, the feature classification step includes classifying a sampled feature as continuous feature data when it is numeric data.
Further, the feature classification step includes classifying a sampled feature as discrete feature data when it is discrete data.
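The classification rule above can be sketched in a few lines. The treatment of booleans and of empty columns as discrete is an assumption of this sketch, as is the function name classify_feature.

```python
from numbers import Number


def classify_feature(values):
    """Return "continuous" for a numeric column and "discrete" otherwise.

    Missing values (None) are ignored. Booleans and empty columns are
    treated as discrete, which is an assumption of this sketch.
    """
    present = [v for v in values if v is not None]
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in present
    )
    return "continuous" if present and is_numeric else "discrete"
```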
Further, the feature preprocessing step includes, when a sampled feature is continuous feature data, applying one or more of the following to it: missing-value preprocessing, outlier preprocessing, numeric feature discretization, data feature normalization, and numeric feature transformation.
Further, the feature preprocessing step includes, when a sampled feature is discrete feature data, applying one or more of the following to it: missing-value preprocessing, cardinality-reduction preprocessing, and one-hot encoding.
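The two preprocessing branches can be sketched as follows. The concrete choices are assumptions, since the patent only names the kinds of preprocessing: median imputation, clipping to given bounds and min-max normalization for continuous features; a "MISSING" category, merging of rare categories into "OTHER" and one-hot encoding for discrete features. The function names and the min_count threshold are hypothetical.

```python
import statistics
from collections import Counter


def preprocess_continuous(values, lower=None, upper=None):
    """Impute missing values with the median, clip outliers to
    [lower, upper], then min-max normalize to [0, 1]."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]
    lo = min(present) if lower is None else lower
    hi = max(present) if upper is None else upper
    clipped = [min(max(v, lo), hi) for v in filled]
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in clipped]


def preprocess_discrete(values, min_count=2):
    """Fill missing values, merge rare categories, then one-hot encode.

    Returns the sorted category list and one 0/1 row per input value.
    """
    filled = ["MISSING" if v is None else v for v in values]
    counts = Counter(filled)
    reduced = [v if counts[v] >= min_count else "OTHER" for v in filled]
    categories = sorted(set(reduced))
    encoded = [[1 if v == c else 0 for c in categories] for v in reduced]
    return categories, encoded
```

In practice the median, the clipping bounds and the category list would be learned on the training set and reused unchanged on the validation and test sets.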
Further, the feature engineering step includes a time-derived feature data generation step, for computing time-derived feature data from time-type feature data.
Further, the time-derived feature data include time difference parameter data.
Further, the feature engineering step includes a space-derived feature data generation step, for computing space-derived feature data from space-type feature data.
Further, the space-derived feature data include region value parameter data and region difference parameter data.
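A small sketch of derived features follows. The patent does not define the time difference, region value or region difference parameters precisely, so the interpretations here are assumptions: the time difference is the number of days between two dates, the region value is a per-region statistic looked up from a table, and the region difference flags whether two regions differ. All names are hypothetical.

```python
from datetime import date


def time_difference_days(earlier, later):
    """Time-derived feature: number of days between two dates."""
    return (later - earlier).days


def region_features(home_region, work_region, region_value_table):
    """Space-derived features under the assumed interpretation.

    region_value is a statistic looked up for the home region (None if the
    region is unknown); region_differs is 1 when the two regions differ.
    """
    return {
        "region_value": region_value_table.get(home_region),
        "region_differs": int(home_region != work_region),
    }
```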
Further, the feature engineering step includes a feature data combination step, for performing combination operations on feature data. A person skilled in the art will understand that the combination operation may use a simple linear model, or a more complex logistic regression model or naive Bayes model, which strengthens the nonlinearity and interaction of the data features.
Further, the feature data combination step includes a combined feature data computation step and a feature data crossing step.
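Feature combination and crossing can be illustrated in their simplest form. The sketch below does not use the linear, logistic regression or naive Bayes models mentioned above; it assumes a weighted sum for combination and a joined category for crossing, and both function names are hypothetical.

```python
def combine_features(columns, weights):
    """Combined feature: weighted sum of several continuous columns, row by row."""
    return [
        sum(w * v for w, v in zip(weights, row))
        for row in zip(*columns)
    ]


def cross_features(first, second, separator="|"):
    """Crossed feature: join two discrete columns into one category per row."""
    return [f"{a}{separator}{b}" for a, b in zip(first, second)]
```

A crossed feature such as "student|night" lets a linear model learn an effect that neither original feature captures alone.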
Further, the feature selection step includes a dimensionality reduction step.
Further, the feature selection step also includes a subset selection step.
Further, the subset selection step includes a filter-type selection step; a person skilled in the art will understand that filter-type selection can be computed from feature importance indices such as IV (information value), information gain and the Gini index.
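Filter-type selection by information value can be sketched as follows for discrete (or already binned) features. The additive smoothing term and the 0.02 threshold are assumptions of this sketch, not values from the patent, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def information_value(categories, labels, smoothing=0.5):
    """IV of one discrete feature against a 0/1 default label.

    The smoothing term avoids division by zero when a category contains
    only defaulters or only non-defaulters.
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for category, label in zip(categories, labels):
        if label == 1:
            bad[category] += 1
        else:
            good[category] += 1
    n_good = sum(good.values())
    n_bad = sum(bad.values())
    iv = 0.0
    for category in set(categories):
        good_share = (good[category] + smoothing) / (n_good + smoothing)
        bad_share = (bad[category] + smoothing) / (n_bad + smoothing)
        iv += (good_share - bad_share) * math.log(good_share / bad_share)
    return iv


def filter_select(features, labels, threshold=0.02):
    """Keep the names of features whose IV reaches the threshold."""
    return [
        name
        for name, column in features.items()
        if information_value(column, labels) >= threshold
    ]
```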
Further, the subset selection step includes a wrapper-type selection step; a person skilled in the art will understand that the wrapper-type selection step can be realized by random search or heuristic search.
Further, the subset selection step includes an embedded selection step; a person skilled in the art will understand that the embedded selection step can be realized by norm regularization such as L1 (Lasso) and L2 (Ridge), or by tree ensembles.
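Embedded selection with L1 regularization can be illustrated with a small Lasso solver. This is a sketch under assumptions: it fits a linear model without an intercept by coordinate descent on the objective (1/2n)·||y − Xw||² + alpha·||w||₁, whereas a credit model would more likely use regularized logistic regression from an established library. Features whose weight is driven to exactly zero are the ones dropped.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/2n) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0      # correlation of feature j with the partial residual
            norm_sq = 0.0  # squared norm of feature j
            for i in range(n):
                prediction_without_j = sum(
                    X[i][k] * w[k] for k in range(d) if k != j
                )
                rho += X[i][j] * (y[i] - prediction_without_j)
                norm_sq += X[i][j] ** 2
            if norm_sq:
                w[j] = soft_threshold(rho / n, alpha) / (norm_sq / n)
            else:
                w[j] = 0.0
    return w


def selected_features(weights):
    """Indices of features that kept a non-zero weight."""
    return [j for j, weight in enumerate(weights) if weight != 0.0]
```

A larger alpha zeroes out more weights and therefore selects fewer features.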
Embodiment two
This embodiment provides an intelligent evaluation system for credit-reference data, comprising:
A sampling module, for sampling and acquiring data. A person skilled in the art will understand that, in the business scenario of a credit business, the proportion of loan applicants who repay as agreed is far larger than the proportion who default; if the raw data were used directly, it would be difficult to capture the data and patterns of defaulting applicants adequately. The sampling module samples the raw data and weights the sampled data of defaulting applicants and of applicants who repay as agreed, so that the data and patterns of both groups are reflected more fully.
A problem determination module, for determining which kind of model is currently needed to process the data;
A sample partitioning module, for classifying the samples;
A feature processing module, for analyzing and processing the feature data;
A computation module, for performing computation on the feature data to generate evaluation result data and feedback data;
A parameter-tuning module, for adjusting parameters according to the current model and the feedback data;
A model evaluation and verification module, for evaluating the current model. A person skilled in the art will understand that the parameter adjustment can be realized by adopting empirical values and performing a grid search in the neighborhood of those values, so as to produce a better computation model. Evaluating the current model means assessing its accuracy, validity and stability on the data sets according to index parameters.
A person skilled in the art will understand that the model evaluation and verification module converts model predictions into scores and computes the model's evaluation indices. A person skilled in the art will understand that most machine learning methods, when predicting a sample, output a value between 0 and 1: a value close to 1 indicates a high default probability, and a value close to 0 a low one. This is not intuitive enough to be used directly in business. The machine learning engine is therefore designed with a general probability-to-score conversion function that can convert the output probability of almost any classification model into a FICO-style score (with reference to a normally distributed score of 0-1000) to support business decisions. In addition, the AUC, KS value, PSI value, score distribution and default-rate trend, and the cross matrix of two scoring models, help speed up the formulation of model policies.
Further, the sampling module includes an oversampling submodule.
Further, the sampling module includes an undersampling submodule.
Further, the sampling module includes a submodule that weights group samples.
Further, the sampling module also includes a submodule that simulates group sample data using a specific algorithm.
Further, the problem determination module is a module that judges, from a specific parameter of the sampled feature data, whether the problem solved by the current computation is a classification problem or a regression problem;
Further, when the problem determination module finds that the problem solved by the current computation is a classification problem, it further judges whether the problem is a binary classification problem or a multi-class classification problem.
Further, the sample partitioning module includes a submodule that divides the sampled data into a training set;
Further, the sample partitioning module includes a submodule that divides the sampled data into a validation set;
Further, the sample partitioning module includes a submodule that divides the sampled data into a test set;
A person skilled in the art will understand that the training set and the validation set come from data of the same period and the same distribution, whereas the test set is cross-period data, i.e. data from a different time period, used to test the accuracy and stability of the model when the business changes.
Further, the feature processing module includes:
A feature classification submodule, for determining whether feature data are continuous feature data or discrete feature data. A person skilled in the art will understand that feature data include continuous feature data and discrete feature data, which require different subsequent processing; the sampled feature data must therefore be classified by feature type.
A feature preprocessing submodule, for preprocessing the feature data.
A feature engineering submodule, for performing modelling processing on the feature data.
A feature selection submodule, for performing selective computation on the feature data.
Further, the tagsort submodule includes, and when sampled data is characterized as numeric type data, is classified as The modular unit of continuous characteristic.
Further, the tagsort submodule includes, and when sampled data is characterized as discrete data, is classified as The modular unit of discrete features data.
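The classification rule above can be sketched in a few lines. This sketch assumes that "numeric data" means every non-missing value is an `int` or `float`, and that anything else counts as discrete; the function name is hypothetical.

```python
def classify_feature(values):
    """Return 'continuous' for numeric columns and 'discrete' otherwise."""
    present = [v for v in values if v is not None]   # ignore missing values
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return "continuous" if present and is_numeric else "discrete"
```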
Further, the feature preprocessing submodule is used, when a sampled data feature is continuous feature data, to apply to that feature data one or more of: missing-value preprocessing, outlier preprocessing, numerical feature discretization preprocessing, feature normalization preprocessing, and numerical feature transformation preprocessing.
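As an illustration of three of the listed operations, the sketch below chains missing-value filling, outlier clipping, and normalization. The specific choices (median imputation, clipping at the 5th and 95th percentile positions, min-max scaling) are assumptions; the source names only the categories of preprocessing.

```python
import statistics

def preprocess_continuous(values, lower_q=0.05, upper_q=0.95):
    """Fill missing values, clip outliers, then scale to [0, 1]."""
    present = sorted(v for v in values if v is not None)
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]   # missing values
    lo = present[int(lower_q * (len(present) - 1))]         # lower clip bound
    hi = present[int(upper_q * (len(present) - 1))]         # upper clip bound
    clipped = [min(max(v, lo), hi) for v in filled]         # outliers
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in clipped]  # normalization

scaled = preprocess_continuous([1.0, 2.0, None, 3.0, 100.0])
```

In this example the extreme value 100.0 is clipped to the upper bound before scaling, so it no longer compresses the other values towards zero.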
Further, the feature preprocessing submodule is used, when a sampled data feature is discrete feature data, to apply to that feature data one or more of: missing-value preprocessing, cardinality-reduction preprocessing, and one-hot encoding preprocessing.
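A minimal sketch of the three discrete-feature operations, assuming that missing values become their own category, that cardinality reduction means merging rare categories into a single bucket, and that the placeholder names `MISSING` and `OTHER` and the `min_count` threshold are free choices rather than anything stated in the source.

```python
from collections import Counter

def preprocess_discrete(values, min_count=2):
    """Fill missing values, merge rare categories, then one-hot encode."""
    filled = ["MISSING" if v is None else v for v in values]
    counts = Counter(filled)
    # Cardinality reduction: categories seen fewer than min_count times are merged.
    reduced = [v if counts[v] >= min_count else "OTHER" for v in filled]
    categories = sorted(set(reduced))
    encoded = [[1 if v == c else 0 for c in categories] for v in reduced]
    return categories, encoded

cats, rows = preprocess_discrete(["sh", "bj", "sh", None, "gz", "bj"])
```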
Further, the feature engineering submodule includes a time-derived feature data generation module unit, for calculating and generating time-derived feature data from time-type feature data.
Further, the time-derived feature data includes time difference parameter data.
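Time difference parameters can be sketched as day counts between pairs of dates. The field names below (account opening, last repayment, application date) are hypothetical examples of time-type features and do not come from the source.

```python
from datetime import date

def days_between(earlier, later):
    """Number of whole days from `earlier` to `later`."""
    return (later - earlier).days

account_opened = date(2016, 1, 15)
last_repayment = date(2017, 10, 1)
application = date(2017, 10, 26)

time_features = {
    "account_age_days": days_between(account_opened, application),
    "days_since_last_repayment": days_between(last_repayment, application),
}
```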
Further, the feature engineering submodule includes a space-derived feature data generation module unit, for calculating and generating space-derived feature data from space-type feature data.
Further, the space-derived feature data includes region value parameter data, region difference parameter data.
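The source does not define how region value and region difference parameters are computed. One plausible reading, assumed here, is that the region value is a per-region aggregate (the regional mean of some quantity) and the region difference is the gap between an individual's value and that regional aggregate. The `region` and `income` field names are hypothetical.

```python
from collections import defaultdict

def region_features(records):
    """Attach a regional mean and the individual's deviation from it."""
    totals = defaultdict(lambda: [0.0, 0])          # region -> [sum, count]
    for r in records:
        totals[r["region"]][0] += r["income"]
        totals[r["region"]][1] += 1
    means = {region: s / n for region, (s, n) in totals.items()}
    return [
        {
            "region_value": means[r["region"]],
            "region_diff": r["income"] - means[r["region"]],
        }
        for r in records
    ]
```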
Further, the feature engineering submodule includes a feature data combination module unit, for performing combination operations on feature data. Those skilled in the art will understand that the combination operation may use a simple linear model, or a more complex logistic regression model or naive Bayes model; this strengthens the nonlinearity and the interaction of the data features.
Further, the feature data combination module unit includes a combined feature data operation submodule unit and a feature data crossing submodule unit.
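Feature crossing can be sketched very simply. The two helpers below assume that numeric features are crossed by element-wise product and categorical features by concatenating their values into a joint category; both helpers are hypothetical illustrations, and the model-based combinations mentioned in the text (logistic regression, naive Bayes) are not shown.

```python
def cross_numeric(a, b):
    """Cross two numeric features by element-wise product."""
    return [x * y for x, y in zip(a, b)]

def cross_categorical(a, b):
    """Cross two categorical features into one joint category per sample."""
    return [f"{x}|{y}" for x, y in zip(a, b)]
```

A crossed feature lets even a linear model respond to a specific combination of values, which is the interaction effect the text refers to.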
Further, the feature selection submodule includes a dimensionality reduction operation module unit.
Further, the feature selection submodule also includes a subset selection operation module unit.
Further, the subset selection operation module unit includes a filter-type selection operation submodule unit. Those skilled in the art will understand that filter-type selection can be computed from feature importance indices such as information value (IV), information gain, and the Gini index.
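As an illustration of one filter-type index, the sketch below computes the information value of an already binned feature using the common definition IV = Σ (p_good − p_bad) · ln(p_good / p_bad) over bins. The additive smoothing term `eps` is an assumption added to avoid taking the logarithm of zero; the source does not specify the computation.

```python
import math

def information_value(bins, labels, eps=0.5):
    """Information value of a binned feature; labels use 1 = bad, 0 = good."""
    groups = sorted(set(bins))
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    iv = 0.0
    for g in groups:
        bad = sum(1 for b, y in zip(bins, labels) if b == g and y == 1)
        good = sum(1 for b, y in zip(bins, labels) if b == g and y == 0)
        # eps smoothing (an assumption) keeps empty cells from producing log(0).
        p_bad = (bad + eps) / (total_bad + eps * len(groups))
        p_good = (good + eps) / (total_good + eps * len(groups))
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv
```

A filter-type selector would compute such a score for every feature and keep those above a chosen threshold, without training the final model.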
Further, the subset selection operation module unit includes a wrapper-type selection operation submodule unit. Those skilled in the art will understand that wrapper-type selection can be realized by random search or heuristic search.
Further, the subset selection operation module unit includes an embedded selection operation submodule unit. Those skilled in the art will understand that embedded selection can be realized by norm regularization such as L1 (Lasso) and L2 (Ridge), or by tree ensembles.
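A minimal sketch of embedded selection with L1 regularization, assuming a plain coordinate-descent Lasso on centred data with no intercept. It minimizes (1/2n)·‖y − Xw‖² + alpha·‖w‖₁; features whose coefficient is driven to exactly zero are the ones dropped. The function names and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Coordinate-descent Lasso for centred X and y (no intercept term)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                pred = sum(w[k] * X[i][k] for k in range(p))
                # Correlation of feature j with the residual that excludes feature j.
                rho += X[i][j] * (y[i] - pred + w[j] * X[i][j])
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z else 0.0
    return w

# Feature 0 drives the target; feature 1 is unrelated to it.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]
weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
```

An L2 (Ridge) penalty would shrink coefficients without zeroing them, so it ranks features rather than removing them outright.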
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (46)

1. An intelligent evaluation method for collage-credit data, characterized by comprising the following steps:
A sampling step, for sampling and acquiring data;
A problem determination step, for determining which kind of model is currently needed to process the data;
A sample division step, for classifying the samples;
A feature processing step, for analyzing and processing feature data;
A calculation step, for performing operations on the feature data to generate evaluation result data and feedback data;
A parameter tuning step, for adjusting parameters according to the current model and the feedback data;
A model evaluation and verification step, for evaluating the current model.
2. The intelligent evaluation method for collage-credit data as claimed in claim 1, characterized in that the sampling step includes an oversampling step.
3. The intelligent evaluation method for collage-credit data as claimed in claim 1, characterized in that the sampling step includes an undersampling step.
4. The intelligent evaluation method for collage-credit data as claimed in claim 1, characterized in that the sampling step includes a step of weighting minority-class samples.
5. The intelligent evaluation method for collage-credit data as claimed in claim 1, characterized in that the sampling step also includes a step of using a specific algorithm to generate simulated minority-class sample data.
6. The intelligent evaluation method for collage-credit data as claimed in claim 1, characterized in that the problem determination step judges, from specific parameters of the sampled feature data, whether the problem solved by the current operation is a classification problem or a regression problem.
7. The intelligent evaluation method for collage-credit data as claimed in claim 6, characterized in that, when the judgment result of the problem determination step is that the problem solved by the current operation is a classification problem, it is further judged whether that problem is a binary classification problem or a multi-class classification problem.
8. The intelligent evaluation method for collage-credit data as claimed in claim 1, characterized in that the sample division step includes a step of dividing the sampled data into a training set.
9. The intelligent evaluation method for collage-credit data as claimed in claim 1, characterized in that the sample division step includes a step of dividing the sampled data into a validation set.
10. The intelligent evaluation method for collage-credit data as claimed in claim 1, characterized in that the sample division step includes a step of dividing the sampled data into a test set.
11. The intelligent evaluation method for collage-credit data as claimed in claim 1, characterized in that the feature processing step includes:
A feature classification step, for determining whether feature data is continuous feature data or discrete feature data;
A feature preprocessing step, for preprocessing the feature data;
A feature engineering step, for performing modelling processing on the feature data;
A feature selection step, for performing selection operations on the feature data.
12. The intelligent evaluation method for collage-credit data as claimed in claim 11, characterized in that the feature classification step includes classifying a sampled data feature as continuous feature data when it is numeric data.
13. The intelligent evaluation method for collage-credit data as claimed in claim 11, characterized in that the feature classification step includes classifying a sampled data feature as discrete feature data when it is discrete data.
14. The intelligent evaluation method for collage-credit data as claimed in claim 11, characterized in that the feature preprocessing step includes, when a sampled data feature is continuous feature data, applying to that feature data one or more of: missing-value preprocessing, outlier preprocessing, numerical feature discretization preprocessing, feature normalization preprocessing, and numerical feature transformation preprocessing.
15. The intelligent evaluation method for collage-credit data as claimed in claim 11, characterized in that the feature preprocessing step includes, when a sampled data feature is discrete feature data, applying to that feature data one or more of: missing-value preprocessing, cardinality-reduction preprocessing, and one-hot encoding preprocessing.
16. The intelligent evaluation method for collage-credit data as claimed in claim 11, characterized in that the feature engineering step includes a time-derived feature data generation step, for calculating and generating time-derived feature data from time-type feature data.
17. The intelligent evaluation method for collage-credit data as claimed in claim 16, characterized in that the time-derived feature data includes time difference parameter data.
18. The intelligent evaluation method for collage-credit data as claimed in claim 11, characterized in that the feature engineering step includes a space-derived feature data generation step, for calculating and generating space-derived feature data from space-type feature data.
19. The intelligent evaluation method for collage-credit data as claimed in claim 18, characterized in that the space-derived feature data includes region value parameter data or region difference parameter data.
20. The intelligent evaluation method for collage-credit data as claimed in claim 11, characterized in that the feature engineering step includes a feature data combination step, for performing combination operations on feature data.
21. The intelligent evaluation method for collage-credit data as claimed in claim 20, characterized in that the feature data combination step includes a combined feature data operation step and a feature data crossing step.
22. The intelligent evaluation method for collage-credit data as claimed in claim 21, characterized in that the feature selection step includes a dimensionality reduction operation step or a subset selection operation step.
23. The intelligent evaluation method for collage-credit data as claimed in claim 22, characterized in that the subset selection operation step includes a filter-type selection operation step, a wrapper-type selection operation step, or an embedded selection operation step.
24. An intelligent evaluation system for collage-credit data, characterized by comprising:
A sampling module, for sampling and acquiring data;
A problem determination module, for determining which kind of model is currently needed to process the data;
A sample division module, for classifying the samples;
A feature processing module, for analyzing and processing feature data;
A calculation module, for performing operations on the feature data to generate evaluation result data and feedback data;
A parameter tuning module, for adjusting parameters according to the current model and the feedback data;
A model evaluation and verification module, for evaluating the current model.
25. The intelligent evaluation system for collage-credit data as claimed in claim 24, characterized in that the sampling module includes an oversampling submodule.
26. The intelligent evaluation system for collage-credit data as claimed in claim 24, characterized in that the sampling module includes an undersampling submodule.
27. The intelligent evaluation system for collage-credit data as claimed in claim 24, characterized in that the sampling module includes a submodule for weighting minority-class samples.
28. The intelligent evaluation system for collage-credit data as claimed in claim 24, characterized in that the sampling module also includes a submodule that uses a specific algorithm to generate simulated minority-class sample data.
29. The intelligent evaluation system for collage-credit data as claimed in claim 24, characterized in that the problem determination module is a module for judging, from specific parameters of the sampled feature data, whether the problem solved by the current operation is a classification problem or a regression problem.
30. The intelligent evaluation system for collage-credit data as claimed in claim 29, characterized in that the problem determination module is also used, when its judgment result is that the problem solved by the current operation is a classification problem, to judge whether that problem is a binary classification problem or a multi-class classification problem.
31. The intelligent evaluation system for collage-credit data as claimed in claim 24, characterized in that the sample division module includes a submodule for dividing the sampled data into a training set.
32. The intelligent evaluation system for collage-credit data as claimed in claim 24, characterized in that the sample division module includes a submodule for dividing the sampled data into a validation set.
33. The intelligent evaluation system for collage-credit data as claimed in claim 24, characterized in that the sample division module includes a submodule for dividing the sampled data into a test set.
34. The intelligent evaluation system for collage-credit data as claimed in claim 24, characterized in that the feature processing module includes:
A feature classification submodule, for determining whether feature data is continuous feature data or discrete feature data;
A feature preprocessing submodule, for preprocessing the feature data;
A feature engineering submodule, for performing modelling processing on the feature data;
A feature selection submodule, for performing selection operations on the feature data.
35. The intelligent evaluation system for collage-credit data as claimed in claim 34, characterized in that the feature classification submodule includes a module unit that classifies a sampled data feature as continuous feature data when it is numeric data.
36. The intelligent evaluation system for collage-credit data as claimed in claim 34, characterized in that the feature classification submodule includes a module unit that classifies a sampled data feature as discrete feature data when it is discrete data.
37. The intelligent evaluation system for collage-credit data as claimed in claim 34, characterized in that the feature preprocessing submodule is used, when a sampled data feature is continuous feature data, to apply to that feature data one or more of: missing-value preprocessing, outlier preprocessing, numerical feature discretization preprocessing, feature normalization preprocessing, and numerical feature transformation preprocessing.
38. The intelligent evaluation system for collage-credit data as claimed in claim 34, characterized in that the feature preprocessing submodule is used, when a sampled data feature is discrete feature data, to apply to that feature data one or more of: missing-value preprocessing, cardinality-reduction preprocessing, and one-hot encoding preprocessing.
39. The intelligent evaluation system for collage-credit data as claimed in claim 34, characterized in that the feature engineering submodule includes a time-derived feature data generation module unit, for calculating and generating time-derived feature data from time-type feature data.
40. The intelligent evaluation system for collage-credit data as claimed in claim 39, characterized in that the time-derived feature data includes time difference parameter data.
41. The intelligent evaluation system for collage-credit data as claimed in claim 34, characterized in that the feature engineering submodule includes a space-derived feature data generation module unit, for calculating and generating space-derived feature data from space-type feature data.
42. The intelligent evaluation system for collage-credit data as claimed in claim 41, characterized in that the space-derived feature data includes region value parameter data, region difference parameter data.
43. The intelligent evaluation system for collage-credit data as claimed in claim 34, characterized in that the feature engineering submodule includes a feature data combination module unit, for performing combination operations on feature data.
44. The intelligent evaluation system for collage-credit data as claimed in claim 43, characterized in that the feature data combination module unit includes a combined feature data operation submodule unit and a feature data crossing submodule unit.
45. The intelligent evaluation system for collage-credit data as claimed in claim 34, characterized in that the feature selection submodule includes a dimensionality reduction operation module unit or a subset selection operation module unit.
46. The intelligent evaluation system for collage-credit data as claimed in claim 45, characterized in that the subset selection operation module unit includes a filter-type selection operation submodule unit, a wrapper-type selection operation submodule unit, or an embedded selection operation submodule unit.
CN201711015906.0A 2017-10-26 2017-10-26 The intelligent evaluation method and system of collage-credit data Pending CN107808246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711015906.0A CN107808246A (en) 2017-10-26 2017-10-26 The intelligent evaluation method and system of collage-credit data

Publications (1)

Publication Number Publication Date
CN107808246A true CN107808246A (en) 2018-03-16

Family

ID=61591179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711015906.0A Pending CN107808246A (en) 2017-10-26 2017-10-26 The intelligent evaluation method and system of collage-credit data

Country Status (1)

Country Link
CN (1) CN107808246A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6112190A (en) * 1997-08-19 2000-08-29 Citibank, N.A. Method and system for commercial credit analysis
CN106897918A (en) * 2017-02-24 2017-06-27 上海易贷网金融信息服务有限公司 A kind of hybrid machine learning credit scoring model construction method
CN107248030A (en) * 2017-05-26 2017-10-13 谢首鹏 A kind of bond Risk Forecast Method and system based on machine learning algorithm

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875815A (en) * 2018-06-04 2018-11-23 深圳市研信小额贷款有限公司 Feature Engineering variable determines method and device
CN110648211A (en) * 2018-06-07 2020-01-03 埃森哲环球解决方案有限公司 Data validation
CN110648211B (en) * 2018-06-07 2023-10-03 埃森哲环球解决方案有限公司 data verification
CN109614609A (en) * 2018-11-06 2019-04-12 阿里巴巴集团控股有限公司 Method for establishing model and device
CN109614609B (en) * 2018-11-06 2023-05-05 创新先进技术有限公司 Model building method and device
CN109685574A (en) * 2018-12-25 2019-04-26 拉扎斯网络科技(上海)有限公司 Data determination method, device, electronic equipment and computer readable storage medium
CN109670724A (en) * 2018-12-29 2019-04-23 重庆誉存大数据科技有限公司 Methods of risk assessment and device
CN109903095A (en) * 2019-03-01 2019-06-18 上海拉扎斯信息科技有限公司 Data processing method, device, electronic equipment and computer readable storage medium
WO2021042556A1 (en) * 2019-09-03 2021-03-11 平安科技(深圳)有限公司 Classification model training method, apparatus and device, and computer-readable storage medium
CN111652710A (en) * 2020-06-03 2020-09-11 北京化工大学 Personal credit risk assessment method based on ensemble tree feature extraction and Logistic regression
CN111652710B (en) * 2020-06-03 2024-01-30 北京化工大学 Personal credit risk assessment method based on integrated tree feature extraction and Logistic regression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180316

RJ01 Rejection of invention patent application after publication