CN107808246A - Intelligent evaluation method and system for credit reporting data
- Publication number
- CN107808246A (CN201711015906.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- credit reporting
- characteristic
- intelligent evaluation
- feature
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Abstract
The invention discloses an intelligent evaluation method and system for credit reporting data. The method comprises the following steps: a sampling step, for sampling and acquiring data; a problem determination step, for determining which kind of model is currently needed to process the data; a sample division step, for dividing the samples into sets; a feature processing step, for analyzing and processing feature data; a computation step, for performing computation on the feature data to generate evaluation result data and feedback data; a parameter tuning step, for adjusting parameters according to the current model and the feedback data; and a model evaluation and verification step, for evaluating the current model. The intelligent evaluation method and system for credit reporting data provided by the invention offer high accuracy, strong stability, and fast iteration, suit business scenarios that require rapid loan approval and disbursement, and can achieve better credit discrimination.
Description
Technical field
The present invention relates to the technical field of computers and big data, and in particular to an intelligent evaluation method and system for credit reporting data.
Background art
It is well known that assessing the creditworthiness and repayment ability of loan applicants is central to risk control in the credit industry. In the prior art, a loan applicant's creditworthiness is assessed using strong financial feature data, which are obtained mainly from the central bank's credit reporting data. However, the central bank's credit reporting data are incomplete: a significant proportion of loan applicants have no credit record in the central bank's credit reporting database, so no credit reporting data can be obtained for them. In the prior art, lenders handle applications from such applicants either by raising the loan interest rate to reduce the lender's risk or by refusing to lend so as to avoid the risk altogether. This significantly limits the business development of financial credit institutions.
On the other hand, with the rapid development of internet and big data technology, data such as e-commerce transaction data, social network data, and online behavior data have become available. Although a single item or a small amount of such data cannot directly reflect a person's credit level and repayment ability, large volumes of data accumulated over a long period, or a combination of data from many sources, can reflect them to a certain extent. We call such data weak financial feature data. A method that acquires, accumulates, counts, and analyzes weak financial data over a long period, and that continuously improves how these data are acquired, accumulated, counted, and analyzed, can provide important technical support for assessing creditworthiness and repayment ability in the credit industry and benefit the development of the financial credit industry.
Summary of the invention
An object of the present invention is to provide an intelligent evaluation method and system for credit reporting data.
The intelligent evaluation method for credit reporting data provided by the present invention comprises the following steps: a sampling step, for sampling and acquiring data; a problem determination step, for determining which kind of model is currently needed to process the data; a sample division step, for dividing the samples into sets; a feature processing step, for analyzing and processing feature data; a computation step, for performing computation on the feature data to generate evaluation result data and feedback data; a parameter tuning step, for adjusting parameters according to the current model and the feedback data; and a model evaluation and verification step, for evaluating the current model.
The sampling step includes an oversampling step, an undersampling step, a step of weighting the group samples, or a step of simulating and generating group sample data using a specific algorithm. The problem determination step judges, from specific parameters of the sampled feature data, whether the problem to be solved by the current computation is a classification problem or a regression problem. When the problem determination step judges that the problem is a classification problem, it further judges whether the problem is a binary classification problem or a multi-class classification problem. The sample division step includes a step of dividing the sampled data into a training set, a step of dividing the sampled data into a validation set, and a step of dividing the sampled data into a test set.
The feature processing step includes: a feature classification step, for determining whether feature data are continuous feature data or discrete feature data; a feature preprocessing step, for preprocessing the feature data; a feature engineering step, for performing modeling processing on the feature data; and a feature selection step, for performing selection operations on the feature data. The feature classification step includes classifying a sampled data feature as continuous feature data when it is numeric data, and as discrete feature data when it is discrete data. The feature preprocessing step includes, when the sampled data feature is continuous feature data, performing one or more of the following on it: missing value preprocessing, outlier preprocessing, numerical feature discretization, data feature normalization, and numerical feature transformation. The feature preprocessing step includes, when the sampled data feature is discrete feature data, performing one or more of the following on it: missing value preprocessing, cardinality reduction, and one-hot encoding.
The feature engineering step includes a time-derived feature data generation step, for computing time-derived feature data from time-type feature data; the time-derived feature data include time difference parameter data. The feature engineering step includes a space-derived feature data generation step, for computing space-derived feature data from space-type feature data; the space-derived feature data include region value parameter data or region difference parameter data. The feature engineering step includes a feature data combination step, for performing combination operations on feature data; the feature data combination step includes a combined feature data operation step and a feature data crossing step. The feature selection step includes a dimensionality reduction operation step or a subset selection operation step. The subset selection operation step includes a filter selection operation step, a wrapper selection operation step, or an embedded selection operation step.
The intelligent evaluation system for credit reporting data provided by the present invention comprises: a sampling module, for sampling and acquiring data; a problem determination module, for determining which kind of model is currently needed to process the data; a sample division module, for dividing the samples into sets; a feature processing module, for analyzing and processing feature data; a computation module, for performing computation on the feature data to generate evaluation result data and feedback data; a parameter tuning module, for adjusting parameters according to the current model and the feedback data; and a model evaluation and verification module, for evaluating the current model.
The sampling module includes an oversampling submodule, an undersampling submodule, a submodule for weighting the group samples, or a submodule for simulating and generating group sample data using a specific algorithm. The problem determination module judges, from specific parameters of the sampled feature data, whether the problem to be solved by the current computation is a classification problem or a regression problem. When the problem determination module judges that the problem is a classification problem, it further judges whether the problem is a binary classification problem or a multi-class classification problem. The sample division module includes a submodule for dividing the sampled data into a training set, a submodule for dividing the sampled data into a validation set, and a submodule for dividing the sampled data into a test set.
The feature processing module includes: a feature classification submodule, for determining whether feature data are continuous feature data or discrete feature data; a feature preprocessing submodule, for preprocessing the feature data; a feature engineering submodule, for performing modeling processing on the feature data; and a feature selection submodule, for performing selection operations on the feature data. The feature classification submodule includes a module unit that classifies a sampled data feature as continuous feature data when it is numeric data, and a module unit that classifies it as discrete feature data when it is discrete data. The feature preprocessing submodule, when the sampled data feature is continuous feature data, performs one or more of the following on it: missing value preprocessing, outlier preprocessing, numerical feature discretization, data feature normalization, and numerical feature transformation. The feature preprocessing submodule, when the sampled data feature is discrete feature data, performs one or more of the following on it: missing value preprocessing, cardinality reduction, and one-hot encoding.
The feature engineering submodule includes a time-derived feature data generation module unit, for computing time-derived feature data from time-type feature data; the time-derived feature data include time difference parameter data. The feature engineering submodule includes a space-derived feature data generation module unit, for computing space-derived feature data from space-type feature data; the space-derived feature data include region value parameter data and region difference parameter data. The feature engineering submodule includes a feature data combination module unit, for performing combination operations on feature data; the feature data combination module unit includes a combined feature data operation submodule unit and a feature data crossing submodule unit. The feature selection submodule includes a dimensionality reduction operation module unit or a subset selection operation module unit. The subset selection operation module unit includes a filter selection operation submodule unit, a wrapper selection operation submodule unit, or an embedded selection operation submodule unit.
The intelligent evaluation method and system for credit reporting data provided by the present invention assess credit risk and support credit decisions using weak financial feature data, effectively complementing the prior-art approach of assessing credit risk and making decisions using strong financial feature data. The method and system obtain data from the internet and generate high-dimensional variables in real time, so as to judge an individual's willingness to repay, repayment ability, and repayment potential more accurately, thereby supplementing credit attributes and providing fair, high-quality financial services to the customer group. The model is highly accurate; moreover, internet data are mostly timely and frequently updated, so they better reflect a customer's current credit situation as a whole. The method and system offer high accuracy, strong stability, and fast iteration, suit business scenarios that require rapid loan approval and disbursement, and can achieve better credit discrimination.
Brief description of the drawings
Fig. 1 is a schematic diagram of the intelligent evaluation system for credit reporting data according to Embodiment two of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment one
This embodiment provides an intelligent evaluation method for credit reporting data, comprising the following steps:
Sampling step, a step for sampling and acquiring data. Those skilled in the art will understand that in credit business scenarios the proportion of loan applicants who repay as agreed is far larger than the proportion of defaulting loan applicants, so if the raw data are used directly it is difficult to fully capture the data and patterns of defaulting applicants. The sampling step samples the raw data and applies weights according to the sampled data of defaulting applicants and of applicants who repay as agreed, so as to reflect the data and patterns more fully.
Problem determination step, a step for determining which kind of model is currently needed to process the data;
Sample division step, a step for dividing the samples into sets;
Feature processing step, a step for analyzing and processing feature data;
Computation step, a step for performing computation on the feature data to generate evaluation result data and feedback data;
Parameter tuning step, a step for adjusting parameters according to the current model and the feedback data;
Model evaluation and verification step, a step for evaluating the current model.
Those skilled in the art will understand that the parameter adjustment can be realized by taking empirical values and performing a grid search in the vicinity of those empirical values, so as to generate a better computation model. Evaluating the current model means assessing its accuracy, validity, and stability on the data set according to index parameters. Those skilled in the art will understand that the model evaluation and verification step converts model predictions into scores and calculates the model's evaluation indexes. Those skilled in the art will understand that most machine learning methods, when predicting a sample, output a value between 0 and 1: a value close to 1 indicates a high probability of default and a value close to 0 a low probability of default. This output is not intuitive enough to be used directly in business. The machine learning engine is therefore designed with a general probability-to-score conversion function, which can convert the output probability of almost any classification model into a FICO-style score (here, a normally distributed score from 0 to 1000) to support business decisions. In addition, the AUC, the KS value, the PSI value, the score distribution and default rate trend, and the cross matrix of two scoring models are used to help speed up the formulation of models and policies.
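The score conversion and two of the evaluation indexes named above can be sketched as follows. This is a minimal illustration, not the conversion used by the invention: the log-odds scaling, the parameters `base_score` and `pdo` (points to double the odds), and all function names are assumptions introduced here. The source only states that probabilities are mapped to a 0-1000 score.

```python
import math


def probability_to_score(p_default, base_score=500.0, pdo=50.0, lo=0.0, hi=1000.0):
    """Map a default probability to a 0-1000 score (higher score = lower risk).

    Assumed scheme: linear in the log-odds of default, clipped to [lo, hi].
    """
    eps = 1e-9
    p = min(max(p_default, eps), 1.0 - eps)   # keep the logarithm finite
    odds = p / (1.0 - p)                      # odds of default
    factor = pdo / math.log(2.0)              # points per doubling of the odds
    return min(max(base_score - factor * math.log(odds), lo), hi)


def ks_statistic(scores, labels):
    """KS value: largest gap between the cumulative score distributions
    of defaulting (label 1) and non-defaulting (label 0) samples."""
    pairs = sorted(zip(scores, labels))
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all samples that share the same score before comparing.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best


def psi(expected_fractions, actual_fractions):
    """Population stability index over matching score bins."""
    eps = 1e-6
    total = 0.0
    for expected, actual in zip(expected_fractions, actual_fractions):
        e, a = max(expected, eps), max(actual, eps)
        total += (a - e) * math.log(a / e)
    return total
```

A lower PSI between the development sample and a later sample suggests a more stable score distribution; which thresholds count as acceptable is a business choice that the source does not specify.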
Further, the sampling step includes an oversampling step.
Further, the sampling step includes an undersampling step.
Further, the sampling step includes a step of weighting the group samples.
Further, the sampling step also includes a step of simulating and generating group sample data using a specific algorithm.
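The first three sampling options can be sketched as below, assuming two groups of applicants (defaulting and repaying). The function names and the toy data are hypothetical; the source does not name the algorithm used to simulate new samples, so that option is not shown.

```python
import random


def oversample(minority, target_size, rng):
    """Random oversampling: repeat minority samples (drawn with replacement)."""
    extra = [rng.choice(minority) for _ in range(max(0, target_size - len(minority)))]
    return list(minority) + extra


def undersample(majority, target_size, rng):
    """Random undersampling: keep a random subset of the majority samples."""
    return rng.sample(majority, min(target_size, len(majority)))


def group_weights(labels):
    """Weight each group inversely to its frequency so both groups count equally."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    total = len(labels)
    return {label: total / (len(counts) * n) for label, n in counts.items()}


rng = random.Random(0)                  # fixed seed, for reproducibility only
defaulted = ["d1", "d2", "d3"]          # hypothetical defaulting applicants
repaid = [f"r{i}" for i in range(97)]   # hypothetical applicants who repaid

balanced_up = oversample(defaulted, len(repaid), rng)
balanced_down = undersample(repaid, len(defaulted), rng)
weights = group_weights([1] * 3 + [0] * 97)
```

Oversampling and weighting keep every repaying applicant, while undersampling discards most of them; which trade-off is appropriate depends on data volume.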
Further, the problem determination step judges, from specific parameters of the sampled feature data, whether the problem to be solved by the current computation is a classification problem or a regression problem.
Further, when the problem determination step judges that the problem to be solved by the current computation is a classification problem, it further judges whether the problem is a binary classification problem or a multi-class classification problem.
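One way to make this decision is to inspect the target column of the sampled data. The rule below is an assumed heuristic for illustration only; the source does not say which "specific parameters" are examined, and the function name is hypothetical.

```python
def determine_problem_type(targets):
    """Guess the problem type from the target values (assumed heuristic).

    Non-integer numeric targets -> regression;
    exactly two distinct values -> binary classification;
    more than two distinct values -> multi-class classification.
    """
    distinct = set(targets)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct target values")
    has_fractional = any(isinstance(t, float) and not t.is_integer() for t in distinct)
    if has_fractional:
        return "regression"
    if len(distinct) == 2:
        return "binary_classification"
    return "multiclass_classification"
```

For a default/no-default label the function returns the binary case; a target such as a loss amount with fractional values is treated as regression.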
Further, the sample division step includes a step of dividing the sampled data into a training set;
Further, the sample division step includes a step of dividing the sampled data into a validation set;
Further, the sample division step includes a step of dividing the sampled data into a test set;
Those skilled in the art will understand that the training set and the validation set come from data of the same period and the same distribution, while the test set is cross-period data, that is, data from a different time period, used to test the model's accuracy and stability as the business changes.
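The division described above can be sketched as follows: records before a cutoff date are shuffled into training and validation sets, and records on or after the cutoff form the cross-period test set. The field name `applied_on`, the cutoff, and the validation fraction are assumptions made for this example.

```python
import random
from datetime import date


def split_samples(records, cutoff, valid_fraction=0.2, seed=0):
    """Return (train, validation, test).

    Train and validation share the same period (before `cutoff`);
    the test set is taken from a later period (on or after `cutoff`).
    """
    in_period = [r for r in records if r["applied_on"] < cutoff]
    test = [r for r in records if r["applied_on"] >= cutoff]
    rng = random.Random(seed)
    rng.shuffle(in_period)                      # same-distribution random split
    n_valid = int(len(in_period) * valid_fraction)
    return in_period[n_valid:], in_period[:n_valid], test


# Hypothetical records: one application per month, January to October 2017.
records = [{"id": m, "applied_on": date(2017, m, 1)} for m in range(1, 11)]
train, valid, test = split_samples(records, cutoff=date(2017, 9, 1), valid_fraction=0.25)
```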
Further, the feature processing step includes:
Feature classification step, a step for determining whether feature data are continuous feature data or discrete feature data. Those skilled in the art will understand that feature data include continuous feature data and discrete feature data, and that the two are handled by different subsequent processing methods; the sampled feature data therefore need to be classified.
Feature preprocessing step, a step for preprocessing the feature data.
Feature engineering step, a step for performing modeling processing on the feature data.
Feature selection step, a step for performing selection operations on the feature data.
Further, the feature classification step includes classifying a sampled data feature as continuous feature data when it is numeric data.
Further, the feature classification step includes classifying a sampled data feature as discrete feature data when it is discrete data.
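A minimal sketch of this classification rule, assuming each feature arrives as a list of Python values with `None` marking a missing entry (both assumptions, as is the function name):

```python
def classify_feature(values):
    """Return "continuous" for numeric features, otherwise "discrete"."""
    present = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so flags are excluded explicitly.
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present)
    return "continuous" if present and numeric else "discrete"
```

In practice a numeric code such as a postcode may still be better treated as discrete; this sketch follows only the literal rule stated above.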
Further, the feature preprocessing step includes, when the sampled data feature is continuous feature data, performing one or more of the following on it: missing value preprocessing, outlier preprocessing, numerical feature discretization, data feature normalization, and numerical feature transformation.
Further, the feature preprocessing step includes, when the sampled data feature is discrete feature data, performing one or more of the following on it: missing value preprocessing, cardinality reduction, and one-hot encoding.
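The sketch below illustrates some of these preprocessing operations. The concrete choices (median imputation, clipping to fixed bounds, min-max normalization, merging rare categories into "OTHER") are assumptions chosen for illustration; the source names the operations but not how they are carried out.

```python
import statistics


def preprocess_continuous(values, lower=None, upper=None):
    """Fill missing values with the median, clip outliers, then min-max normalize."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]
    lo = min(filled) if lower is None else lower
    hi = max(filled) if upper is None else upper
    clipped = [min(max(v, lo), hi) for v in filled]
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in clipped]


def preprocess_discrete(values, min_count=2):
    """Fill missing values, merge rare categories, then one-hot encode.

    Returns the category order and one encoded row per input value.
    """
    filled = ["MISSING" if v is None else v for v in values]
    counts = {}
    for v in filled:
        counts[v] = counts.get(v, 0) + 1
    # Cardinality reduction: categories seen fewer than min_count times are merged.
    reduced = [v if counts[v] >= min_count else "OTHER" for v in filled]
    categories = sorted(set(reduced))
    rows = [[1 if v == c else 0 for c in categories] for v in reduced]
    return categories, rows


scaled = preprocess_continuous([0, None, 10, 1000], lower=0, upper=20)
categories, rows = preprocess_discrete(["a", "a", "b", None, "c", "b"])
```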
Further, the feature engineering step includes a time-derived feature data generation step, a step for computing time-derived feature data from time-type feature data.
Further, the time-derived feature data include time difference parameter data.
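As an illustration of time difference parameters, the sketch below derives day counts between events. The three input dates and the output names are hypothetical examples of time-type feature data, not fields defined by the source.

```python
from datetime import date


def time_difference_features(registered_on, last_order_on, applied_on):
    """Derive time-difference features (in days) relative to the application date."""
    return {
        "days_since_registration": (applied_on - registered_on).days,
        "days_since_last_order": (applied_on - last_order_on).days,
    }


features = time_difference_features(
    registered_on=date(2017, 1, 1),
    last_order_on=date(2017, 10, 1),
    applied_on=date(2017, 10, 25),
)
```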
Further, the feature engineering step includes a space-derived feature data generation step, a step for computing space-derived feature data from space-type feature data.
Further, the space-derived feature data include region value parameter data and region difference parameter data.
Further, the feature engineering step includes a feature data combination step, a step for performing combination operations on feature data. Those skilled in the art will understand that the combination operation on feature data may use a simple linear model, or a more complex logistic regression model or naive Bayes model, which strengthens the nonlinearity and interaction of the data features.
Further, the feature data combination step includes a combined feature data operation step and a feature data crossing step.
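A small sketch of the two operations, under the assumption that "combined feature data operation" means an arithmetic combination of numeric features and "crossing" means pairing discrete features into a joint category. Both readings, and the function names, are assumptions.

```python
def ratio_feature(numerators, denominators):
    """Combine two numeric features into a ratio (0.0 when the denominator is 0)."""
    return [n / d if d else 0.0 for n, d in zip(numerators, denominators)]


def cross_features(first, second):
    """Cross two discrete features into one joint categorical feature."""
    return [f"{a}|{b}" for a, b in zip(first, second)]


# Hypothetical inputs: spend versus income, and device type versus time of day.
spend_to_income = ratio_feature([50, 10], [100, 0])
device_by_hour = cross_features(["android", "ios"], ["night", "day"])
```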
Further, the feature selection step includes a dimensionality reduction operation step.
Further, the feature selection step also includes a subset selection operation step.
Further, the subset selection operation step includes a filter selection operation step. Those skilled in the art will understand that filter selection can be computed from feature importance indexes such as IV (information value), information gain, and the Gini index.
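Filter selection by IV can be sketched as below for a discrete (or already binned) feature and a 0/1 label, where 1 marks a default. The floor of 0.5 for empty cells and the 0.02 threshold are common conventions adopted here as assumptions; neither comes from the source.

```python
import math


def information_value(feature_values, labels):
    """IV = sum over bins of (good% - bad%) * ln(good% / bad%)."""
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    if total_bad == 0 or total_good == 0:
        raise ValueError("labels must contain both classes")
    bins = {}
    for value, label in zip(feature_values, labels):
        good_bad = bins.setdefault(value, [0, 0])   # [good count, bad count]
        good_bad[label] += 1
    iv = 0.0
    for good, bad in bins.values():
        # A floor of 0.5 avoids log(0) for bins that contain only one class.
        good_rate = max(good, 0.5) / total_good
        bad_rate = max(bad, 0.5) / total_bad
        iv += (good_rate - bad_rate) * math.log(good_rate / bad_rate)
    return iv


def select_by_iv(features, labels, threshold=0.02):
    """Keep the names of features whose IV reaches the threshold."""
    return [name for name, values in features.items()
            if information_value(values, labels) >= threshold]


labels = [0, 0, 1, 1]
features = {
    "informative": ["a", "a", "b", "b"],   # separates the two classes
    "noise": ["a", "b", "a", "b"],         # unrelated to the label
}
selected = select_by_iv(features, labels)
```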
Further, the subset selection operation step includes a wrapper selection operation step. Those skilled in the art will understand that the wrapper selection operation step can be realized by random search or heuristic search.
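One heuristic search commonly used for wrapper selection is greedy forward selection, sketched below. The `evaluate` callback stands in for training and scoring a model on a candidate subset; it and the toy scoring function are assumptions for this example, not the search used by the invention.

```python
def forward_select(candidates, evaluate, max_features):
    """Greedy forward search: repeatedly add the feature that most improves the score."""
    selected = []
    best_score = evaluate(selected)
    while len(selected) < max_features:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:     # no candidate improves the score
            break
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score


# Toy stand-in for model evaluation: each feature adds a fixed gain,
# minus a small penalty per feature used.
gains = {"a": 0.3, "b": 0.2, "c": 0.0}


def toy_evaluate(subset):
    return sum(gains[f] for f in subset) - 0.01 * len(subset)


chosen, score = forward_select(["a", "b", "c"], toy_evaluate, max_features=3)
```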
Further, the subset selection operation step includes an embedded selection operation step. Those skilled in the art will understand that the embedded selection operation step can be realized through norm regularization such as L1 (Lasso) and L2 (Ridge), or through tree ensembles.
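To show how L1 regularization performs selection as a by-product of fitting, here is a small Lasso solver using coordinate descent. It is a sketch under stated assumptions: a linear model without intercept, the objective (1/2n)·Σ(y − Xw)² + alpha·‖w‖₁, and hypothetical toy data. Features whose weight is driven to exactly zero are dropped.

```python
def soft_threshold(value, threshold):
    """Shrink a value toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, targets, alpha, n_iter=100):
    """Minimize (1/2n) * sum((y - Xw)^2) + alpha * sum(|w|), one weight at a time."""
    n, d = len(rows), len(rows[0])
    weights = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0   # correlation of feature j with the partial residual
            z = 0.0     # squared norm of feature j
            for x, y in zip(rows, targets):
                pred_without_j = sum(w * v for k, (w, v) in enumerate(zip(weights, x)) if k != j)
                rho += x[j] * (y - pred_without_j)
                z += x[j] * x[j]
            weights[j] = soft_threshold(rho / n, alpha) / (z / n) if z else 0.0
    return weights


# Toy data: the target depends only on the first feature (y = 2 * x0).
rows = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
targets = [2, 2, -2, -2]
weights = lasso_coordinate_descent(rows, targets, alpha=0.5)
kept = [j for j, w in enumerate(weights) if w != 0.0]
```

L2 (Ridge) regularization shrinks weights without setting them to zero, so on its own it ranks features rather than removing them.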
Embodiment two
This embodiment provides an intelligent evaluation system for credit reporting data, comprising:
Sampling module, a module for sampling and acquiring data. Those skilled in the art will understand that in credit business scenarios the proportion of loan applicants who repay as agreed is far larger than the proportion of defaulting loan applicants, so if the raw data are used directly it is difficult to fully capture the data and patterns of defaulting applicants. The sampling module samples the raw data and applies weights according to the sampled data of defaulting applicants and of applicants who repay as agreed, so as to reflect the data and patterns more fully.
Problem determination module, a module for determining which kind of model is currently needed to process the data;
Sample division module, a module for dividing the samples into sets;
Feature processing module, a module for analyzing and processing feature data;
Computation module, a module for performing computation on the feature data to generate evaluation result data and feedback data;
Parameter tuning module, a module for adjusting parameters according to the current model and the feedback data;
Model evaluation and verification module, a module for evaluating the current model.
Those skilled in the art will understand that the parameter adjustment can be realized by taking empirical values and performing a grid search in the vicinity of those empirical values, so as to generate a better computation model. Evaluating the current model means assessing its accuracy, validity, and stability on the data set according to index parameters. Those skilled in the art will understand that the model evaluation and verification module converts model predictions into scores and calculates the model's evaluation indexes. Those skilled in the art will understand that most machine learning methods, when predicting a sample, output a value between 0 and 1: a value close to 1 indicates a high probability of default and a value close to 0 a low probability of default. This output is not intuitive enough to be used directly in business. The machine learning engine is therefore designed with a general probability-to-score conversion function, which can convert the output probability of almost any classification model into a FICO-style score (here, a normally distributed score from 0 to 1000) to support business decisions. In addition, the AUC, the KS value, the PSI value, the score distribution and default rate trend, and the cross matrix of two scoring models are used to help speed up the formulation of models and policies.
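The grid search around empirical values mentioned for parameter adjustment can be sketched as follows. The parameter names `depth` and `rate`, the step sizes, and the toy `evaluate` function (which stands in for training and scoring a model on the validation set) are all assumptions for this example.

```python
import itertools


def grid_around(empirical, step, n_each_side):
    """Candidate values centered on an empirical starting value."""
    return [empirical + k * step for k in range(-n_each_side, n_each_side + 1)]


def grid_search(param_grids, evaluate):
    """Try every combination of candidate values and keep the best-scoring one."""
    names = sorted(param_grids)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(param_grids[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical empirical starting point: depth 4; candidate learning rates listed directly.
grids = {"depth": grid_around(4, 1, 2), "rate": [0.05, 0.1, 0.2]}


def toy_evaluate(params):
    # Stand-in for a validation metric that peaks at depth 5 and rate 0.1.
    return -((params["depth"] - 5) ** 2) - (params["rate"] - 0.1) ** 2


best_params, best_score = grid_search(grids, toy_evaluate)
```

Restricting the grid to the neighborhood of values known to work keeps the number of model fits small, which fits the emphasis on fast iteration.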
Further, the sampling module includes an oversampling submodule.
Further, the sampling module includes an undersampling submodule.
Further, the sampling module includes a submodule for weighting the group samples.
Further, the sampling module also includes a submodule for simulating and generating group sample data using a specific algorithm.
Further, the problem determination module is a module that judges, from specific parameters of the sampled feature data, whether the problem to be solved by the current computation is a classification problem or a regression problem;
Further, when the problem determination module judges that the problem to be solved by the current computation is a classification problem, the module further judges whether the problem is a binary classification problem or a multi-class classification problem.
Further, the sample division module includes a submodule for dividing the sampled data into a training set;
Further, the sample division module includes a submodule for dividing the sampled data into a validation set;
Further, the sample division module includes a submodule for dividing the sampled data into a test set;
Those skilled in the art will understand that the training set and the validation set come from data of the same period and the same distribution, while the test set is cross-period data, that is, data from a different time period, used to test the model's accuracy and stability as the business changes.
Further, the feature processing module includes:
Feature classification submodule, for determining whether feature data are continuous feature data or discrete feature data. Those skilled in the art will understand that feature data include continuous feature data and discrete feature data, and that the two are handled by different subsequent processing; the sampled feature data therefore need to be classified.
Feature preprocessing submodule, for preprocessing the feature data.
Feature engineering submodule, for performing modeling processing on the feature data.
Feature selection submodule, for performing selection operations on the feature data.
Further, the feature classification submodule includes a module unit that classifies a sampled data feature as continuous feature data when it is numeric data.
Further, the feature classification submodule includes a module unit that classifies a sampled data feature as discrete feature data when it is discrete data.
Further, the feature preprocessing submodule, when the sampled data feature is continuous feature data, performs one or more of the following on it: missing value preprocessing, outlier preprocessing, numerical feature discretization, data feature normalization, and numerical feature transformation.
Further, the feature preprocessing submodule, when the sampled data feature is discrete feature data, performs one or more of the following on it: missing value preprocessing, cardinality reduction, and one-hot encoding.
Further, the feature engineering submodule includes a time-derived feature data generation module unit, for
computing time-derived feature data from time-type feature data.
Further, the time-derived feature data include time difference parameter data.
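As an illustration of time difference parameter data, the sketch below derives day counts between date fields. The field names (`account_opened`, `last_repayment`, `applied`) are hypothetical and do not come from the patent.

```python
from datetime import date


def days_between(earlier, later):
    """Time difference between two dates, in whole days."""
    return (later - earlier).days


# Hypothetical time-type features for one applicant.
applicant = {
    "account_opened": date(2015, 3, 1),
    "last_repayment": date(2017, 9, 15),
    "applied": date(2017, 10, 26),
}

# Time-derived features computed from the raw dates.
time_features = {
    "account_age_days": days_between(applicant["account_opened"], applicant["applied"]),
    "days_since_last_repayment": days_between(
        applicant["last_repayment"], applicant["applied"]
    ),
}
```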
Further, the feature engineering submodule includes a space-derived feature data generation module unit, for
computing space-derived feature data from space-type feature data.
Further, the space-derived feature data include region value parameter data and region difference parameter data.
Further, the feature engineering submodule includes a feature data combination module unit, for performing
combination operations on feature data. It will be understood by those skilled in the art that the combination operation may use a simple linear model;
it may also use a more complex logistic regression model or naive Bayes model, which can strengthen the nonlinearity and
cross-interaction of the data features.
Further, the feature data combination module unit includes a combined feature data operation sub-module unit and a feature
data crossing sub-module unit.
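Feature combination and feature crossing can be sketched in their simplest form as below. Multiplying two continuous features and concatenating two discrete features are assumed stand-ins for the combination and crossing operations; the column names are hypothetical.

```python
def cross_discrete(left, right, sep="&"):
    """Cross two discrete features into one combined categorical feature."""
    return [f"{a}{sep}{b}" for a, b in zip(left, right)]


def product(left, right):
    """Combine two continuous features by element-wise multiplication."""
    return [a * b for a, b in zip(left, right)]


# Hypothetical example columns.
education = ["bachelor", "master", "bachelor"]
city_tier = ["tier1", "tier2", "tier2"]
monthly_income = [8000.0, 12000.0, 6000.0]
debt_ratio = [0.25, 0.5, 0.125]

crossed = cross_discrete(education, city_tier)
monthly_debt = product(monthly_income, debt_ratio)
```

A crossed categorical feature lets even a linear model assign a separate weight to each combination of the two original categories.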
Further, the feature selection submodule includes a dimensionality reduction operation module unit.
Further, the feature selection submodule also includes a subset selection operation module unit.
Further, the subset selection operation module unit includes a filter-type selection operation sub-module unit. Those skilled in the
art will understand that filter-type selection can be computed from feature importance indices such as IV (information value), information gain, and the
Gini index.
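As one example of a filter-type index, the sketch below computes the information value (IV) of a binned feature, using the common definition IV = Σ (good share − bad share) · ln(good share / bad share) over bins. The additive smoothing constant `eps`, the label convention (1 = bad, 0 = good), and the sample data are assumptions for illustration.

```python
import math


def information_value(bins, labels, eps=0.5):
    """Information value of a binned feature against binary labels (1 = bad, 0 = good)."""
    keys = sorted(set(bins))
    # Start every count at eps so that empty cells never produce log(0).
    good = {k: eps for k in keys}
    bad = {k: eps for k in keys}
    for b, y in zip(bins, labels):
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1
    total_good, total_bad = sum(good.values()), sum(bad.values())

    iv = 0.0
    for k in keys:
        share_good = good[k] / total_good
        share_bad = bad[k] / total_bad
        iv += (share_good - share_bad) * math.log(share_good / share_bad)
    return iv


labels = [0, 0, 0, 0, 1, 1, 1, 1]
signal_feature = ["low"] * 4 + ["high"] * 4  # perfectly separates the labels
noise_feature = ["a", "b"] * 4               # unrelated to the labels

iv_signal = information_value(signal_feature, labels)
iv_noise = information_value(noise_feature, labels)
```

A filter-type selector would rank features by such a score and keep those above a chosen threshold, independently of any downstream model.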
Further, the subset selection operation module unit includes a wrapper-type selection operation sub-module unit. Those skilled in the
art will understand that wrapper-type selection can be implemented by random search or heuristic search.
Further, the subset selection operation module unit includes an embedded selection operation sub-module unit. Those skilled in the
art will understand that embedded selection can be implemented through norm regularization such as L1 (Lasso) and L2 (Ridge), or through tree
ensembles.
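The way L1 regularization performs embedded selection can be sketched with a small coordinate-descent Lasso. This is an illustrative sketch, not the patent's implementation: it uses a linear regression objective, (1/2n)·‖y − Xw‖² + α·‖w‖₁, for brevity, and the data and the value of `alpha` are made up.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z else 0.0
    return w


# Hypothetical data: y depends only on the first feature; the second is noise.
X = [[1, 1], [2, -1], [3, 1], [4, -1]]
y = [2, 4, 6, 8]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
```

Features whose weight is driven exactly to zero are discarded, so selection happens as a by-product of fitting the model. An L2 (Ridge) penalty shrinks weights without zeroing them, so it usually needs an additional threshold to act as a selector.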
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although
the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that
the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents,
and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and
scope of the technical solutions of the embodiments of the present invention.
Claims (46)
1. An intelligent evaluation method for credit reporting data, characterized in that it comprises the following steps:
a sampling step, for sampling and acquiring data;
a problem determination step, for determining which kind of model is currently needed to process the data;
a sample division step, for classifying the samples;
a feature processing step, for analysing and processing feature data;
an operation step, for performing operations on the feature data so as to generate evaluation result data and feedback data;
a parameter tuning step, for adjusting parameters according to the current model and the feedback data;
a model evaluation and verification step, for evaluating the current model.
2. The intelligent evaluation method for credit reporting data as claimed in claim 1, characterized in that the sampling step includes an
oversampling step.
3. The intelligent evaluation method for credit reporting data as claimed in claim 1, characterized in that the sampling step includes an
undersampling step.
4. The intelligent evaluation method for credit reporting data as claimed in claim 1, characterized in that the sampling step includes a step of
weighting minority-class samples.
5. The intelligent evaluation method for credit reporting data as claimed in claim 1, characterized in that the sampling step further includes a step of
generating simulated minority-class sample data using a specific algorithm.
6. The intelligent evaluation method for credit reporting data as claimed in claim 1, characterized in that the problem determination step judges,
from specific parameters of the sampled feature data, whether the problem to be solved by the current operation is a classification problem or a regression problem.
7. The intelligent evaluation method for credit reporting data as claimed in claim 6, characterized in that, when the problem determination step
judges that the problem to be solved by the current operation is a classification problem, it further judges whether that problem is a binary classification
problem or a multi-class classification problem.
8. The intelligent evaluation method for credit reporting data as claimed in claim 1, characterized in that the sample division step includes
a step of dividing the data obtained by sampling into a training set.
9. The intelligent evaluation method for credit reporting data as claimed in claim 1, characterized in that the sample division step includes
a step of dividing the data obtained by sampling into a validation set.
10. The intelligent evaluation method for credit reporting data as claimed in claim 1, characterized in that the sample division step
includes a step of dividing the data obtained by sampling into a test set.
11. The intelligent evaluation method for credit reporting data as claimed in claim 1, characterized in that the feature processing step
includes:
a feature classification step, for classifying feature data as continuous feature data or discrete feature data;
a feature preprocessing step, for preprocessing the feature data;
a feature engineering step, for performing modelling-oriented processing on the feature data;
a feature selection step, for performing selection operations on the feature data.
12. The intelligent evaluation method for credit reporting data as claimed in claim 11, characterized in that the feature classification step
includes classifying a sampled data feature as continuous feature data when that feature is numeric data.
13. The intelligent evaluation method for credit reporting data as claimed in claim 11, characterized in that the feature classification step
includes classifying a sampled data feature as discrete feature data when that feature is discrete data.
14. The intelligent evaluation method for credit reporting data as claimed in claim 11, characterized in that the feature preprocessing step
includes, when a sampled data feature is continuous feature data, applying to that feature data one or more of the following preprocessing steps:
missing-value preprocessing, outlier preprocessing, numeric feature discretization preprocessing, data feature normalization preprocessing, and numeric feature
transformation preprocessing.
15. The intelligent evaluation method for credit reporting data as claimed in claim 11, characterized in that the feature preprocessing step
includes, when a sampled data feature is discrete feature data, applying to that feature data one or more of the following preprocessing steps:
missing-value preprocessing, cardinality reduction preprocessing, and one-hot encoding preprocessing.
16. The intelligent evaluation method for credit reporting data as claimed in claim 11, characterized in that the feature engineering step
includes a time-derived feature data generation step, for computing time-derived feature data from time-type feature
data.
17. The intelligent evaluation method for credit reporting data as claimed in claim 16, characterized in that the time-derived feature
data include time difference parameter data.
18. The intelligent evaluation method for credit reporting data as claimed in claim 11, characterized in that the feature engineering step
includes a space-derived feature data generation step, for computing space-derived feature data from space-type feature
data.
19. The intelligent evaluation method for credit reporting data as claimed in claim 18, characterized in that the space-derived feature
data include region value parameter data or region difference parameter data.
20. The intelligent evaluation method for credit reporting data as claimed in claim 11, characterized in that the feature engineering step
includes a feature data combination step, for performing combination operations on feature data.
21. The intelligent evaluation method for credit reporting data as claimed in claim 20, characterized in that the feature data combination
step includes a combined feature data operation step and a feature data crossing step.
22. The intelligent evaluation method for credit reporting data as claimed in claim 21, characterized in that the feature selection step
includes a dimensionality reduction operation step or a subset selection operation step.
23. The intelligent evaluation method for credit reporting data as claimed in claim 22, characterized in that the subset selection operation
step includes a filter-type selection operation step, a wrapper-type selection operation step, or an embedded selection operation step.
24. An intelligent evaluation system for credit reporting data, characterized in that it comprises:
a sampling module, for sampling and acquiring data;
a problem determination module, for determining which kind of model is currently needed to process the data;
a sample division module, for classifying the samples;
a feature processing module, for analysing and processing feature data;
an operation module, for performing operations on the feature data so as to generate evaluation result data and feedback data;
a parameter tuning module, for adjusting parameters according to the current model and the feedback data;
a model evaluation and verification module, for evaluating the current model.
25. The intelligent evaluation system for credit reporting data as claimed in claim 24, characterized in that the sampling module includes an
oversampling submodule.
26. The intelligent evaluation system for credit reporting data as claimed in claim 24, characterized in that the sampling module includes an
undersampling submodule.
27. The intelligent evaluation system for credit reporting data as claimed in claim 24, characterized in that the sampling module includes
a submodule for weighting minority-class samples.
28. The intelligent evaluation system for credit reporting data as claimed in claim 24, characterized in that the sampling module further includes
a submodule for generating simulated minority-class sample data using a specific algorithm.
29. The intelligent evaluation system for credit reporting data as claimed in claim 24, characterized in that the problem determination module is
a module for judging, from specific parameters of the sampled feature data, whether the problem to be solved by the current operation is a classification problem or a regression
problem.
30. The intelligent evaluation system for credit reporting data as claimed in claim 29, characterized in that the problem determination module is
further used, when it judges that the problem to be solved by the current operation is a classification problem, to judge whether that
problem is a binary classification problem or a multi-class classification problem.
31. The intelligent evaluation system for credit reporting data as claimed in claim 24, characterized in that the sample division module
includes a submodule for dividing the data obtained by sampling into a training set.
32. The intelligent evaluation system for credit reporting data as claimed in claim 24, characterized in that the sample division module
includes a submodule for dividing the data obtained by sampling into a validation set.
33. The intelligent evaluation system for credit reporting data as claimed in claim 24, characterized in that the sample division module
includes a submodule for dividing the data obtained by sampling into a test set.
34. The intelligent evaluation system for credit reporting data as claimed in claim 24, characterized in that the feature processing module
includes:
a feature classification submodule, for classifying feature data as continuous feature data or discrete feature data;
a feature preprocessing submodule, for preprocessing the feature data;
a feature engineering submodule, for performing modelling-oriented processing on the feature data;
a feature selection submodule, for performing selection operations on the feature data.
35. The intelligent evaluation system for credit reporting data as claimed in claim 34, characterized in that the feature classification submodule
includes a module unit that classifies a sampled data feature as continuous feature data when that feature is numeric data.
36. The intelligent evaluation system for credit reporting data as claimed in claim 34, characterized in that the feature classification submodule
includes a module unit that classifies a sampled data feature as discrete feature data when that feature is discrete data.
37. The intelligent evaluation system for credit reporting data as claimed in claim 34, characterized in that the feature preprocessing
submodule includes a unit for applying, when a sampled data feature is continuous feature data, one or more of the following to that feature data: missing-value preprocessing, outlier
preprocessing, numeric feature discretization preprocessing, data feature normalization preprocessing, and numeric feature transformation
preprocessing.
38. The intelligent evaluation system for credit reporting data as claimed in claim 34, characterized in that the feature preprocessing
submodule includes a unit for applying, when a sampled data feature is discrete feature data, one or more of the following to that feature data: missing-value preprocessing, cardinality reduction
preprocessing, and one-hot encoding preprocessing.
39. The intelligent evaluation system for credit reporting data as claimed in claim 34, characterized in that the feature engineering submodule
includes a time-derived feature data generation module unit, for computing time-derived feature data from time-type feature
data.
40. The intelligent evaluation system for credit reporting data as claimed in claim 39, characterized in that the time-derived feature
data include time difference parameter data.
41. The intelligent evaluation system for credit reporting data as claimed in claim 34, characterized in that the feature engineering submodule
includes a space-derived feature data generation module unit, for computing space-derived feature data from space-type feature
data.
42. The intelligent evaluation system for credit reporting data as claimed in claim 41, characterized in that the space-derived feature
data include region value parameter data and region difference parameter data.
43. The intelligent evaluation system for credit reporting data as claimed in claim 34, characterized in that the feature engineering submodule
includes a feature data combination module unit, for performing combination operations on feature data.
44. The intelligent evaluation system for credit reporting data as claimed in claim 43, characterized in that the feature data combination
module unit includes a combined feature data operation sub-module unit and a feature data crossing sub-module unit.
45. The intelligent evaluation system for credit reporting data as claimed in claim 34, characterized in that the feature selection
submodule includes a dimensionality reduction operation module unit or a subset selection operation module unit.
46. The intelligent evaluation system for credit reporting data as claimed in claim 45, characterized in that the subset selection operation
module unit includes a filter-type selection operation sub-module unit, a wrapper-type selection operation sub-module unit, or an embedded selection operation
sub-module unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711015906.0A CN107808246A (en) | 2017-10-26 | 2017-10-26 | The intelligent evaluation method and system of collage-credit data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711015906.0A CN107808246A (en) | 2017-10-26 | 2017-10-26 | The intelligent evaluation method and system of collage-credit data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107808246A (en) | 2018-03-16 |
Family
ID=61591179
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711015906.0A Pending CN107808246A (en) | 2017-10-26 | 2017-10-26 | The intelligent evaluation method and system of collage-credit data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107808246A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108875815A (en) * | 2018-06-04 | 2018-11-23 | 深圳市研信小额贷款有限公司 | Feature Engineering variable determines method and device |
CN109614609A (en) * | 2018-11-06 | 2019-04-12 | 阿里巴巴集团控股有限公司 | Method for establishing model and device |
CN109670724A (en) * | 2018-12-29 | 2019-04-23 | 重庆誉存大数据科技有限公司 | Methods of risk assessment and device |
CN109685574A (en) * | 2018-12-25 | 2019-04-26 | 拉扎斯网络科技(上海)有限公司 | Data determination method, device, electronic equipment and computer readable storage medium |
CN109903095A (en) * | 2019-03-01 | 2019-06-18 | 上海拉扎斯信息科技有限公司 | Data processing method, device, electronic equipment and computer readable storage medium |
CN110648211A (en) * | 2018-06-07 | 2020-01-03 | 埃森哲环球解决方案有限公司 | Data validation |
CN111652710A (en) * | 2020-06-03 | 2020-09-11 | 北京化工大学 | Personal credit risk assessment method based on ensemble tree feature extraction and Logistic regression |
WO2021042556A1 (en) * | 2019-09-03 | 2021-03-11 | 平安科技(深圳)有限公司 | Classification model training method, apparatus and device, and computer-readable storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6112190A (en) * | 1997-08-19 | 2000-08-29 | Citibank, N.A. | Method and system for commercial credit analysis |
CN106897918A (en) * | 2017-02-24 | 2017-06-27 | 上海易贷网金融信息服务有限公司 | A kind of hybrid machine learning credit scoring model construction method |
CN107248030A (en) * | 2017-05-26 | 2017-10-13 | 谢首鹏 | A kind of bond Risk Forecast Method and system based on machine learning algorithm |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108875815A (en) * | 2018-06-04 | 2018-11-23 | 深圳市研信小额贷款有限公司 | Feature Engineering variable determines method and device |
CN110648211A (en) * | 2018-06-07 | 2020-01-03 | 埃森哲环球解决方案有限公司 | Data validation |
CN110648211B (en) * | 2018-06-07 | 2023-10-03 | 埃森哲环球解决方案有限公司 | data verification |
CN109614609A (en) * | 2018-11-06 | 2019-04-12 | 阿里巴巴集团控股有限公司 | Method for establishing model and device |
CN109614609B (en) * | 2018-11-06 | 2023-05-05 | 创新先进技术有限公司 | Model building method and device |
CN109685574A (en) * | 2018-12-25 | 2019-04-26 | 拉扎斯网络科技(上海)有限公司 | Data determination method, device, electronic equipment and computer readable storage medium |
CN109670724A (en) * | 2018-12-29 | 2019-04-23 | 重庆誉存大数据科技有限公司 | Methods of risk assessment and device |
CN109903095A (en) * | 2019-03-01 | 2019-06-18 | 上海拉扎斯信息科技有限公司 | Data processing method, device, electronic equipment and computer readable storage medium |
WO2021042556A1 (en) * | 2019-09-03 | 2021-03-11 | 平安科技(深圳)有限公司 | Classification model training method, apparatus and device, and computer-readable storage medium |
CN111652710A (en) * | 2020-06-03 | 2020-09-11 | 北京化工大学 | Personal credit risk assessment method based on ensemble tree feature extraction and Logistic regression |
CN111652710B (en) * | 2020-06-03 | 2024-01-30 | 北京化工大学 | Personal credit risk assessment method based on integrated tree feature extraction and Logistic regression |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180316 |