CN108335200A - A credit rating method based on feature selection - Google Patents

A credit rating method based on feature selection

Info

Publication number
CN108335200A
Authority
CN
China
Prior art keywords
ripper
characteristic attribute
rating
credit
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810414547.4A
Other languages
Chinese (zh)
Inventor
杨胜刚
陈佐
赵寒枫
陈邦道
梅雪松
余湘军
李浩之
王芍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN201810414547.4A
Publication of CN108335200A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Abstract

The present invention discloses a credit rating method based on feature selection, comprising the steps of: S1. obtaining a user credit information set for model training, and extracting the feature attribute corresponding to each item of information in the set to form a feature attribute set; S2. performing RIPPER classification on the feature attribute set multiple times with a RIPPER classifier, screening the feature attributes in the set according to the classification results after each RIPPER classification, and performing RIPPER classification again on the screened feature attribute set, until the required RIPPER rating model is generated; S3. inputting the credit information of a user to be assessed, extracting the corresponding feature attributes, inputting the extracted feature attributes into the RIPPER rating model for classification, and outputting the credit rating result. The invention has the advantages of simple implementation, a small data processing load, high rating efficiency and good rating performance, and it readily yields rating rules that are easy for users to understand.

Description

A credit rating method based on feature selection
Technical field
The present invention relates to the technical field of credit evaluation, and in particular to a credit rating method based on feature selection.
Background art
Credit rating can be understood in a narrow sense and in a broad sense. In the narrow sense, an independent third-party credit rating agency evaluates a debtor's ability and willingness to repay principal and interest in full and on schedule, and expresses the severity of the default risk and loss with simple rating symbols. In the broad sense, credit rating is an overall evaluation of the rated party's ability and willingness to fulfil its contracts and economic commitments. When a credit institution receives a customer's credit application, it builds a rating model from the feature variables in the submitted application form to obtain the applicant's credit value, compares that value with a preset standard value, and judges how likely the borrower is to become overdue, so as to decide whether to grant credit and how large the credit line should be. This kind of credit scoring is called application scoring. Application scoring relies mainly on the customer's personal information, which falls into four parts: first, basic personal information, mainly the customer's name, employment status, residential address, education level and so on; second, personal transaction records, mainly the customer's business dealings with financial institutions; third, the customer's personal credit history, mainly loans taken from financial institutions, their repayment and so on; fourth, public records, mainly court judgments or bankruptcies concerning the customer. After obtaining the personal credit information, the credit institution builds a personal credit rating model to obtain the customer's credit score; the score indicates the customer's credit grade, and the institution grants different credit lines according to it.
For a user who newly submits an application, a credit rating must be produced from the information provided. At present users are rated mainly in one of two ways, with credit scorecards or with machine learning. Scorecard-based rating is too coarse and scores individuals with poor precision, whereas machine-learning-based rating is hard to interpret: decision makers cannot intuitively understand the underlying rules, which makes decisions difficult. Moreover, these approaches usually rely on experience-based manual feature selection or on simple feature selection algorithms. The size of the input data set of a classification task can be described by two parameters, the number of features N and the number of instances P. In the data being analysed both N and P are often very large, and excessively large N and P cause the "curse of dimensionality" and "combinatorial explosion". For multi-dimensional feature attributes, the feature selection approaches above lead to a heavy workload, complex classification algorithms, strong dependence and a lack of flexibility, which reduces classification efficiency and makes them unsuitable for credit rating with demanding real-time requirements.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a credit rating method based on feature selection that is simple to implement, has a small data processing load, rates efficiently and performs well, and readily yields rating rules that are easy for users to understand.
To solve the above technical problem, the technical solution proposed by the present invention is:
A credit rating method based on feature selection, comprising the steps of:
S1. Feature attribute set extraction: obtain a user credit information set for model training, and extract the feature attribute corresponding to each item of information in the user credit information set to form a feature attribute set;
S2. Model training based on feature selection: perform RIPPER (Repeated Incremental Pruning to Produce Error Reduction) classification on the feature attribute set multiple times; after each RIPPER classification, screen the feature attributes in the feature attribute set according to the classification results, and perform RIPPER classification again on the screened feature attribute set, until the required RIPPER rating model is generated;
S3. Credit rating: input the credit information of the user to be assessed, extract the corresponding feature attributes, input the extracted feature attributes into the RIPPER rating model for classification, and output the credit rating result.
As a further improvement of the present invention: in step S2, specifically, after each RIPPER classification the feature attributes whose number of occurrences is below a specified threshold are deleted, and RIPPER classification is performed again on the screened feature attribute set, until the precision or the number of features of the generated RIPPER rating model meets a preset requirement, whereupon the final RIPPER rating model is output.
As a further improvement of the present invention, the specific steps for generating the RIPPER rating model in step S2 are:
S21. classify the current feature attribute set with a RIPPER classifier, compute the weight of each feature attribute from the number of times it occurs in the classification results, and sort the feature attributes by weight to obtain a sorted feature attribute set;
S22. delete from the sorted feature attribute set the feature attributes whose number of occurrences is below a preset threshold, to obtain an updated feature attribute set;
S23. perform RIPPER classification on the updated feature attribute set obtained in step S22, and judge whether the precision or the number of features of the resulting RIPPER rating model meets the preset requirement; if so, output the final RIPPER rating model; otherwise return to step S21.
As a further improvement of the present invention: in step S2, training specifically uses ten-fold cross-validation to avoid overfitting; that is, the training set is divided into 10 parts, of which 9 serve as training data and the remaining one as test data, and after multiple iterations the model whose classification precision on the different test sets reaches a specified threshold is selected and output as the trained RIPPER rating model.
As a further improvement of the present invention: step S2 further includes evaluating the obtained RIPPER rating model with an ROC curve; if the area under the ROC curve computed for the RIPPER rating model lies within a preset range, the final RIPPER rating model is output; otherwise training is performed again.
As a further improvement of the present invention, the specific steps of step S1 are:
S11. extract the feature attribute corresponding to each item of original credit information in the user credit information set to obtain a feature attribute set, and output the feature attribute set after data preprocessing;
S12. unify the different categorical attributes in the feature attribute set and output the result;
S13. mark the feature attribute set output by step S12 with rating grades to form the training set, and output it.
As a further improvement of the present invention: the data preprocessing in step S11 specifically includes filling the missing values in the feature attribute set and deleting the redundant values and outliers in the feature attribute set; when missing values are filled, continuous missing values are filled with one of the median, the mode or Lagrange interpolation, and discrete missing values are filled from context.
As a further improvement of the present invention: the user credit information includes one or more of basic user information, user borrowing information, overdue repayment information within a specified historical period, information on repayments due within a specified future period, user bidding information and user liability information.
As a further improvement of the present invention: when the extracted feature attributes are input into the RIPPER rating model for classification in step S3, the RIPPER rating model specifically outputs an initial credit rating result, and the final rating result is obtained and output from the initial credit rating result together with the classification rules that the RIPPER rating model applied during classification.
As a further improvement of the present invention: when step S2 generates the RIPPER rating model, it is specifically trained with the AdaBoost (Adaptive Boosting) algorithm using multiple RIPPER classifiers as weak classifiers; when each RIPPER classifier is trained, a portion of the training set samples is combined with a portion of the samples misclassified by the previous RIPPER classifier to form the final training sample, and after training the weak classifiers are combined into an ADB strong classifier, which serves as the final RIPPER rating model.
Compared with the prior art, the advantages of the present invention are:
1) The credit rating method based on feature selection of the present invention makes full use of the scalability and rule-based nature of RIPPER. It extracts the feature attributes of the user credit information, classifies them multiple times with a RIPPER classifier to build a RIPPER rating model, and then uses that model to rate the credit of new users, so that rating is efficient and performs well. Compared with the traditional scorecard approach, it can give accurate ratings for different individuals. Compared with traditional machine-learning rating, the classification rules applied when a new user is rated with the RIPPER rating model can be obtained conveniently and are easy to understand, which helps decision makers reach a final decision. In addition, screening the feature attributes according to the classification results after each RIPPER classification greatly reduces the workload of training on multi-dimensional features, which effectively reduces the amount of rating data to process and improves rating efficiency.
2) In the credit rating method based on feature selection of the present invention, deleting after each RIPPER classification the feature attributes whose number of occurrences is below a specified threshold removes irrelevant and redundant features and so reduces the number of features. Because the number of features is reduced, duplicate instances can also be removed, which effectively avoids the "curse of dimensionality" and "combinatorial explosion". The reduction in the numbers of features and instances also shortens model learning time, further improving rating efficiency.
3) The credit rating method based on feature selection of the present invention performs feature selection by combining the feature attributes in the data set with the RIPPER classifier, so that feature selection proceeds from the characteristics of the classifier and of the data set themselves. This greatly reduces the training workload of the RIPPER rating model without affecting the performance of the model.
Description of the drawings
Fig. 1 is a schematic flow chart of the credit rating method based on feature selection of the present embodiment.
Fig. 2 is a schematic diagram of how the decision tree is built in the RIPPER classification algorithm used in the present embodiment.
Fig. 3 is a schematic diagram of rule pruning in the RIPPER classification algorithm used in the present embodiment.
Fig. 4 is a schematic diagram of the ROC curve of the RIPPER rating model computed in a specific embodiment of the present invention.
Fig. 5 is a schematic diagram of the principle of training the RIPPER rating model with the Ripper-ADB combined classification in a specific embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings and specific preferred embodiments, which do not thereby limit the scope of protection of the present invention.
As shown in Fig. 1, the credit rating method based on feature selection of the present embodiment comprises the steps of:
S1. Feature attribute set extraction: obtain a user credit information set for model training, and extract the feature attribute corresponding to each item of information in the user credit information set to form a feature attribute set;
S2. Model training based on feature selection: perform RIPPER classification on the feature attribute set multiple times; after each RIPPER classification, screen the feature attributes in the feature attribute set according to the classification results, and perform RIPPER classification again on the screened feature attribute set, until the required RIPPER rating model is generated;
S3. Credit rating: input the credit information of the user to be assessed, extract the corresponding feature attributes, input the extracted feature attributes into the RIPPER rating model for classification, and output the credit rating result.
RIPPER (a rule induction learner) is a rule-based classification algorithm. The decision tree built for classification is shown in Fig. 2, and the rules can be read off one by one along the paths between the leaf nodes and the root node. As shown in Fig. 3, redundant conditions in the rules of Fig. 3(a) can be pruned. The rules are ordered by size (the rule with the "strictest" requirements is given the highest priority to fire), so once the first rule is known not to apply, humidity=normal can be removed from the second rule; likewise outlook=rainy can be removed from the fourth rule, and outlook=rainy and windy=true from the fifth, giving the result shown in Fig. 3(b). Each RIPPER rule consists of a number of rule antecedents. The algorithm includes improved pruning and stopping criteria and post-processing of the rule set, and uses incremental reduced-error pruning. The instances of the training set are divided into two data sets, a growing set and a pruning set. The growing set is used to generate rules: conditions are added until the rule meets the requirements. The pruning set is used to simplify rules: conditions are deleted from a rule until a better rule is obtained. The value of the rule is then evaluated: the final condition is removed to see whether the value changes, and if it does not, conditions continue to be removed until the best version of the classifier is obtained.
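How an ordered rule list of this kind is applied can be sketched in a few lines of Python. This is only an illustration: the attribute names, thresholds and grades below are hypothetical and do not come from the patent. The first rule whose antecedents all hold fires, and the index of that rule is returned so that the decision can be explained.

```python
# Hypothetical ordered rule list: each entry is (antecedents, grade).
# Attribute names and thresholds are illustrative assumptions only.
RULES = [
    ({"overdue_count": lambda v: v == 0,
      "successful_repayments": lambda v: v >= 12}, "AA"),
    ({"overdue_count": lambda v: v <= 1}, "B"),
]
DEFAULT_GRADE = "E"  # assumed fallback when no rule fires


def classify(record, rules=RULES, default=DEFAULT_GRADE):
    """Return (grade, index of the rule that fired), or (default, None)."""
    for index, (antecedents, grade) in enumerate(rules):
        # A rule fires only if every antecedent is present and satisfied.
        if all(attr in record and test(record[attr])
               for attr, test in antecedents.items()):
            return grade, index
    return default, None
```

Because the rules are tried in order, a later rule never needs to repeat the negation of an earlier one, which is the redundancy that the pruning of Fig. 3 removes.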
RIPPER is accurate and creates rules efficiently. Its running time grows nearly linearly with the number of samples in the training data set, with a time complexity of O(n log² n), and, more importantly, it remains highly efficient even on test sets containing hundreds of thousands of noisy data items. The decision rules of RIPPER classification are also user-oriented: the classifier produces classification rules, and these rules are comparatively easy for users to understand. In other words, the RIPPER algorithm is scalable and rule-based. The present embodiment makes full use of these properties. It extracts the feature attributes of the user credit information, classifies them multiple times with a RIPPER classifier to build a RIPPER rating model, and then uses that model to rate the credit of new users, so that rating is efficient and performs well. Compared with the traditional scorecard approach, it can give accurate ratings for different individuals. Compared with traditional machine-learning rating, the classification rules applied when a new user is rated with the RIPPER rating model can be obtained conveniently and are easy to understand, which helps decision makers reach a final decision. In addition, the feature attributes are screened according to the classification results after each RIPPER classification, so that feature selection draws on the characteristics of both the feature attributes and the RIPPER classifier itself; this greatly reduces the workload of training on multi-dimensional features, which effectively reduces the amount of rating data to process and improves rating efficiency.
In the present embodiment, the specific steps of step S1 are:
S11. extract the feature attribute corresponding to each item of original credit information in the user credit information set to obtain a feature attribute set, and output the feature attribute set after data preprocessing;
S12. unify the different categorical attributes in the feature attribute set and output the result;
S13. mark the feature attribute set output by step S12 with rating grades to form the training set.
After extracting the users' credit-related data from the original user database, the present embodiment first extracts the feature attribute corresponding to each item of credit information, that is, the feature value characterising that item, to form the feature attribute set. After the feature attribute set is preprocessed, the different categorical attributes are unified, and the class labels of the feature attribute set are then marked with rating grades, for example AA, A, B, C, D, E and F, to form a training set that meets the needs of the RIPPER rating model. The training set is randomly shuffled and partitioned, and a RIPPER classifier is then used to classify it iteratively multiple times, with the feature attributes screened according to the classification results after each classification, until the required RIPPER rating model is obtained.
In the present embodiment, the user credit information specifically includes basic user information, borrowing information, overdue repayment information within a specified historical period, information on repayments due within a specified future period, user bidding information, user liability information and so on. Basic information includes name, gender, education level and the like; repayment information includes the number of successful repayments, the number of loans paid off normally, the number paid off within a specified number of days overdue, the number paid off more than the specified number of days overdue and the like; borrowing information includes the number of successful loans, the time of the first successful loan, the cumulative amount borrowed, the amount outstanding, the largest single loan amount and the like; liability information includes the historical maximum liability and the like. Any information data that characterises user credit may be extracted as user credit information according to actual needs.
In the present embodiment, the data preprocessing in step S11 specifically includes filling the missing values in the feature attribute set and deleting the redundant values and outliers in the feature attribute set. When missing values are filled, continuous missing values are filled with the median, the mode, Lagrange interpolation or similar, and discrete missing values are filled from context or similar; other filling methods may of course be used according to actual needs.
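A minimal Python sketch of the filling step follows. It rests on two assumptions that the patent does not spell out: continuous columns are filled with the median (one of the options named above), and "filling from context" is read as copying the value of the preceding record, falling back to the most frequent value when there is no preceding record. The function and column names are hypothetical.

```python
from collections import Counter
from statistics import median


def fill_missing(rows, continuous, discrete):
    """Return copies of rows (dicts) with None replaced.

    continuous: columns filled with the column median.
    discrete: columns filled from the preceding record (an assumed reading
    of "context filling"), or with the mode if nothing precedes.
    """
    filled = [dict(r) for r in rows]  # do not mutate the input
    for col in continuous:
        med = median(r[col] for r in rows if r[col] is not None)
        for r in filled:
            if r[col] is None:
                r[col] = med
    for col in discrete:
        observed = Counter(r[col] for r in rows if r[col] is not None)
        previous = observed.most_common(1)[0][0]  # fallback: the mode
        for r in filled:
            if r[col] is None:
                r[col] = previous
            previous = r[col]
    return filled
```

The sketch assumes every column has at least one observed value; a column that is entirely missing would need to be dropped beforehand.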
In the present embodiment, in step S2, specifically, after each RIPPER classification the feature attributes whose number of occurrences is below a specified threshold are deleted, and RIPPER classification is performed again on the screened feature attribute set, until the precision or the number of features of the generated RIPPER rating model meets a preset requirement, whereupon the final RIPPER rating model is output. By deleting after each RIPPER classification the feature attributes whose number of occurrences is below the specified threshold, that is, the feature attributes that do not occur or occur only rarely, the present embodiment removes irrelevant and redundant features and so reduces the number of features N. Because the number of features is reduced, some duplicate instances can also be removed, so that the number of instances P decreases as well, which effectively avoids the "curse of dimensionality" and "combinatorial explosion". The reduction of N and P also shortens model learning time, further improving rating efficiency.
In the present embodiment, when the RIPPER rating model is generated, the final RIPPER rating model is output once the precision (accuracy) of the generated model no longer changes or the number of feature attributes reaches a preset quantity; that is, the precision of the model or the number of feature attributes serves as the criterion for completing model training.
In the present embodiment, the specific steps for generating the RIPPER rating model in step S2 are:
S21. classify the current feature attribute set with a RIPPER classifier, compute the weight of each feature attribute from the number of times it occurs in the classification results, and sort the feature attributes by weight to obtain a sorted feature attribute set;
S22. delete from the sorted feature attribute set the feature attributes whose number of occurrences is below a preset threshold, to obtain an updated feature attribute set;
S23. perform RIPPER classification on the updated feature attribute set obtained in step S22, and judge whether the precision or the number of features of the resulting RIPPER rating model meets the preset requirement; if so, output the final RIPPER rating model; otherwise return to step S21.
When the present embodiment trains the RIPPER rating model based on feature selection, all data of the feature attribute set are first trained by RIPPER classification; for example, when the user credit information is judged to satisfy all the specified conditions, the rating result is grade AA, and so on.
After each RIPPER classification, the weight of each feature attribute in the RIPPER classification rules, that is, the number of times the feature attribute occurs, is counted using the Python programming language. The feature attributes that do not occur in the generated rules, or occur only rarely, are deleted to obtain a new feature attribute set, on which RIPPER classification is performed again. It is then judged whether the accuracy of this classification is higher than that of the previous one: if so, the current attributes are retained; otherwise the attributes are reset and the attributes with fewer occurrences are selected again as deletion candidates. These steps are repeated until the accuracy can no longer be improved or the required number of features is reached, whereupon model training is complete and the final RIPPER rating model is output. This ensures the performance of the final RIPPER rating model while greatly reducing the training workload and complexity.
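The counting step can be sketched as follows. The rule representation, a list of (conditions, grade) pairs in which each condition is an (attribute, operator, value) triple, is an assumption made for illustration; the patent does not specify how the rules are stored, and the function names are hypothetical.

```python
from collections import Counter


def attribute_weights(rules, attributes):
    """Count how many rule conditions mention each attribute.

    rules: list of (conditions, grade); each condition is an
    (attribute, operator, value) triple (assumed representation).
    """
    counts = Counter({a: 0 for a in attributes})  # keep zero counts visible
    for conditions, _grade in rules:
        for attr, _op, _value in conditions:
            counts[attr] += 1
    return counts


def prune_attributes(rules, attributes, min_count):
    """Sort attributes by weight (descending) and drop those below min_count."""
    weights = attribute_weights(rules, attributes)
    ranked = sorted(attributes, key=lambda a: weights[a], reverse=True)
    return [a for a in ranked if weights[a] >= min_count]
```

With `min_count=1` this removes exactly the attributes that never appear in any rule; a larger threshold also removes rarely used ones.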
In a concrete application embodiment, feature selection may proceed as follows: first define the original feature attribute set D, the number K of attributes to be retained and the screened feature attribute set S; build a RIPPER classifier on the attributes Si to obtain the classification rule result Ci; after classification, count the weight of each attribute to generate a dictionary Di; if the number of occurrences of an attribute is below the given threshold, delete that attribute; continue until K attributes remain, giving the screened feature attribute set S.
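The loop over D, K, S, Ci and Di described above can be sketched as below. The RIPPER trainer is deliberately left as a caller-supplied function `train_rules` (a hypothetical name), assumed to return the learned rules as lists of the attribute names they use; the stopping guard for the case where no attribute falls below the threshold is also an assumption added so that the loop always terminates.

```python
from collections import Counter


def select_features(attributes, keep, train_rules, threshold=1):
    """Iteratively drop rarely used attributes until `keep` remain.

    attributes: the original attribute set D.
    keep: the number K of attributes to retain.
    train_rules(selected) -> list of rules, each a list of attribute names
    (stand-in for training a RIPPER classifier; hypothetical interface).
    May return more than `keep` attributes if none falls below threshold.
    """
    selected = list(attributes)                       # S starts as D
    while len(selected) > keep:
        rules = train_rules(selected)                 # C_i
        counts = Counter(a for rule in rules for a in rule)  # D_i
        ranked = sorted(selected, key=lambda a: counts[a], reverse=True)
        survivors = [a for a in ranked if counts[a] >= threshold]
        if len(survivors) == len(selected):
            break                                     # nothing left to delete
        # Never cut below K: fall back to the K most used attributes.
        selected = survivors if len(survivors) >= keep else ranked[:keep]
    return selected
```

A fuller version would also retrain and compare accuracy after each deletion, as the embodiment describes, before accepting the reduced attribute set.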
By combining the feature attributes in the data set with the RIPPER classifier to perform feature selection, the present embodiment realises feature selection from the characteristics of the classifier and of the data set themselves, which greatly reduces the training workload of the RIPPER rating model without affecting the performance of the model.
In the present embodiment, training in step S2 specifically uses ten-fold cross-validation to avoid overfitting; that is, the training set is divided into 10 parts, of which 9 serve as training data and the remaining one as test data, and after multiple iterations the model whose classification precision on the different test sets reaches a specified threshold is selected as the required RIPPER rating model. For example, in a specific embodiment the data of the training set are divided into a1, a2, a3, a4, a5, a6, a7, a8, a9 and a10; a1 to a9 serve as training data and a10 as the test set, or some other combination is used, and after multiple iterations the model that performs well on all the different test sets is selected as the final model. With ten-fold cross-validation the test set is a part of the original data that is not part of the training set and is therefore unseen by the model. Compared with the traditional approach of training directly on the whole training set and then testing on part of the data already used for training, this avoids overfitting.
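The partition into a1 to a10 can be sketched with the standard library alone. The function name and the fixed shuffling seed are illustrative choices rather than anything stated in the patent; each of the 10 folds serves exactly once as the test set.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]     # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A model would be trained on each `train` split and scored on the matching `test` split, and only a model whose precision reaches the chosen threshold on the different test sets would be kept.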
In the present embodiment, step S2 further includes evaluating the obtained RIPPER rating model with an ROC curve: if the area under the ROC curve computed for the RIPPER rating model lies within a preset range, the final RIPPER rating model is output; otherwise training is performed again. The ROC curve effectively reflects the performance of a model: the larger the area under the ROC curve, the better the corresponding model performs. After initial training yields a RIPPER rating model, the present embodiment computes the ROC curve of the model and uses it to evaluate the model. The ROC curve computed in a concrete application embodiment is shown in Fig. 4; the area under the curve (AUC) is 0.9403, which meets the model performance requirement. Evaluating the model by its ROC curve is simple and effective and ensures the performance of the RIPPER rating model.
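The area under the ROC curve can be computed without plotting the curve, because it equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The sketch below assumes a binary setting with labels 1 and 0; for the multi-grade ratings of this method one grade would have to be scored against the rest, which is an assumption and not something the patent specifies.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half).

    labels: iterable of 1 (positive) or 0 (negative).
    scores: classifier scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accept_model(labels, scores, min_auc):
    """Keep the model only if its AUC reaches the preset lower bound."""
    return roc_auc(labels, scores) >= min_auc
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; a rank-based computation would be preferable on large test sets.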
In step S2 of the present embodiment, when classifying with the RIPPER classifier, the data items of the training set of user credit feature attributes that are not yet covered by any rule are randomly divided into two subsets, a growing set and a pruning set. When the rule growing process is carried out on the growing set, the rule's conditions are initially empty, and conditions of the form given in formula (1) are then added repeatedly so that the information gain $\mathrm{Gain}(D, A_t)$ increases and the rule's coverage of the data items improves, until the rule covers all data items in the growing set, where $A_t$ denotes each node of the tree:
$A_d = v$, $A_n \le \theta$ or $A_n \ge \theta$    (1)
where $A_d$ is a nominal (character-type) attribute, $v$ is a valid value of $A_d$, $A_n$ is a real-valued variable, and $\theta$ is a valid value of $A_n$ that occurs in the training set.
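A sketch of the growing process with the three condition forms of formula (1) is given below. Two points are assumptions: the gain is computed as FOIL's information gain, the measure commonly used by RIPPER implementations (the patent only names an information gain), and growth stops once the rule covers no negative samples of the growing set. All function names are hypothetical.

```python
import math
import operator

OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}


def covers(rule, row):
    """True if the row satisfies every (attribute, op, value) condition."""
    return all(OPS[op](row[attr], value) for attr, op, value in rule)


def foil_gain(p0, n0, p1, n1):
    """FOIL gain of narrowing coverage from (p0, n0) to (p1, n1)."""
    if p1 == 0:
        return float("-inf")
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


def candidate_conditions(rows, nominal, numeric):
    """Enumerate A_d == v, A_n <= theta and A_n >= theta over observed values."""
    for attr in nominal:
        for v in sorted({r[attr] for r in rows}):
            yield (attr, "==", v)
    for attr in numeric:
        for theta in sorted({r[attr] for r in rows}):
            yield (attr, "<=", theta)
            yield (attr, ">=", theta)


def grow_rule(rows, labels, target, nominal, numeric):
    """Greedily add the best-gain condition until no negatives are covered."""
    rule, covered = [], list(zip(rows, labels))
    while any(y != target for _, y in covered):
        p0 = sum(y == target for _, y in covered)
        n0 = len(covered) - p0
        if p0 == 0:
            break
        best, best_gain = None, 0.0
        for cond in candidate_conditions([r for r, _ in covered], nominal, numeric):
            kept = [(r, y) for r, y in covered if covers([cond], r)]
            p1 = sum(y == target for _, y in kept)
            gain = foil_gain(p0, n0, p1, len(kept) - p1)
            if gain > best_gain:
                best, best_gain = cond, gain
        if best is None:
            break                      # no condition improves the rule
        rule.append(best)
        covered = [(r, y) for r, y in covered if covers([best], r)]
    return rule
```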
When reducing process to reduced set executing rule, the last one condition is rejected from the condition of rule successively, makes function Value v reaches maximum, and the expression formula of function v is:
Wherein p is to cut to concentrate by the affirmative sample number of rule coverage, and n is to cut to concentrate by the negative sample of rule coverage Number.
Above-mentioned formula (2) process is repeated until by reduction condition and deletion rule the value of v can not increase, it is raw At RIPPER rating models and classifying rules.
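The pruning step can be sketched as below. The sketch assumes that the pruning function takes the standard RIPPER form v = (p − n)/(p + n), and that a rule is a list of conditions of the three kinds in formula (1); the tuple layout `(attribute, op, value)` and all function names are illustrative choices, not taken from the source.

```python
def covers(rule, item):
    """A rule is a list of (attribute, op, value) conditions that must all hold."""
    for attr, op, value in rule:
        x = item[attr]
        if op == "==" and x != value:        # A_d = v
            return False
        if op == "<=" and not x <= value:    # A_n <= theta
            return False
        if op == ">=" and not x >= value:    # A_n >= theta
            return False
    return True

def prune_value(rule, prune_set, target):
    """v = (p - n) / (p + n) on the pruning set; -1.0 if nothing is covered."""
    covered = [label for item, label in prune_set if covers(rule, item)]
    p = sum(1 for label in covered if label == target)
    n = len(covered) - p
    return (p - n) / (p + n) if covered else -1.0

def prune_rule(rule, prune_set, target):
    """Try every prefix obtained by dropping trailing conditions and keep
    the one with the highest v (the full rule wins ties)."""
    best = list(rule)
    best_v = prune_value(best, prune_set, target)
    candidate = list(rule)
    while len(candidate) > 1:
        candidate = candidate[:-1]           # remove the last condition
        v = prune_value(candidate, prune_set, target)
        if v > best_v:
            best, best_v = list(candidate), v
    return best, best_v
```

Keeping the longer rule on ties is a design choice made for this sketch; an implementation that prefers shorter rules could compare with `>=` instead.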
In this embodiment, when step S2 generates the RIPPER rating model, multiple RIPPER classifiers are trained as weak classifiers based on the AdaBoost algorithm. When each RIPPER classifier is trained, a portion of the training set samples is selected and combined with a portion of the error samples obtained by the previous RIPPER classifier to form the final training sample; after training, the weak classifiers together yield the ADB strong classifier, which serves as the final RIPPER rating model. The AdaBoost algorithm has a strong iterative learning capability and can combine and reinforce weak classifiers well. By combining the AdaBoost framework with RIPPER classifiers to train the classification model, this embodiment realizes the Ripper-ADB combined classification method, which has the performance advantages of both the AdaBoost algorithm and the RIPPER classifier and further improves classification and rating performance. Training on a selected portion of the training subset samples combined with a portion of the error samples from the previous weak classifier realizes a training scheme of cyclically overlapping samples. Because an equal part of the samples is selected each time, the number of expanded error samples is fixed and does not grow multiplicatively. Because all the data are divided into equal parts and every part is trained on with overlap, no sampled data are omitted, which ensures complete training. In addition, each expansion of the error samples has a cumulative training effect on the misclassified data, while the addition of new samples avoids over-training on repeatedly misclassified data. This embodiment is implemented using the NSL-KDD data set (a revised version of the KDD CUP 1999 data mining competition data set).
As shown in Figure 5, the detailed process of training the RIPPER rating model based on Ripper-ADB combined classification in the specific application embodiment is:
1. First divide the training set samples into equal parts according to the number of iterations, obtaining N training subset samples S1, S2, ..., SN;
2. Perform classification training on the first training sample S1 using the Ripper algorithm, obtaining classifier a1 and error samples R1;
3. Perform a statistical calculation on the classification results of a1 to obtain the weight w1 of classifier a1;
4. Expand the samples R1 misclassified by a1 by repeated sampling, to a magnitude (50%) matching that of one equal part of the samples, obtaining the expanded error samples R1p;
5. Add the expanded error samples R1p to the second training sample S2, obtaining the new sample S2R;
6. Perform Ripper classification training again on the new sample S2R, generating classifier a2 and error samples R2;
7. Perform a statistical calculation on the classification results of classifier a2 to obtain the weight w2 of classifier a2;
8. Repeat the above steps until all samples have been trained;
9. Superimpose all the trained weighted classifiers to form the final strong classifier Ripper-ADB, which is the final RIPPER rating model.
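The nine steps above can be sketched as a short training loop. Several points are assumptions made only for this illustration: a one-threshold stump (`train_stump`) stands in for the RIPPER learner, labels are restricted to +1 and -1, the classifier weight uses the usual AdaBoost formula 0.5·ln((1 − err)/err) because the source does not state how the weight is computed, and "50%" is read as half the size of one equal part.

```python
import math
import random

def train_stump(samples):
    """Hypothetical stand-in for a RIPPER learner: best single threshold on x.

    samples: list of (x, y) with numeric x and y in {-1, +1}.
    """
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for sign in (1, -1):
            errors = sum(1 for x, y in samples
                         if (sign if x >= threshold else -sign) != y)
            if best is None or errors < best[0]:
                best = (errors, threshold, sign)
    _, threshold, sign = best
    return lambda x: sign if x >= threshold else -sign

def train_ripper_adb(samples, n_parts, train_fn=train_stump, seed=0):
    """Train one weak classifier per equal part, feeding expanded errors forward."""
    rng = random.Random(seed)
    parts = [samples[i::n_parts] for i in range(n_parts)]    # S1..SN
    ensemble, carried = [], []
    for part in parts:
        train_set = part + carried              # S_k plus expanded errors
        clf = train_fn(train_set)
        wrong = [(x, y) for x, y in train_set if clf(x) != y]    # R_k
        err = len(wrong) / len(train_set)
        err = min(max(err, 1e-6), 1 - 1e-6)     # keep the logarithm finite
        weight = 0.5 * math.log((1 - err) / err)    # assumed AdaBoost-style weight
        ensemble.append((weight, clf))
        # Expand R_k by sampling with replacement to half of one part's size.
        target = len(part) // 2
        carried = [rng.choice(wrong) for _ in range(target)] if wrong else []
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all weak classifiers."""
    score = sum(weight * clf(x) for weight, clf in ensemble)
    return 1 if score >= 0 else -1
```

Because the expanded error set is capped at a fixed size per round, the training set for each weak classifier stays bounded, which matches the point made in the text that the expanded errors do not grow multiplicatively.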
In this embodiment, when the extracted attribute values are input into the RIPPER rating model for classification in step S3, the RIPPER rating model first outputs an initial credit rating result; the final rating result is then obtained and output according to the initial credit rating result and the classification rules that the RIPPER rating model applied during classification. Because the classification rules used in RIPPER classification are easy to understand, this embodiment, after obtaining the initial rating result from the RIPPER rating model, combines it with the RIPPER classification rules to generate the final rating result, which yields a more reasonable rating.
In the specific application embodiment, the detailed steps for realizing credit rating with the above method of this embodiment are:
Step 1: Extract all data about user information from the specified database and hold it for preprocessing.
Step 2: Associate the data in different tables using the unique key User ID; for example, first read the data tables to be integrated into memory, build an array for each data table, then loop over the arrays and perform the join and merge operation by User ID.
Step 3: Fill the missing numerical values in the data processed in step 2: continuous missing values are filled using the median, the mode, or Lagrange interpolation, and discrete missing values are filled using methods such as context filling.
Step 4: Unify the units of the different category attributes. For example, the values of the loan-term attribute are not uniform and include both a number of months and a number of days, so they need to be converted to a unified format (for example, with the month as the unit): traverse each value of the term attribute; if the number is followed by 'months', remove the trailing text 'months'; if it is followed by 'days', convert it to a value in months; then save and output the result.
Step 5: Label the data set with ratings using the classification labels AA, A, B, C, D, E and F, obtaining the characteristic attribute set.
Step 6: Perform RIPPER classification on the characteristic attribute set using the RIPPER classifier, obtaining the classification results;
Step 7: Count the weight of each characteristic attribute according to the classification results (rules), sort the attributes by weight, delete the characteristic attributes with smaller weights, and then perform RIPPER classification again;
Step 8: Repeat steps 6 and 7, judging whether the classification accuracy changes or whether the required number of characteristic attributes has been reached, until the final RIPPER rating model is obtained;
Step 9: Evaluate the RIPPER rating model obtained in step 8 using a ROC curve; the RIPPER rating model includes the code model and the RIPPER rules;
Step 10: Input the information of a new user to be assessed into the RIPPER rating model obtained in step 9 and output the rating result; the decision maker makes the final rating decision according to the rating result, completing the credit rating of the user.
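Steps 3, 4 and 7 above lend themselves to small helper functions, sketched below. Everything here is illustrative: the function names are hypothetical, context filling is approximated by carrying the nearest earlier value forward, term strings are assumed to be written in English as "N months" or "N days", a month is assumed to be 30 days, and an attribute's weight is taken to be the number of times it appears in the learned rules.

```python
import re
import statistics
from collections import Counter

def fill_continuous(values):
    """Step 3: replace None with the median of the known values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]

def fill_discrete(values):
    """Step 3: context filling, approximated by carrying the last value forward."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    # Leading gaps have no predecessor; use the first known value instead.
    first = next((v for v in filled if v is not None), None)
    return [first if v is None else v for v in filled]

def term_to_months(text, days_per_month=30):
    """Step 4: convert '12 months' or '90 days' style strings to months."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(months?|days?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized loan term: {text!r}")
    number, unit = float(match.group(1)), match.group(2)
    return number if unit.startswith("month") else number / days_per_month

def drop_rare_attributes(rules, attributes, min_count):
    """Step 7: rank attributes by how often rules use them, keep frequent ones.

    rules: list of rules, each a list of (attribute, op, value) conditions.
    """
    counts = Counter(attr for rule in rules for attr, _, _ in rule)
    ranked = sorted(attributes, key=lambda a: counts[a], reverse=True)
    return [a for a in ranked if counts[a] >= min_count]
```

In a full pipeline, `drop_rare_attributes` would be called after each classification round, and the loop of steps 6 to 8 would stop once the accuracy or the number of remaining attributes meets the preset requirement.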
The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Therefore, any simple modification, equivalent change, or variation made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, shall fall within the scope of protection of the technical solution of the present invention.

Claims (10)

1. A credit rating method based on feature selection, characterized in that the steps comprise:
S1. Characteristic attribute set extraction: obtain a user credit information set for model training, and extract the characteristic attribute corresponding to each item of information in the user credit information set to form a characteristic attribute set;
S2. Model training based on feature selection: perform RIPPER classification on the characteristic attribute set multiple times; after each RIPPER classification, screen the characteristic attributes in the characteristic attribute set according to the classification results, and perform RIPPER classification again on the screened characteristic attribute set, until the required RIPPER rating model is generated;
S3. Credit rating: input the credit information of a user to be assessed and extract the corresponding characteristic attributes, input the extracted characteristic attributes into the RIPPER rating model for classification, and output the resulting credit rating.
2. The credit rating method based on feature selection according to claim 1, characterized in that, in step S2, after each RIPPER classification, the characteristic attributes whose number of occurrences is less than a specified threshold are deleted, and RIPPER classification is performed again on the screened characteristic attribute set, until the accuracy or the number of features of the generated RIPPER rating model reaches a preset requirement, whereupon the final RIPPER rating model is output.
3. The credit rating method based on feature selection according to claim 2, characterized in that the specific steps for generating the RIPPER rating model in step S2 are:
S21. Classify the current characteristic attribute set using the RIPPER classifier, count the weight of each characteristic attribute according to the number of times it occurs in the classification results, and sort the characteristic attributes by the counted weights to obtain a sorted characteristic attribute set;
S22. Delete from the sorted characteristic attribute set the characteristic attributes whose number of occurrences is less than a preset threshold, obtaining an updated characteristic attribute set;
S23. Perform RIPPER classification on the updated characteristic attribute set obtained in step S22, and judge whether the accuracy or the number of features of the currently obtained RIPPER rating model reaches the preset requirement; if so, output the final RIPPER rating model; otherwise, return to step S21.
4. The credit rating method based on feature selection according to any one of claims 1 to 3, characterized in that training in step S2 specifically uses ten-fold cross-validation to avoid model overfitting, namely, the training set is divided into 10 parts, 9 of which serve as training data and the remaining one as test data; after multiple iterations, the model whose classification accuracy on the different test sets reaches a specified threshold is selected and output as the RIPPER rating model obtained by the current training.
5. The credit rating method based on feature selection according to any one of claims 1 to 3, characterized in that step S2 further includes evaluating the obtained RIPPER rating model using a ROC curve; if the area under the ROC curve computed for the RIPPER rating model falls within a preset range, the final RIPPER rating model is output; otherwise, training is performed again.
6. The credit rating method based on feature selection according to any one of claims 1 to 3, characterized in that the specific steps of step S1 are:
S11. Extract the characteristic attribute corresponding to each item of original credit information in the user credit information set to obtain a characteristic attribute set, perform data preprocessing on the characteristic attribute set, and output it;
S12. Unify the different category attributes in the characteristic attribute set and output the result;
S13. Perform classification rating on the characteristic attribute set output by step S12 to form a training set, and output it.
7. The credit rating method based on feature selection according to claim 6, characterized in that the data preprocessing in step S11 specifically comprises filling missing values in the characteristic attribute set and deleting redundant values and abnormal values in the characteristic attribute set; when the missing values are filled, continuous missing values are filled using one of the median, the mode, or Lagrange interpolation, and discrete missing values are filled using context filling.
8. The credit rating method based on feature selection according to any one of claims 1 to 3, characterized in that the user credit information includes one or more of: user basic information, user borrowing information, user liability information, information on overdue repayments within a specified historical period, information on repayments due within a specified future period, user bidding information, and user liability information.
9. The credit rating method based on feature selection according to any one of claims 1 to 3, characterized in that, when the extracted characteristic attributes are input into the RIPPER rating model for classification in step S3, the RIPPER rating model outputs an initial credit rating result, and the final rating result is obtained and output according to the initial credit rating result and the classification rules that the RIPPER rating model applied during classification.
10. The credit rating method based on feature selection according to any one of claims 1 to 3, characterized in that, when step S2 generates the RIPPER rating model, multiple RIPPER classifiers are trained as weak classifiers based on the AdaBoost algorithm; when each RIPPER classifier is trained, a portion of the training set samples is selected and combined with a portion of the error samples obtained by the previous RIPPER classifier to form the final training sample; after training, the weak classifiers together yield the ADB strong classifier, which serves as the final RIPPER rating model.
CN201810414547.4A 2018-05-03 2018-05-03 A kind of credit rating method that feature based is chosen Pending CN108335200A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810414547.4A CN108335200A (en) 2018-05-03 2018-05-03 A kind of credit rating method that feature based is chosen

Publications (1)

Publication Number Publication Date
CN108335200A true CN108335200A (en) 2018-07-27

Family

ID=62935005

Country Status (1)

Country Link
CN (1) CN108335200A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893766A (en) * 2016-04-06 2016-08-24 成都数联易康科技有限公司 Graded diagnosis and treatment evaluating method based on data mining

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Yue'ai et al.: "Experimental Study of the AdaBoost Algorithm in Network Intrusion Detection", Computer Applications and Software *

Legal Events

Date Code Title Description
CB03 Change of inventor or designer information
Inventor after: Yang Shenggang, Chen Zuo, Peng Hanqi, Zhao Hanfeng, Chen Bangdao, Mei Xuesong, Yu Xiangjun, Li Haozhi, Wang Shao
Inventor before: Yang Shenggang, Chen Zuo, Zhao Hanfeng, Chen Bangdao, Mei Xuesong, Yu Xiangjun, Li Haozhi, Wang Shao
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20200514
Address after: Guanxi Town, Dingcheng District, Changde, Hunan Province
Applicant after: Hunan Huda Jinke Technology Development Co., Ltd
Address before: Yuelu District City, Hunan province 410082 Changsha Lushan South Road, Hunan University College of information science and Engineering
Applicant before: HUNAN University
RJ01 Rejection of invention patent application after publication
Application publication date: 20180727