CN110223161A - Credit estimation method and device based on feature dependency degree - Google Patents

Credit estimation method and device based on feature dependency degree

Info

Publication number
CN110223161A
Authority
CN
China
Prior art keywords
base classifier
credit
classifier
base
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910441624.XA
Other languages
Chinese (zh)
Inventor
王宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oriental Silver Valley (beijing) Technology Development Co Ltd
Original Assignee
Oriental Silver Valley (beijing) Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oriental Silver Valley (beijing) Technology Development Co Ltd
Priority to CN201910441624.XA
Publication of CN110223161A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a credit estimation method and device based on feature dependency degree. The method includes: generating multiple base classifiers; using unsupervised clustering to dynamically select, from the multiple base classifiers, the base classifiers that satisfy a preset credit evaluation condition; and merging the prediction results of the selected base classifiers to obtain a credit evaluation classification result. The application addresses the technical problem that risk assessment of personal credit performs poorly. A dynamic heterogeneous ensemble learning model based on feature dependency degree improves the predictive performance of credit scoring. The application is also suitable for financial fields such as credit evaluation.

Description

Credit estimation method and device based on feature dependency degree
Technical field
This application relates to the financial field, and in particular to a credit estimation method and device based on feature dependency degree.
Background art
Risk analysis of credit decisions can identify the potential risk posed by people whose credit records are limited or poor.
The inventors found that existing risk assessment of personal credit does not reduce default risk well.
No effective solution has yet been proposed for the problem in the related art that risk assessment of personal credit performs poorly.
Summary of the invention
The main purpose of this application is to provide a credit estimation method and device based on feature dependency degree, so as to solve the problem that risk assessment of personal credit performs poorly.
To achieve the above purpose, according to one aspect of this application, a credit estimation method based on feature dependency degree is provided.
The credit estimation method based on feature dependency degree according to this application includes: generating multiple base classifiers; using unsupervised clustering to dynamically select, from the multiple base classifiers, the base classifiers that satisfy a preset credit evaluation condition; and merging the prediction results of the selected base classifiers to obtain a credit evaluation classification result.
Further, generating multiple base classifiers includes: obtaining a training set; generating reduction subsets for training the base classifiers, using rough-set-based attribute reduction and feature selection based on feature dependency degree; and training a heterogeneous ensemble learning model according to the reduction subsets.
Further, using unsupervised clustering to dynamically select the base classifiers that satisfy the preset condition includes: obtaining N neighboring samples of a test sample using unsupervised clustering; computing the classification results of the N neighboring samples with the base classifiers, and evaluating and ranking the classification performance of the base classifiers on the N neighboring samples; and determining the base classifiers for the sample according to the ranking.
Further, merging the prediction results of the base classifiers to obtain the credit evaluation classification result includes: performing classification prediction on the test sample with the dynamically selected base classifiers that satisfy the preset condition; and, after merging the results of the base classifiers, obtaining the credit evaluation classification result using a voting strategy.
Further, the method also includes: obtaining user credit data as a data set; filtering redundant features from the data set according to a preset feature dependency degree before the multiple base classifiers are trained and generated; and dividing the data set into a training set and a validation set.
To achieve the above purpose, according to another aspect of this application, a credit evaluation device based on feature dependency degree is provided.
The credit evaluation device based on feature dependency degree according to this application includes: a generation module for generating multiple base classifiers; a dynamic selection module for using unsupervised clustering to dynamically select, from the multiple base classifiers, the base classifiers that satisfy a preset condition; and a result merging module for merging the prediction results of the base classifiers to obtain a credit evaluation classification result.
Further, the generation module includes: an acquisition unit for obtaining a training set; a generation unit for generating reduction subsets for training the base classifiers, using rough-set-based attribute reduction and feature selection based on feature dependency degree; and a training unit for training a heterogeneous ensemble learning model according to the reduction subsets.
Further, the dynamic selection module includes: a neighboring sample unit for obtaining N neighboring samples of a test sample using unsupervised clustering; a performance evaluation unit for computing the classification results of the N neighboring samples with the base classifiers, and for evaluating and ranking the classification performance of the base classifiers on the N neighboring samples; and a determination unit for determining the base classifiers for the sample according to the ranking.
Further, the result merging module includes: a classification prediction unit for performing classification prediction on the test sample with the dynamically selected base classifiers that satisfy the preset condition; and a voting unit for obtaining the credit evaluation classification result using a voting strategy after the results of the base classifiers are merged.
Further, the device also includes a redundancy removal module, which includes: a data acquisition unit for obtaining user credit data as a data set; a filtering unit for filtering redundant features from the data set according to a preset feature dependency degree before the multiple base classifiers are trained and generated; and a division unit for dividing the data set into a training set and a validation set.
In the credit estimation method and device based on feature dependency degree of the embodiments of this application, multiple base classifiers are generated, and unsupervised clustering is used to dynamically select the base classifiers that satisfy the preset credit evaluation condition. The prediction results of these base classifiers are merged to obtain the credit evaluation classification result. This achieves the technical effect of evaluating user credit with a combination of classifiers, and thereby solves the technical problem that risk assessment of personal credit performs poorly.
Brief description of the drawings
The accompanying drawings, which form part of this application, provide a further understanding of the application and make its other features, objects and advantages more apparent. The drawings of the illustrative embodiments and their description explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the credit estimation method based on feature dependency degree according to a first embodiment of this application;
Fig. 2 is a flowchart of the credit estimation method based on feature dependency degree according to a second embodiment of this application;
Fig. 3 is a flowchart of the credit estimation method based on feature dependency degree according to a third embodiment of this application;
Fig. 4 is a flowchart of the credit estimation method based on feature dependency degree according to a fourth embodiment of this application;
Fig. 5 is a flowchart of the credit estimation method based on feature dependency degree according to a fifth embodiment of this application;
Fig. 6 is a structural diagram of the credit evaluation device based on feature dependency degree according to a first embodiment of this application;
Fig. 7 is a structural diagram of the credit evaluation device based on feature dependency degree according to a second embodiment of this application;
Fig. 8 is a structural diagram of the credit evaluation device based on feature dependency degree according to a third embodiment of this application;
Fig. 9 is a structural diagram of the credit evaluation device based on feature dependency degree according to a fourth embodiment of this application;
Fig. 10 is a structural diagram of the credit evaluation device based on feature dependency degree according to a fifth embodiment of this application;
Fig. 11 is a schematic diagram of the implementation principle of this application;
Fig. 12 is a schematic diagram of generating the base classifiers;
Fig. 13 is a schematic diagram of dynamically selecting the base classifiers;
Fig. 14 is a schematic diagram of merging the classification results of the base classifiers.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative work fall within the scope of protection of this application.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and do not describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
In this application, the orientation or positional relationships indicated by terms such as "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "transverse" and "longitudinal" are based on the orientation or positional relationships shown in the drawings. These terms are used mainly to better describe the application and its embodiments, and are not intended to require that the indicated device, element or component have a particular orientation or be constructed and operated in a particular orientation.
Moreover, besides indicating orientation or positional relationships, some of these terms may also carry other meanings; for example, the term "upper" may in some cases indicate a dependency or connection relationship. Those of ordinary skill in the art can understand the specific meaning of these terms in this application according to the circumstances.
In addition, the terms "install", "arrange", "provided with", "connect", "connected" and "socket" should be understood broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediary, or internal between two devices, elements or components. Those of ordinary skill in the art can understand the specific meaning of these terms in this application according to the circumstances.
It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments can be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
The credit estimation method based on feature dependency degree of the embodiments of this application performs credit scoring by combining training-set generation based on feature dependency degree, a heterogeneous ensemble model, and dynamic selection of base classifiers. In addition, the feature dependency degree index can further eliminate information redundancy in the data set, which improves the accuracy of the trained base classifiers.
As shown in Fig. 1, the method includes the following steps S102 to S106:
Step S102: generate multiple base classifiers.
After the data set is collected and redundancy processing is performed, it is used as the training set for training heterogeneous base classifiers.
The base classifiers may be classifiers such as LR, SVM, KNN, XgBoost and DT.
It should be noted that the embodiments of this application do not limit the number or type of base classifiers; those skilled in the art can choose according to the actual usage scenario, as long as the classification requirements are met.
Step S104: using unsupervised clustering, dynamically select from the multiple base classifiers the base classifiers that satisfy the preset credit evaluation condition.
Unsupervised clustering is applied to the validation set and the sample set. By judging whether the preset credit evaluation condition is met, the qualifying base classifiers can be dynamically selected from the multiple base classifiers. Because the base classifiers are trained on data subsets with different attributes, more diverse base classifiers are obtained.
When the base classifiers that satisfy the preset credit evaluation condition are dynamically selected, the selection is based on a performance evaluation of the base classifiers, and the base classifiers ranked highest in that evaluation are taken as the more suitable classifiers.
Step S106: merge the prediction results of the base classifiers to obtain the credit evaluation classification result.
The prediction results of the base classifiers are merged to obtain the personal credit classification result. For example, the output class may be a "good" customer or a "bad" customer.
Merging the classification results of the base classifiers improves the evaluation of an individual user's credit.
From the above description, it can be seen that this application achieves the following technical effects:
In the embodiments of this application, multiple base classifiers are generated, and unsupervised clustering is used to dynamically select the base classifiers that satisfy the preset credit evaluation condition. The prediction results of these base classifiers are merged to obtain the credit evaluation classification result. This achieves the technical effect of evaluating user credit with a combination of classifiers, and thereby solves the technical problem that risk assessment of personal credit performs poorly.
According to the embodiments of this application, preferably, as shown in Fig. 2, generating multiple base classifiers includes:
Step S202: obtain a training set.
Step S204: generate reduction subsets for training the base classifiers, using rough-set-based attribute reduction and feature selection based on feature dependency degree.
Step S206: train a heterogeneous ensemble learning model according to the reduction subsets.
The heterogeneous ensemble learning model is trained on the various reduction subsets generated for training the base classifiers.
Specifically, because the feature dependency degree describes how well features can reconstruct one another, this measure can be used to optimize the attribute reduction subsets. If a feature can be completely reconstructed from other features, then from the standpoint of information redundancy deleting it is lossless. Further, to generate diverse attribute subsets, one reduction subset is first generated at random. Before other reduction subsets are generated, the feature dependency degree is used to compute the dependency among the features in the randomly generated subset and to find the feature that can be well reconstructed from the other features. To ensure that the subsequently selected reduction subsets do not contain that feature, it is removed from the data set before the other reduction subsets are generated, so it will not be included in any reduction subset generated later.
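The subset-generation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the source does not define the feature dependency degree, so the sketch assumes the standard rough-set dependency degree (the fraction of samples whose values on one set of features uniquely determine another feature) over discrete data. Random sampling stands in for the rough-set attribute reduction, and repeating the removal after every subset (rather than only after the first) is also an assumption. The names `dependency_degree` and `diverse_subsets` are hypothetical.

```python
import random
from collections import defaultdict


def dependency_degree(rows, cond, target):
    """Fraction of rows whose values on `cond` determine the value of `target`.

    Assumed measure: the rough-set dependency degree |POS_cond(target)| / |U|.
    """
    if not rows:
        return 0.0
    targets = defaultdict(set)  # cond values -> target values seen with them
    sizes = defaultdict(int)    # cond values -> number of rows in that class
    for row in rows:
        key = tuple(row[c] for c in cond)
        targets[key].add(row[target])
        sizes[key] += 1
    consistent = sum(n for key, n in sizes.items() if len(targets[key]) == 1)
    return consistent / len(rows)


def diverse_subsets(rows, features, size, count, seed=0):
    """Draw up to `count` random feature subsets, retiring one redundant feature after each."""
    rng = random.Random(seed)
    pool = list(features)
    subsets, removed = [], []
    while len(subsets) < count and len(pool) >= size:
        subset = rng.sample(pool, size)
        subsets.append(subset)
        # The feature in this subset that the other members reconstruct best.
        redundant = max(
            subset,
            key=lambda f: dependency_degree(rows, [g for g in subset if g != f], f),
        )
        pool.remove(redundant)  # later subsets can no longer contain it
        removed.append(redundant)
    return subsets, removed
```

Here `rows` is a list of tuples of discrete feature values and `features` is a list of column indices. Continuous credit attributes would need to be discretized first for this particular measure to be meaningful.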
According to the embodiments of this application, preferably, as shown in Fig. 3, using unsupervised clustering to dynamically select the base classifiers that satisfy the preset condition includes:
Step S302: obtain N neighboring samples of the test sample using unsupervised clustering.
Step S304: compute the classification results of the N neighboring samples with the base classifiers, and evaluate and rank the classification performance of the base classifiers on the N neighboring samples.
Step S306: determine the base classifiers for the sample according to the ranking.
Specifically, for a test sample, N nearest neighbors are first found in the validation set using an unsupervised algorithm. Then the base classifiers of the heterogeneous ensemble learning model obtained in training are used to precompute the classification results of the N neighbors. Finally, within each type of base classifier, the several base classifiers with the best evaluated performance are selected.
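The dynamic selection step could look roughly like the sketch below. It makes several assumptions that the source does not state: a plain Euclidean nearest-neighbor search stands in for the unsupervised clustering, accuracy on the neighbors is the performance measure, and each base classifier is represented as a hypothetical `(type_name, predict_fn)` pair. The function names are illustrative only.

```python
import math


def nearest_neighbors(x, validation, n):
    """Indices of the n validation samples closest to x (Euclidean distance)."""
    order = sorted(range(len(validation)), key=lambda i: math.dist(x, validation[i][0]))
    return order[:n]


def select_classifiers(x, validation, classifiers, n=3, top=1):
    """Keep the `top` most accurate classifiers of each type on x's neighborhood.

    validation:  list of (features, label) pairs
    classifiers: list of (type_name, predict_fn) pairs
    """
    idx = nearest_neighbors(x, validation, n)
    by_type = {}
    for kind, predict in classifiers:
        # Accuracy of this classifier on the neighbors of x only.
        correct = sum(predict(validation[i][0]) == validation[i][1] for i in idx)
        by_type.setdefault(kind, []).append((correct / len(idx), predict))
    chosen = []
    for scored in by_type.values():
        scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep input order
        chosen.extend(predict for _, predict in scored[:top])
    return chosen
```

Ranking within each classifier type, rather than across all classifiers, follows the description's "within each type of base classifier" wording and keeps the selected ensemble heterogeneous.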
Based on feature dependency degree, the dynamically selected heterogeneous ensemble learning model improves the predictive performance of credit scoring, and filtering redundant features before the base classifiers are trained improves the accuracy of the ensemble model.
According to the embodiments of this application, preferably, as shown in Fig. 4, merging the prediction results of the base classifiers to obtain the credit evaluation classification result includes:
Step S402: perform classification prediction on the test sample with the dynamically selected base classifiers that satisfy the preset condition.
Step S404: after merging the results of the base classifiers, obtain the credit evaluation classification result using a voting strategy.
Specifically, for a test sample, the base classifiers obtained by dynamic selection in the preceding steps are first used to predict classification results, and the final classification result is then generated using the majority vote rule.
It should be noted that a voting strategy is preferred in the embodiments of this application for generating the final classification result; the embodiments do not limit the specific way in which predicted classification results are merged, as long as the requirements of credit evaluation classification are met.
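The majority vote rule mentioned above can be illustrated with a short sketch. Representing each selected base classifier as a plain callable that returns a label is an assumption made for illustration; `majority_vote` and `predict_credit` are hypothetical names.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the most base classifiers."""
    if not labels:
        raise ValueError("at least one prediction is required")
    # On a tie, the label counted first wins.
    return Counter(labels).most_common(1)[0][0]


def predict_credit(sample, classifiers):
    """Collect one prediction per selected base classifier, then vote."""
    return majority_vote([classify(sample) for classify in classifiers])
```

For example, `majority_vote(["good", "bad", "good"])` returns `"good"`. Using an odd number of selected classifiers avoids ties in the two-class case.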
According to the embodiments of this application, preferably, as shown in Fig. 5, the method further includes:
Step S502: obtain user credit data as a data set.
Step S504: before the multiple base classifiers are trained and generated, filter redundant features from the data set according to a preset feature dependency degree.
Step S506: divide the data set into a training set and a validation set.
Specifically, information redundancy can be further eliminated using a preset index of feature dependency degree. Features are screened based on the performance of the base classifiers, so that different feature subsets are generated without weakening the original discriminative ability.
It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
According to the embodiments of this application, a credit evaluation device based on feature dependency degree for implementing the above method is also provided. As shown in Fig. 6, the device includes: a generation module 10 for generating multiple base classifiers; a dynamic selection module 20 for using unsupervised clustering to dynamically select, from the multiple base classifiers, the base classifiers that satisfy the preset condition; and a result merging module 30 for merging the prediction results of the base classifiers to obtain the credit evaluation classification result.
In the generation module 10 of the embodiments of this application, after the data set is collected and redundancy processing is performed, it is used as the training set for training heterogeneous base classifiers.
The base classifiers may be classifiers such as LR, SVM, KNN, XgBoost and DT.
It should be noted that the embodiments of this application do not limit the number or type of base classifiers; those skilled in the art can choose according to the actual usage scenario, as long as the classification requirements are met.
In the dynamic selection module 20 of the embodiments of this application, unsupervised clustering is applied to the validation set and the sample set. By judging whether the preset credit evaluation condition is met, the qualifying base classifiers can be dynamically selected from the multiple base classifiers. Because the base classifiers are trained on data subsets with different attributes, more diverse base classifiers are obtained.
When the base classifiers that satisfy the preset credit evaluation condition are dynamically selected, the selection is based on a performance evaluation of the base classifiers, and the base classifiers ranked highest in that evaluation are taken as the more suitable classifiers.
In the result merging module 30 of the embodiments of this application, the prediction results of the base classifiers are merged to obtain the personal credit classification result. For example, the output class may be a "good" customer or a "bad" customer.
Merging the classification results of the base classifiers improves the evaluation of an individual user's credit.
According to the embodiments of this application, preferably, as shown in Fig. 7, the generation module includes: an acquisition unit 101 for obtaining a training set; a generation unit 102 for generating reduction subsets for training the base classifiers, using rough-set-based attribute reduction and feature selection based on feature dependency degree; and a training unit 103 for training a heterogeneous ensemble learning model according to the reduction subsets.
In the embodiments of this application, the heterogeneous ensemble learning model is trained on the various reduction subsets generated for training the base classifiers.
Specifically, because the feature dependency degree describes how well features can reconstruct one another, this measure can be used to optimize the attribute reduction subsets. If a feature can be completely reconstructed from other features, then from the standpoint of information redundancy deleting it is lossless. Further, to generate diverse attribute subsets, one reduction subset is first generated at random. Before other reduction subsets are generated, the feature dependency degree is used to compute the dependency among the features in the randomly generated subset and to find the feature that can be well reconstructed from the other features. To ensure that the subsequently selected reduction subsets do not contain that feature, it is removed from the data set before the other reduction subsets are generated, so it will not be included in any reduction subset generated later.
According to the embodiments of this application, preferably, as shown in Fig. 8, the dynamic selection module 20 includes: a neighboring sample unit 201 for obtaining N neighboring samples of the test sample using unsupervised clustering; a performance evaluation unit 202 for computing the classification results of the N neighboring samples with the base classifiers, and for evaluating and ranking the classification performance of the base classifiers on the N neighboring samples; and a determination unit 203 for determining the base classifiers for the sample according to the ranking.
Specifically, in the embodiments of this application, for a test sample, N nearest neighbors are first found in the validation set using an unsupervised algorithm. Then the base classifiers of the heterogeneous ensemble learning model obtained in training are used to precompute the classification results of the N neighbors. Finally, within each type of base classifier, the several base classifiers with the best evaluated performance are selected.
Based on feature dependency degree, the dynamically selected heterogeneous ensemble learning model improves the predictive performance of credit scoring, and filtering redundant features before the base classifiers are trained improves the accuracy of the ensemble model.
According to the embodiments of this application, preferably, as shown in Fig. 9, the result merging module 30 includes: a classification prediction unit 301 for performing classification prediction on the test sample with the dynamically selected base classifiers that satisfy the preset condition; and a voting unit 302 for obtaining the credit evaluation classification result using a voting strategy after the results of the base classifiers are merged.
Specifically, in the embodiments of this application, for a test sample, the base classifiers obtained by dynamic selection in the preceding steps are first used to predict classification results, and the final classification result is then generated using the majority vote rule.
According to the embodiments of this application, preferably, as shown in Fig. 10, the device further includes a redundancy removal module, which includes: a data acquisition unit 401 for obtaining user credit data as a data set; a filtering unit 402 for filtering redundant features from the data set according to a preset feature dependency degree before the multiple base classifiers are trained and generated; and a division unit 403 for dividing the data set into a training set and a validation set.
Specifically, in the embodiments of this application, information redundancy can be further eliminated using a preset index of feature dependency degree. Features are screened based on the performance of the base classifiers, so that different feature subsets are generated without weakening the original discriminative ability.
Figures 11 to 14 are step-by-step schematic diagrams of the implementation principle of the present application.
Figure 11 includes three main steps: step 1, generating the base classifiers; step 2, dynamically selecting base classifiers; step 3, merging the classification results of the base classifiers.
In step 1, as shown in Figure 12, for the training set (U, A), attribute reduction based on rough sets and feature selection based on feature dependency degree yield training set 1 (U1, A1), training set 2 (U2, A2), training set 3 (U3, A3), and so on. Based on training set 1 (U1, A1), training set 2 (U2, A2), and training set 3 (U3, A3), heterogeneous base learners are obtained, such as LR, SVM, KNN, XgBoost, and DT classifiers:
C1(LR), C1(SVM), C1(KNN), C1(XgBoost), C1(DT),
C2(LR), C2(SVM), C2(KNN), C2(XgBoost), C2(DT),
C3(LR), C3(SVM), C3(KNN), C3(XgBoost), C3(DT), ...
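The pool of classifiers Ci(type) listed above can be sketched as one trained model per combination of feature subset and learner type. The following is a minimal illustration, not the applicant's implementation: `MajorityClass` and `NearestCentroid` are hypothetical stand-ins for the LR, SVM, KNN, XgBoost, and DT learners, `build_pool` is an assumed helper name, and the feature subsets are given as column indices rather than computed by attribute reduction.

```python
from collections import Counter


class MajorityClass:
    """Stand-in learner: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.label_


class NearestCentroid:
    """Stand-in learner: predicts the label whose class mean is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids_[label]))

        return min(self.centroids_, key=sq_dist)


def build_pool(X, y, subsets, learners):
    """Train every learner type on every feature subset (hypothetical helper)."""
    pool = {}
    for i, cols in enumerate(subsets, start=1):
        # Project the training data onto the i-th feature subset (Ui, Ai).
        X_i = [[row[c] for c in cols] for row in X]
        for name, factory in learners.items():
            pool[f"C{i}({name})"] = (cols, factory().fit(X_i, y))
    return pool


# Toy data: three features, two illustrative feature subsets.
X = [[0, 0, 5], [0, 1, 5], [1, 0, 5], [1, 1, 5]]
y = ["good", "good", "bad", "bad"]
subsets = [[0, 1], [0, 2]]
learners = {"NC": NearestCentroid, "MC": MajorityClass}

pool = build_pool(X, y, subsets, learners)
cols, clf = pool["C2(NC)"]
sample = [1, 1, 5]
prediction = clf.predict([sample[c] for c in cols])
```

Each pool entry keeps its column list alongside the model, so a sample can be projected onto the same feature subset before prediction.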
In step 1, since the feature dependency degree describes the reconstruction ability between features, this metric can be used to optimize the attribute reduction subsets. If a feature can be fully constructed from other features, then from the standpoint of information redundancy, deleting it is lossless. Further, to generate diverse attribute reduction subsets, one reduction subset is first generated at random. Before other reduction subsets are generated, the feature dependency degree is used to compute the dependency between the features in the randomly generated reduction subset, in order to find the feature that can be well reconstructed by the other features. To ensure that the next selected reduction subset does not contain this feature, it is removed from the data set before other reduction subsets are generated, so it will not be included in any reduction subset generated later.
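The idea of finding the feature that the others reconstruct best can be illustrated with a short sketch. The text does not give the formula for feature dependency degree, so this example assumes the standard rough-set dependency degree: the fraction of samples whose values on the conditioning features uniquely determine the target feature. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def dependency_degree(rows, cond, target):
    """Assumed measure: share of rows whose `cond` values determine `target`."""
    values = defaultdict(set)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in cond)
        values[key].add(row[target])
        counts[key] += 1
    # A group is consistent when all its rows share one target value.
    consistent = sum(counts[k] for k, seen in values.items() if len(seen) == 1)
    return consistent / len(rows)


def most_redundant_feature(rows, subset):
    """Return the feature best reconstructed by the rest of the subset."""
    scores = {
        f: dependency_degree(rows, [g for g in subset if g != f], f)
        for f in subset
    }
    return max(scores, key=scores.get), scores


# Toy discrete data: "band" is fully determined by "income".
rows = [
    {"income": "high", "band": "A", "age": "young"},
    {"income": "high", "band": "A", "age": "old"},
    {"income": "low", "band": "B", "age": "young"},
    {"income": "low", "band": "B", "age": "old"},
    {"income": "high", "band": "A", "age": "young"},
    {"income": "low", "band": "B", "age": "old"},
]
subset = ["income", "band", "age"]

redundant, scores = most_redundant_feature(rows, subset)
# Drop the redundant feature before generating further reduction subsets.
remaining = [f for f in subset if f != redundant]
```

In this toy data "income" and "band" determine each other, so either one can be dropped without loss, while "age" cannot be reconstructed from the other two.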
In step 2, as shown in Figure 13, unsupervised clustering is performed using the validation set and test sample X, and the result gives the K neighboring samples of test sample X. Meanwhile, the classifiers Ci(LR), Ci(SVM), Ci(KNN), Ci(XgBoost), Ci(DT), i = 1, 2, ..., n, obtained in the previous step are evaluated for their classification performance on these neighbors and ranked, which gives the base classifiers for sample X. That is, in step 2, for test sample X, an unsupervised algorithm is first used to find its k nearest neighbors in the validation set; the base classifiers generated in the first stage are then used to compute the classification results of these k neighbors. Finally, within each type of base classifier, the several best-performing base classifiers are selected, for example 3 base classifiers.
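The dynamic selection in step 2 can be sketched as follows. This is an illustration under stated assumptions: neighbors are found by Euclidean distance (the text only says an unsupervised algorithm is used), local accuracy on the neighbors is the ranking criterion, and the pool entries are simple hypothetical rules standing in for trained base classifiers. `k_nearest` and `select_classifiers` are assumed names.

```python
import math
from collections import defaultdict


def k_nearest(x, val_X, k):
    """Indices of the k validation samples closest to x (Euclidean distance)."""
    order = sorted(range(len(val_X)), key=lambda i: math.dist(x, val_X[i]))
    return order[:k]


def select_classifiers(x, val_X, val_y, pool, k=3, top_m=1):
    """Rank classifiers by accuracy on x's neighbors; keep top_m per type."""
    neighbors = k_nearest(x, val_X, k)
    by_type = defaultdict(list)
    for (subset_id, kind), predict in pool.items():
        hits = sum(predict(val_X[i]) == val_y[i] for i in neighbors)
        by_type[kind].append((hits / len(neighbors), subset_id))
    selected = []
    for kind, scored in by_type.items():
        # Highest local accuracy first; ties broken by subset id.
        scored.sort(key=lambda t: (-t[0], t[1]))
        selected += [(subset_id, kind) for _, subset_id in scored[:top_m]]
    return selected


# Toy validation set; label 1 marks a "bad" customer, 0 a "good" one.
val_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
val_y = [0, 0, 0, 1, 1]

# Hypothetical pool keyed by (feature-subset index, classifier type).
pool = {
    (1, "rule_a"): lambda p: int(p[0] > 0.5),
    (2, "rule_a"): lambda p: 1,
    (1, "rule_b"): lambda p: int(p[1] > 0.15),
    (2, "rule_b"): lambda p: 0,
}

selected = select_classifiers((0.05, 0.05), val_X, val_y, pool, k=3, top_m=1)
```

Because the ranking uses only the neighbors of the test sample, a classifier that is weak globally can still be chosen where it is locally reliable.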
In step 3, as shown in Figure 14, classification prediction is performed on X according to test sample X and the base classifiers selected for X, and the results of the base classifiers are then merged to output a "good" customer or "bad" customer result. For X, the classifiers selected in step 2 are first used to predict classification results, and the final classification result is then generated using the majority voting rule.
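The majority voting rule in step 3 can be sketched in a few lines. The function names and the threshold rules standing in for selected base classifiers are hypothetical, and the tie-breaking behavior (first label seen wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; on a tie, the label seen first wins."""
    return Counter(predictions).most_common(1)[0][0]


def classify(x, selected_classifiers):
    """Predict with each selected base classifier, then merge by voting."""
    votes = [predict(x) for predict in selected_classifiers]
    return majority_vote(votes), votes


# Hypothetical selected classifiers operating on a (debt_ratio, late_payments) pair.
selected_classifiers = [
    lambda x: "bad" if x[0] > 0.6 else "good",
    lambda x: "bad" if x[1] >= 3 else "good",
    lambda x: "bad" if x[0] + 0.1 * x[1] > 0.8 else "good",
]

label, votes = classify((0.7, 1), selected_classifiers)
```

Selecting an odd number of classifiers, as in the example of 3 given in step 2, avoids ties for a two-class "good"/"bad" outcome.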
Obviously, those skilled in the art should understand that each module or step of the present application described above can be implemented with a general-purpose computing device. The modules or steps can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into an individual integrated circuit module, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The foregoing are merely preferred embodiments of the present application and are not intended to limit it. For those skilled in the art, various modifications and variations of the present application are possible. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. A credit estimation method based on feature dependency degree, characterized by comprising:
generating multiple base classifiers;
using unsupervised clustering to dynamically select, from the multiple base classifiers, the base classifiers that satisfy a preset credit evaluation condition;
merging the prediction results of the base classifiers to obtain a credit evaluation classification result.
2. The credit estimation method according to claim 1, characterized in that generating multiple base classifiers comprises:
obtaining a training set;
using attribute reduction based on rough sets and feature selection based on feature dependency degree to generate reduction subsets for training the base classifiers;
training according to the reduction subsets to obtain a heterogeneous ensemble learning model.
3. The credit estimation method according to claim 1, characterized in that using unsupervised clustering to dynamically select, from the multiple base classifiers, the base classifiers that satisfy the preset condition comprises:
using unsupervised clustering to obtain N neighboring samples of a test sample;
computing the classification results of the N neighboring samples with the base classifiers, evaluating the classification performance of the base classifiers on the N neighboring samples, and ranking them;
determining the base classifiers for the sample according to the ranking result.
4. The credit estimation method according to claim 1, characterized in that merging the prediction results of the base classifiers to obtain a credit evaluation classification result comprises:
performing classification prediction on the sample according to the test sample and the dynamically selected base classifiers, among the multiple base classifiers, that satisfy the preset condition;
after merging the results of the base classifiers, obtaining the different results of the credit evaluation classification using a voting strategy.
5. The credit estimation method according to claim 1, characterized by further comprising:
obtaining user credit data as a data set;
before training and generating the multiple base classifiers, filtering redundant features in the data set according to a preset feature dependency degree;
dividing the data set into a training set and a validation set.
6. A credit evaluation device based on feature dependency degree, characterized by comprising:
a generation module, configured to generate multiple base classifiers;
a dynamic selection module, configured to use unsupervised clustering to dynamically select, from the multiple base classifiers, the base classifiers that satisfy a preset condition;
a result merging module, configured to merge the prediction results of the base classifiers to obtain a credit evaluation classification result.
7. The credit evaluation device according to claim 6, characterized in that the generation module comprises:
an acquiring unit, configured to obtain a training set;
a generation unit, configured to use attribute reduction based on rough sets and feature selection based on feature dependency degree to generate reduction subsets for training the base classifiers;
a training unit, configured to train according to the reduction subsets to obtain a heterogeneous ensemble learning model.
8. The credit evaluation device according to claim 6, characterized in that the dynamic selection module comprises:
a neighboring sample unit, configured to use unsupervised clustering to obtain N neighboring samples of a test sample;
a performance evaluation unit, configured to compute the classification results of the N neighboring samples with the base classifiers, evaluate the classification performance of the base classifiers on the N neighboring samples, and rank them;
a determination unit, configured to determine the base classifiers for the sample according to the ranking result.
9. The credit evaluation device according to claim 6, characterized in that the result merging module comprises:
a classification prediction unit, configured to perform classification prediction on the sample according to the test sample and the dynamically selected base classifiers, among the multiple base classifiers, that satisfy the preset condition;
a voting unit, configured to merge the results of the base classifiers and then obtain the different results of the credit evaluation classification using a voting strategy.
10. The credit evaluation device according to claim 6, characterized by further comprising a redundancy removal module, the redundancy removal module comprising:
a data acquisition unit, configured to obtain user credit data as a data set;
a filter unit, configured to filter redundant features in the data set according to a preset feature dependency degree before the multiple base classifiers are trained and generated;
a division unit, configured to divide the data set into a training set and a validation set.
CN201910441624.XA 2019-05-24 2019-05-24 Credit estimation method and device based on feature dependency degree Pending CN110223161A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910441624.XA CN110223161A (en) 2019-05-24 2019-05-24 Credit estimation method and device based on feature dependency degree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910441624.XA CN110223161A (en) 2019-05-24 2019-05-24 Credit estimation method and device based on feature dependency degree

Publications (1)

Publication Number Publication Date
CN110223161A true CN110223161A (en) 2019-09-10

Family

ID=67818365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910441624.XA Pending CN110223161A (en) 2019-05-24 2019-05-24 Credit estimation method and device based on feature dependency degree

Country Status (1)

Country Link
CN (1) CN110223161A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766418A (en) * 2017-09-08 2018-03-06 广州汪汪信息技术有限公司 A kind of credit estimation method based on Fusion Model, electronic equipment and storage medium
CN107943818A (en) * 2017-10-09 2018-04-20 中国电子科技集团公司第二十八研究所 A kind of Urban Data service system and method based on Multi-source Information Fusion
CN109767312A (en) * 2018-12-10 2019-05-17 江西师范大学 A kind of training of credit evaluation model, appraisal procedure and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王宝等: "基于粗糙集的动态异构集成信用评分模型", 《经济统计学(季刊)》 *

Similar Documents

Publication Publication Date Title
CN103198161B (en) Microblog water army recognition methods and equipment
CN109389181B (en) Association rule generation method and device for power grid abnormal event
CN106327209A (en) Multi-standard collaborative fraud detection method based on credit accumulation
CN108304427A (en) A kind of user visitor's heap sort method and apparatus
CN110533116A (en) Based on the adaptive set of Euclidean distance at unbalanced data classification method
CN104732186B (en) Single sample face recognition method based on Local Subspace rarefaction representation
CN109118119A (en) Air control model generating method and device
CN104809393B (en) A kind of support attack detecting algorithm based on popularity characteristic of division
CN105893637A (en) Link prediction method in large-scale microblog heterogeneous information network
CN108921604A (en) A kind of ad click rate prediction technique integrated based on Cost-Sensitive Classifiers
CN109919252A (en) The method for generating classifier using a small number of mark images
CN110232405A (en) Method and device for personal credit file
CN112488716B (en) Abnormal event detection system
CN103336771A (en) Data similarity detection method based on sliding window
CN105760649A (en) Big-data-oriented creditability measuring method
CN105825232A (en) Classification method and device for electromobile users
CN105447520A (en) Sample classification method based on weighted PTSVM (projection twin support vector machine)
CN107909038A (en) A kind of social networks disaggregated model training method, device, electronic equipment and medium
CN112232526A (en) Geological disaster susceptibility evaluation method and system based on integration strategy
CN113761359A (en) Data packet recommendation method and device, electronic equipment and storage medium
CN112685272B (en) Interpretable user behavior abnormity detection method
CN117556369B (en) Power theft detection method and system for dynamically generated residual error graph convolution neural network
CN104680118B (en) A kind of face character detection model generation method and system
Luceri et al. Unmasking the web of deceit: Uncovering coordinated activity to expose information operations on twitter
Yu et al. Detecting group shilling attacks in recommender systems based on maximum dense subtensor mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190910