CN110516741A - Class-overlap imbalanced data classification method based on dynamic classifier selection - Google Patents
Class-overlap imbalanced data classification method based on dynamic classifier selection
- Publication number
- CN110516741A CN201910802242.5A
- Authority
- CN
- China
- Prior art keywords
- sample
- classification
- classifier
- cluster
- cmaj
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Abstract
The invention discloses a class-overlap imbalanced data classification method based on dynamic classifier selection. A semi-unsupervised hierarchical clustering algorithm is first applied to divide the data set into multiple balanced subsets whose sample spaces contain no class overlap. Base classifiers are then built on these subsets to form a candidate classifier pool. In order to select the most suitable base classifier from the candidate pool for the classification of each test sample, a weighting mechanism highlights the competence of classifiers that are stronger at classifying the minority-class samples in the region surrounding the test sample.
Description
Technical field
The invention belongs to the field of artificial intelligence, and specifically relates to a class-overlap imbalanced data classification method based on dynamic classifier selection.
Background Art
Imbalanced data refers to learning samples that contain multiple classes in which the number of samples of some classes is far smaller than that of the others. Classes with few samples are usually called minority classes, and the remaining classes with more samples are called majority classes. The imbalanced classification problem is a data-mining classification problem in which samples of different classes are unevenly distributed over the sample space. Rare information often carries more value and deserves discovery, attention, and more accurate screening and classification. Many real-world problems exhibit a similar situation: the sample sizes of some classes are far smaller than those of the others, yet the samples of those classes are clearly important. If classification is performed with traditional learning algorithms without any special modification, correct classification is difficult, because traditional classification algorithms are biased toward the majority classes and tend to misclassify the minority-class samples as majority-class ones. Consequently, when traditional machine-learning classification algorithms in data mining process data with imbalanced characteristics, the final classification accuracy often falls short of expectations, and the classifier obtained by such training also has significant limitations. The most common symptom is that the classification accuracy on majority-class samples is much higher than that on minority-class samples, and samples that actually belong to the minority class are easily misassigned to the majority class.
In real life, data imbalance is also very common. Many classification problems are inherently skewed by their scenario, such as detecting fraudulent transactions in credit-card data, detecting fraudulent spam in e-mail text, categorical data on luxury goods in recommender systems, and diagnostic data in the medical field. In such problems, what people usually care about is the classification accuracy on the minority-class samples. For example, when judging whether a patient has cancer, the cost of misclassifying a sick person as healthy is much higher than that of misclassifying a healthy person as sick, because in the latter case further examination and treatment are still available. In practice, however, samples of healthy people usually account for the vast majority of the training data and only a few samples correspond to cancer, so directly applying traditional data-mining classification methods makes it very difficult to identify the sick.

Some imbalanced data samples arise from human factors during data collection, for instance when certain classes involve privacy concerns that make the records hard or too costly to acquire. Other imbalance problems come from the decomposition of multi-class classification problems. Some classification algorithms, such as logistic regression and support vector machines (SVM), cannot be applied directly to multi-class problems and must first decompose the original problem into several binary subproblems; this easily turns an originally balanced classification problem into an imbalanced one, and makes an already imbalanced problem even more imbalanced. Imbalanced classification problems are therefore widespread in real life, and research on mining such data sets is of great practical significance. At present, most algorithms proposed for imbalanced data focus solely on solving the original imbalance problem. However, class imbalance is usually accompanied by other data-complexity problems, such as class overlap. Class overlap refers to minority-class samples appearing in majority-class regions, mostly near the decision boundary. Some existing algorithms proposed for imbalanced data can even aggravate the class-overlap problem after processing the data, which ultimately degrades their performance.
Summary of the invention
To overcome the above shortcomings of the prior art, this application provides a class-overlap imbalanced data classification method based on dynamic classifier selection that improves the final classification accuracy on class-overlapped imbalanced data.
To achieve the above object, the technical solution of this application is as follows: a class-overlap imbalanced data classification method based on dynamic classifier selection. A semi-unsupervised hierarchical clustering algorithm is first applied to divide the data set into multiple balanced subsets whose sample spaces contain no class overlap. Base classifiers are then built on these subsets to form a candidate classifier pool. In order to select the most suitable base classifier from the candidate pool for the classification of each test sample, a weighting mechanism highlights the competence of classifiers that are stronger at classifying the minority-class samples in the region surrounding the test sample. The specific implementation steps are as follows:
Step 1: generate the candidate classifier pool.
Class overlap is an obstacle to learning from imbalanced data: the samples of different classes should be balanced and should not contain class-overlap regions. Most current data-preprocessing techniques in dynamic-selection ensembles use bagging; that is, the original learning set is sampled and classifiers are built on the sampled data sets to generate the candidate classifier pool. However, each data subset obtained by the bagging sampling strategy is still imbalanced, so the generalization performance of the final ensemble model remains poor. The semi-unsupervised hierarchical algorithm instead obtains data subsets free of class overlap according to the following steps:
Step 11: treat the N majority-class samples as N individual clusters, generating N clusters Cmaj of size 1;
Step 12: find the pair of clusters Cmaj_a and Cmaj_b with the smallest squared Euclidean distance, and record that distance as Dist;
Step 13: compute the squared Euclidean distance from each minority-class sample to Cmaj_a and Cmaj_b. If the distances from some minority-class sample to both clusters are smaller than Dist, a minority-class sample lies between Cmaj_a and Cmaj_b; mark the pair Cmaj_a, Cmaj_b so that they are never merged. Otherwise no minority-class sample lies between them, and Cmaj_a and Cmaj_b are merged into a new cluster Cmaj_c, leaving N-1 clusters in total;
Step 14: recompute the squared Euclidean distances between the newly generated cluster Cmaj_c and the remaining clusters;
Step 15: repeat steps 12-14 until no new clusters can be merged.
The above procedure generates m majority-class sample clusters. Each of the m clusters is combined with the minority-class samples to produce m subsets; since the numbers of majority- and minority-class samples inside each subset may be unequal, SMOTE oversampling is applied to the m subsets to obtain m balanced subsets. The selected base classifiers are then trained on the generated balanced data sets to form the candidate classifier pool.
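Steps 11-15 can be sketched in plain NumPy as follows (a minimal illustration, not the patented implementation; the function name `semi_unsupervised_clustering` and the use of centroid distance as the inter-cluster distance are assumptions):

```python
import numpy as np

def semi_unsupervised_clustering(majority, minority):
    """Sketch of steps 11-15: agglomeratively merge majority-class
    clusters, but never merge a pair of clusters when some minority
    sample is closer to both of them than they are to each other."""
    clusters = [[i] for i in range(len(majority))]   # step 11: one cluster per sample
    blocked = set()                                  # pairs marked "never merge"

    def centroid(c):
        return majority[c].mean(axis=0)

    def sqdist(a, b):
        return float(np.sum((a - b) ** 2))           # squared Euclidean distance

    while True:
        # step 12: closest pair of clusters not yet blocked
        best, pair = None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if frozenset((id(clusters[i]), id(clusters[j]))) in blocked:
                    continue
                d = sqdist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best:
                    best, pair = d, (i, j)
        if pair is None:
            break                                    # step 15: nothing left to merge
        i, j = pair
        ci, cj = centroid(clusters[i]), centroid(clusters[j])
        # step 13: does a minority sample sit between the two clusters?
        if any(sqdist(x, ci) < best and sqdist(x, cj) < best for x in minority):
            blocked.add(frozenset((id(clusters[i]), id(clusters[j]))))
        else:
            merged = clusters[i] + clusters[j]       # step 14: merge into a new cluster
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
            clusters.append(merged)
    return clusters                                  # lists of majority-sample indices
```

Each returned list of indices corresponds to one majority-class cluster Cmaj; a pair with a minority sample between it stays split, which is what keeps the resulting subsets free of class overlap.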
Step 2: dynamically select the most suitable base classifier set.
The base-classifier subsets generated by semi-unsupervised hierarchical clustering contain no class overlap in the sample space, and each subset combines the minority-class samples with majority-class samples of similar attributes. Effective classifiers must be selected from the candidate pool to classify each sample x_query. A crucial step is therefore how to measure the competence of the candidate classifiers. Although many methods have been proposed to estimate classifier competence, all of them are based on the premise of a balanced data scenario.
In order to select classifier sets that are more competent at classifying minority-class samples, a dynamic-selection algorithm is proposed in which a candidate classifier that correctly classifies more of the minority-class samples in the competence region is assigned higher competence. The main goal is thus to describe the process of selecting suitable base classifiers for each sample x_query to be classified in an imbalanced data scenario. The key step is to evaluate the performance of each candidate classifier on the competence region to which x_query belongs, defined by the k nearest neighbours of x_query. A DES (dynamic ensemble selection) classifier system handles the imbalanced data set by selecting the classifiers that are more competent at classifying the minority-class samples within the competence region. The specific steps are as follows:
Step 21: in the validation set, compute the k nearest-neighbour samples of the current sample x_query to be classified, denoted Ψ;
Step 22: for each base classifier h_i in the candidate pool, take the Ψ obtained in step 21 as input and record the predicted output;
Step 23: from the predicted outputs and the true labels, compute TP, FN, FP and TN according to Table 1. The minority-class and majority-class samples in Table 1 refer to the sample distribution within the predicted region Ψ.
Table 1: confusion matrix

| | Predicted minority class | Predicted majority class |
---|---|---
| Actual minority class | TP | FN |
| Actual majority class | FP | TN |

The precision on the minority-class samples is calculated according to formula (1):
Precision = TP / (TP + FP) (1)
The recall on the minority-class samples is calculated according to formula (2):
Recall = TP / (TP + FN) (2)
The competence of each classifier in the pool is calculated according to formula (3):
W_i = Precision × Recall (3)
Since the divided subsets contain no class-overlap region in the sample space, the sample to be predicted should be classified into the nearest subset class; the weighted competence of each base classifier is calculated according to formula (4):
Compen_i = W_i / D_i (4)
where D_i is the average distance from the current sample to be predicted to the samples in the i-th subset. The Compen_i values are sorted in descending order and the top-ranked base classifier is selected.
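Under the assumption that the weighted competence of formula (4) takes the form Compen_i = W_i / D_i (the formula image is not reproduced here), steps 21-23 can be sketched as follows; the function and class names are illustrative, and any classifier object with a scikit-learn-style `predict` method would fit:

```python
import numpy as np

def select_classifier(pool, subsets_X, X_val, y_val, x_query, k=7, minority=1):
    """Sketch of step 2: score every classifier in the pool on the k
    validation neighbours of x_query (formulas (1)-(3)), weight by the
    average distance to the classifier's training subset (assumed
    formula (4): Compen_i = W_i / D_i), and return the top-ranked one."""
    # step 21: k nearest validation neighbours of the query sample
    d = np.linalg.norm(X_val - x_query, axis=1)
    idx = np.argsort(d)[:k]
    X_knn, y_knn = X_val[idx], y_val[idx]

    best_score, best_clf = -np.inf, None
    for clf, X_sub in zip(pool, subsets_X):
        pred = clf.predict(X_knn)                       # step 22: predict the region
        # step 23: confusion-matrix counts for the minority class
        tp = np.sum((pred == minority) & (y_knn == minority))
        fp = np.sum((pred == minority) & (y_knn != minority))
        fn = np.sum((pred != minority) & (y_knn == minority))
        precision = tp / (tp + fp) if tp + fp else 0.0  # formula (1)
        recall = tp / (tp + fn) if tp + fn else 0.0     # formula (2)
        w = precision * recall                          # formula (3)
        d_i = np.mean(np.linalg.norm(X_sub - x_query, axis=1))
        compen = w / d_i                                # formula (4), assumed form
        if compen > best_score:
            best_score, best_clf = compen, clf
    return best_clf
```

The division by D_i realises the nearest-subset preference described above: of two equally accurate classifiers, the one trained on the subset closer to x_query wins.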
Step 3: weighted output of the selected classifier.
The currently selected base classifier outputs two probabilities p1 and p2 for the sample to be predicted, corresponding to classes c1 and c2; the final output is obtained according to formula (5):
Output = argmax_j (p_j / Dist_j), j ∈ {1, 2} (5)
where Dist_j is the average distance from the sample to be predicted to the samples of the j-th class.
Steps 2 and 3 are repeated until the classification of all prediction samples is complete.
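A minimal sketch of the distance-weighted output of step 3, under the assumption that formula (5) reweights each class probability by the inverse of the average distance to that class (the formula image itself is not reproduced here):

```python
import numpy as np

def weighted_output(p, dists):
    """Assumed form of formula (5): scale each class probability p_j by
    1 / Dist_j so the query leans toward the nearer class, then
    renormalise the scores to sum to one."""
    w = np.asarray(p, dtype=float) / np.asarray(dists, dtype=float)  # p_j / Dist_j
    return w / w.sum()                                               # normalised scores

# e.g. a 50/50 classifier output is pulled toward the class whose
# samples lie four times closer to the query.
```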
By adopting the above technical scheme, the present invention achieves the following technical effects:
(1) Traditional ensemble learning algorithms mostly balance the data by bagging or after bagging, so the balanced subsets obtained this way still contain class overlap, and an ensemble model built on them still generalizes poorly. By contrast, the semi-unsupervised hierarchical clustering algorithm divides the majority-class samples into subsets that contain no class overlap, and the submodels built on them can effectively improve generalization ability.
(2) Traditional classifier-selection algorithms mostly compute accuracy to choose the best classifier individual or set. The weighted dynamic classifier selection algorithm of this application considers both the classifier's accuracy in predicting minority-class samples and the relationship between the prediction sample and the learning samples, namely that a sample should preferentially be classified into the nearest class.
Detailed description of the invention
Fig. 1 is the flow chart of the application.
Specific Embodiments
The specific embodiments refer to Fig. 1, the flow chart of the implementation steps of the present invention; the implementation process of the invention is described in detail in conjunction with the figure. The embodiments of the present invention are implemented on the premise of the technical scheme of the present invention, and detailed implementations and specific operating processes are given, but the protection scope of the present invention is not limited to the following embodiments.
Embodiment 1
This embodiment provides a class-overlap imbalanced data classification method based on dynamic classifier selection, comprising the generation of a candidate classifier pool, the dynamic selection of the base classifier with the strongest classification competence, and the weighted output of the base classifier, and it comprises the following steps in order:
(1) divide the majority-class samples of the class-overlapped imbalanced data set into m majority-class sample subclusters with the semi-unsupervised hierarchical clustering algorithm;
(2) merge the m majority-class subclusters with the minority-class samples to obtain m imbalanced binary subsets;
(3) oversample the imbalanced subsets with the SMOTE algorithm to obtain m balanced binary subsets;
(4) train homogeneous classifiers with the same learning algorithm on the subsets obtained in step (3) to constitute the candidate classifier pool;
(5) use the dynamic classifier selection algorithm to pick out from the pool the candidate sub-classifier with the strongest classification competence on the samples in the region surrounding the test sample;
(6) output the prediction result of the classifier obtained in step (5) according to a distance-based weighting rule.
Using the semi-unsupervised hierarchical clustering algorithm, the majority-class samples of data containing class overlap are divided into subclusters that contain no class overlap. Unlike the subset-building schemes of traditional data preprocessing and ensemble learning, which only consider the quantitative difference in imbalanced data and can even aggravate the class-overlap phenomenon after processing, the semi-unsupervised hierarchical clustering algorithm guarantees that the sample positions in each subset do not overlap with the minority-class samples.
The dynamic classifier selection algorithm selects the strongest classifier from the pool. Base-classifier competence is assessed from the classification of the validation-set nearest neighbours of the test sample: the k nearest neighbours of the current test sample are taken from the validation set, and every sub-classifier in the pool predicts these k neighbours. Unlike traditional dynamic-selection algorithms, the newly proposed algorithm, while guaranteeing classification accuracy, regards classifiers that correctly classify more of the minority-class samples among the k neighbours as having stronger classification competence, and selects that base classifier to participate in the final decision.
Embodiment 2
This embodiment uses a common imbalanced data set, glass2, collected in KEEL. The glass2 data set contains 214 samples in total, each with 9 attributes; 17 are minority-class samples and 197 are majority-class samples, giving an imbalance ratio of 11.59. The specific imbalanced-data classification process is as follows:
Step 1: generation of the candidate classifier pool
(1) first apply semi-unsupervised hierarchical clustering to the 197 majority-class samples to obtain m majority-class subclusters, and merge each of the m clusters with the minority-class samples to obtain m binary data sets;
(2) oversample the m data sets with the SMOTE algorithm to obtain m balanced binary data sets;
(3) train classifiers on the m data sets with the decision-tree learning algorithm to obtain a pool of m candidate base classifiers.
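Step 1 of this embodiment can be sketched as follows (an illustrative sketch, not the patented implementation: it uses a minimal hand-rolled SMOTE-style interpolation instead of a full SMOTE library, scikit-learn decision trees, and assumed function names):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def smote_balance(X_min, n_needed, k=5, rng=None):
    """Minimal SMOTE-style sketch: synthesise n_needed minority samples
    by interpolating between a random minority sample and one of its k
    nearest minority neighbours."""
    if rng is None:
        rng = np.random.default_rng(0)
    synth = []
    for _ in range(n_needed):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]                 # nearest minority neighbours
        j = rng.choice(nbrs)
        gap = rng.random()                            # random point on the segment
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)

def build_pool(majority_clusters, X_min):
    """Step 1 sketch: one balanced binary subset (cluster + oversampled
    minority) per majority cluster, one decision tree per subset."""
    pool, subsets = [], []
    for X_maj in majority_clusters:
        extra = smote_balance(X_min, max(0, len(X_maj) - len(X_min)))
        X_min_bal = np.vstack([X_min, extra]) if len(extra) else X_min
        X = np.vstack([X_maj, X_min_bal])
        y = np.array([0] * len(X_maj) + [1] * len(X_min_bal))  # 0 = majority
        pool.append(DecisionTreeClassifier(random_state=0).fit(X, y))
        subsets.append(X_maj)
    return pool, subsets
```

In practice a library implementation of SMOTE (e.g. from imbalanced-learn) would replace `smote_balance`; the sketch only shows the interpolation idea the embodiment relies on.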
Step 2: dynamic classifier selection
(1) for the current sample to be predicted, select its 7 nearest neighbours in the validation set;
(2) record the classification of these 7 nearest neighbours by each base classifier in the candidate pool, and calculate the weighted competence Compen of each base classifier according to formulas (1)-(4);
(3) select the base classifier corresponding to the largest Compen value.
Step 3: classifier output
The currently selected classifier outputs two probabilities p1 and p2 for the sample to be predicted, corresponding to classes c1 and c2, and the output is produced according to formula (5).
Steps 2 and 3 are repeated until the classification of all test samples is complete.
To better illustrate the effectiveness of the algorithm, the glass2 data set is also classified with the decision-tree algorithm alone and with the decision-tree algorithm after SMOTE preprocessing as comparison algorithms, and AUC is used as the evaluation metric to quantify the final output.
Table 2: classification results of different methods on the glass2 data set
As can be seen from Table 2, in the glass2 imbalanced classification experiment the dynamic classifier selection method proposed in this application achieves an AUC of 0.8608, a considerable improvement in classification performance over the other typical classification algorithms. The experimental results show that the method can effectively combine the respective advantages of semi-unsupervised hierarchical clustering and dynamic classifier selection, and improves the classification accuracy on class-overlapped imbalanced data.
Claims (4)
1. A class-overlap imbalanced data classification method based on dynamic classifier selection, characterized in that the specific implementation steps are as follows:
Step 1: generate the candidate classifier pool;
Step 2: dynamically select the base classifier set;
Step 3: weighted output of the selected classifier;
Step 4: repeat steps 2 and 3 until the classification of all prediction samples is complete.
2. The class-overlap imbalanced data classification method based on dynamic classifier selection according to claim 1, characterized in that step 1 obtains data subsets free of class overlap using the semi-unsupervised hierarchical algorithm according to the following steps:
Step 11: treat the N majority-class samples as N individual clusters, generating N clusters Cmaj of size 1;
Step 12: find the pair of clusters Cmaj_a and Cmaj_b with the smallest squared Euclidean distance, and record that distance as Dist;
Step 13: compute the squared Euclidean distance from each minority-class sample to Cmaj_a and Cmaj_b. If the distances from some minority-class sample to both clusters are smaller than Dist, a minority-class sample lies between Cmaj_a and Cmaj_b; mark the pair Cmaj_a, Cmaj_b so that they are never merged. Otherwise no minority-class sample lies between them, and Cmaj_a and Cmaj_b are merged into a new cluster Cmaj_c, leaving N-1 clusters in total;
Step 14: recompute the squared Euclidean distances between the newly generated cluster Cmaj_c and the remaining clusters;
Step 15: repeat steps 12-14 until no new clusters can be merged.
The above procedure generates m majority-class sample clusters. Each of the m clusters is combined with the minority-class samples to produce m subsets; since the numbers of majority- and minority-class samples inside each subset may be unequal, SMOTE oversampling is applied to the m subsets to obtain m balanced subsets. The selected base classifiers are then trained on the generated balanced data sets to form the candidate classifier pool.
3. The class-overlap imbalanced data classification method based on dynamic classifier selection according to claim 1, characterized in that the specific steps of step 2 are as follows:
Step 21: in the validation set, compute the k nearest-neighbour samples of the current sample x_query to be classified, denoted Ψ;
Step 22: for each base classifier h_i in the candidate pool, take the Ψ obtained in step 21 as input and record the predicted output;
Step 23: from the predicted outputs and the true labels, compute TP, FN, FP and TN according to Table 1;
Table 1: confusion matrix

| | Predicted minority class | Predicted majority class |
---|---|---
| Actual minority class | TP | FN |
| Actual majority class | FP | TN |

The precision on the minority-class samples is calculated according to formula (1):
Precision = TP / (TP + FP) (1)
The recall on the minority-class samples is calculated according to formula (2):
Recall = TP / (TP + FN) (2)
The competence of each classifier in the pool is calculated according to formula (3):
W_i = Precision × Recall (3)
Since the divided subsets contain no class-overlap region in the sample space, the sample to be predicted should be classified into the nearest subset class; the weighted competence of each base classifier is calculated according to formula (4):
Compen_i = W_i / D_i (4)
where D_i is the average distance from the current sample to be predicted to the samples in the i-th subset. The Compen_i values are sorted in descending order and the top-ranked base classifier is selected.
4. The class-overlap imbalanced data classification method based on dynamic classifier selection according to claim 1, characterized in that the specific steps of step 3 are as follows:
The currently selected base classifier outputs two probabilities p1 and p2 for the sample to be predicted, corresponding to classes c1 and c2; the final output is obtained according to formula (5):
Output = argmax_j (p_j / Dist_j), j ∈ {1, 2} (5)
where Dist_j is the average distance from the sample to be predicted to the samples of the j-th class.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910802242.5A CN110516741A (en) | 2019-08-28 | 2019-08-28 | Class-overlap imbalanced data classification method based on dynamic classifier selection
Publications (1)
Publication Number | Publication Date |
---|---|
CN110516741A true CN110516741A (en) | 2019-11-29 |
Family
ID=68628384
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910802242.5A Pending CN110516741A (en) | 2019-08-28 | 2019-08-28 | Class-overlap imbalanced data classification method based on dynamic classifier selection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110516741A (en) |
-
2019
- 2019-08-28 CN CN201910802242.5A patent/CN110516741A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111210343A (en) * | 2020-02-21 | 2020-05-29 | 浙江工商大学 | Credit card fraud detection method based on unbalanced stream data classification |
CN111210343B (en) * | 2020-02-21 | 2022-03-29 | 浙江工商大学 | Credit card fraud detection method based on unbalanced stream data classification |
CN111695626A (en) * | 2020-06-10 | 2020-09-22 | 湖南湖大金科科技发展有限公司 | High-dimensional unbalanced data classification method based on mixed sampling and feature selection |
CN111695626B (en) * | 2020-06-10 | 2023-10-31 | 湖南湖大金科科技发展有限公司 | High-dimensionality unbalanced data classification method based on mixed sampling and feature selection |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191129 |