CN110533116A - Euclidean distance-based adaptive ensemble method for imbalanced data classification - Google Patents

Euclidean distance-based adaptive ensemble method for imbalanced data classification Download PDF

Info

Publication number
CN110533116A
CN110533116A (application CN201910832525.4A)
Authority
CN
China
Prior art keywords
sample
classifier
classification
test
base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910832525.4A
Other languages
Chinese (zh)
Inventor
王宾 (Wang Bin)
陈东 (Chen Dong)
张强 (Zhang Qiang)
魏小鹏 (Wei Xiaopeng)
周昌军 (Zhou Changjun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University
Priority to CN201910832525.4A
Publication of CN110533116A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/285Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system

Abstract

The invention discloses a Euclidean distance-based adaptive ensemble method for imbalanced data classification. First, several diverse balanced subsets are obtained by a random balance method, and base classifiers are then built on these balanced subsets. A classifier pre-selection algorithm is added before the dynamic selection algorithm. After the base classifiers have been screened, a new dynamic selection algorithm is proposed that evaluates how the classifiers perform in the region surrounding the sample to be classified: the more minority-class samples a classifier correctly classifies within that region, the stronger its ability is judged to be. Finally, a distance-based adaptive ensemble rule combines the predictions of the selected base classifiers into the output. The method builds base classifiers on diverse subsets, the proposed dynamic selection algorithm picks out the sub-classifiers with the strongest classification ability, and the proposed ensemble rule provides better outputs, which together effectively improve classification accuracy on imbalanced data.

Description

Euclidean distance-based adaptive ensemble method for imbalanced data classification
Technical field
The invention belongs to the field of artificial intelligence, and specifically relates to a Euclidean distance-based adaptive ensemble method for imbalanced data classification.
Background technique
Imbalanced data refers to training samples in which the number of samples in one or more classes differs greatly from the number in the other classes. According to research reports, class imbalance problems arise in many fields of the real world, such as facial age estimation, oil-leak detection in satellite images, anomaly detection, fraudulent credit card transaction identification, software defect prediction, and image annotation. Researchers therefore pay close attention to the data imbalance problem and have held several symposia and workshops on it, for example at the Association for the Advancement of Artificial Intelligence (AAAI) in 2000, the International Conference on Machine Learning (ICML) in 2003, and the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) Explorations in 2004.
For binary imbalanced classification problems, training samples are generally divided into a majority class and a minority class. In general, people care more about the minority-class samples than the majority-class ones. In credit card fraud identification, for example, the cost of misclassifying a fraudulent transaction as a normal one is much higher than the cost of misclassifying a normal transaction as fraudulent, because in the latter case staff can contact the cardholder to confirm whether the transaction was initiated by them. When the number of minority-class samples is far below the number of majority-class samples, the consequences can be serious. Most traditional classification algorithms, such as decision trees, k-nearest neighbors, and RIPPER, tend to produce models that maximize overall classification accuracy, so the minority class is often ignored. For example, on a data set in which only 1% of the samples belong to the minority class, a model that classifies every sample as the majority class still reaches 99% overall accuracy, yet such a highly "accurate" classifier misclassifies exactly the minority-class samples that we want to classify correctly.
Ensemble learning methods are increasingly applied to imbalanced data classification in machine learning and data mining, but most such algorithms improve prediction accuracy on imbalanced data only to a limited extent. Each base classifier is an expert on a local region, yet these methods do not account for the fact that base classifiers differ in their ability to classify different test samples; letting the weaker base classifiers participate in the final ensemble harms the generalization ability of the ensemble model. The subsets generated for training the base classifiers should also be diverse, to guarantee the diversity of the base classifiers. Moreover, the ensemble rules of most ensemble learning methods are determined by majority voting, which does not consider the relationship between the training samples and the test samples, so the predictions given by the base classifiers after optimization still cannot be improved further.
Summary of the invention
To solve the problems that sub-classifier diversity in ensemble learning is insufficient, and that the design of ensemble rules does not account for poorly performing base classifiers, this application proposes a Euclidean distance-based adaptive ensemble method for imbalanced data classification that improves classification accuracy on imbalanced data.
To achieve the above object, the technical solution of the present invention is a Euclidean distance-based adaptive ensemble method for imbalanced data classification, comprising the following steps:
Step 1: preprocess the data to obtain diverse balanced subsets;
Step 2: on the m balanced subsets, use the same classification learning algorithm to obtain m homogeneous classifiers and build a candidate classifier pool;
Step 3: pre-select the base classifiers in the candidate classifier pool, deleting classifiers that lack the ability to classify minority-class samples;
Step 4: use a dynamic selection algorithm to pick out, from the classifier pool screened in Step 3, the candidate sub-classifiers with strong ability to classify samples in the region surrounding the test sample, forming a base classifier set;
Step 5: use a distance-based adaptive ensemble rule to combine the selected base classifier set's predictions for the test sample into the output.
Further, in Step 1, the data preprocessing includes splitting the data into a training set, a validation set, and a test set, and obtaining balanced subsets from the training set by random balancing. Specific steps:
1. Divide the original data set into training set S_train, validation set S_va, and test set S_test in the quantity ratio a:b:c, ensuring that the ratio of majority-class to minority-class samples in each of the divided training, validation, and test sets is consistent with the ratio in the original data set;
2. Generate a random number num_rand according to formula (1):
num_rand = S_min + rand(0,1) * (S_max - S_min)    (1)
where S_min is the number of minority-class samples in training set S_train, rand(0,1) is a random number between 0 and 1, and S_max is the number of majority-class samples in S_train;
3. Randomly sample without replacement from the majority-class samples in S_train until the newly formed sample set contains num_rand samples; at the same time, oversample the minority class according to formula (2), adding each generated sample z to the minority-class samples, and repeat the oversampling until the number of minority-class samples reaches num_rand; merging the newly formed majority-class samples with the oversampled minority-class samples then yields one balanced subset;
z = β·p + (1-β)·q    (2)
where p and q are minority-class samples in S_train and β is a random number between 0 and 1;
4. Repeat steps 2 and 3 until m balanced subsets are obtained (a code sketch of this procedure follows).
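The following Python sketch illustrates the random balance procedure of formulas (1) and (2); the function name random_balance_subsets and the 0 = majority / 1 = minority label convention are illustrative choices, not part of the patent.

import numpy as np

def random_balance_subsets(X_maj, X_min, m, rng=None):
    """Step 1: undersample the majority class and oversample the minority
    class until both reach num_rand, m times over."""
    rng = np.random.default_rng(rng)
    s_min, s_max = len(X_min), len(X_maj)
    subsets = []
    for _ in range(m):
        # Formula (1): num_rand = S_min + rand(0,1) * (S_max - S_min)
        num_rand = int(s_min + rng.random() * (s_max - s_min))
        num_rand = max(num_rand, s_min)  # guard against rounding below S_min
        # Undersample the majority class without replacement.
        maj = X_maj[rng.choice(s_max, size=num_rand, replace=False)]
        # Oversample the minority class, formula (2): z = beta*p + (1-beta)*q.
        minority = list(X_min)
        while len(minority) < num_rand:
            p, q = X_min[rng.choice(s_min, size=2)]
            beta = rng.random()
            minority.append(beta * p + (1.0 - beta) * q)
        X = np.vstack([maj, np.asarray(minority)])
        y = np.concatenate([np.zeros(len(maj)), np.ones(len(minority))])  # 1 = minority
        subsets.append((X, y))
    return subsets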
Further, in Step 2, the candidate classifier pool is constructed as follows: on each of the m subsets obtained in Step 1, apply the same classification learning algorithm, yielding m homogeneous base classifiers that form the candidate classifier pool (a sketch follows).
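A minimal sketch of pool construction, assuming scikit-learn decision trees as the single learning algorithm (the embodiment below uses decision trees; any one algorithm satisfies Step 2):

from sklearn.tree import DecisionTreeClassifier

def build_pool(subsets):
    """Step 2: train one homogeneous classifier per balanced subset."""
    return [DecisionTreeClassifier().fit(X, y) for X, y in subsets]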
Further, in Step 3, the base classifiers in the candidate classifier pool need to be pre-selected. Specific steps:
1. For the current sample x_q to be classified in test set S_test, compute its k nearest neighbors in validation set S_va. If the k nearest neighbors contain samples of different classes, record the current k neighbors as Ψ; if the k nearest neighbors all belong to the same class, skip pre-selection and go to Step 4;
2. Take Ψ, with its labels removed, as input; each base classifier h_i in the candidate classifier pool predicts labels for Ψ, producing output y_p;
3. Compare the predicted output y_p with the true labels y of Ψ; delete any base classifier that cannot simultaneously classify correctly at least one minority-class sample and one majority-class sample. After deletion, n base classifiers remain in the candidate pool (a sketch of this pre-selection follows).
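A sketch of the pre-selection step, using scikit-learn's NearestNeighbors over the validation set; the helper name preselect and the 1 = minority / 0 = majority labels are assumptions carried over from the earlier sketch.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def preselect(pool, X_va, y_va, x_q, k=7):
    """Step 3: keep only classifiers that correctly classify at least one
    minority and one majority sample among the k validation neighbours of x_q."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_va)
    idx = nn.kneighbors(x_q.reshape(1, -1), return_distance=False)[0]
    X_psi, y_psi = X_va[idx], y_va[idx]
    if len(np.unique(y_psi)) < 2:
        return pool          # neighbours share one class: skip pre-selection
    kept = []
    for clf in pool:
        correct = clf.predict(X_psi) == y_psi
        if correct[y_psi == 1].any() and correct[y_psi == 0].any():
            kept.append(clf)
    return kept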
Further, in Step 4, the pre-selected candidate classifiers need to be dynamically selected. Specific steps:
1. For the current sample x_q to be classified in test set S_test, compute its k nearest neighbors in validation set S_va and denote the k samples as £;
2. Take £, with its labels removed, as input; each base classifier h_i in the candidate classifier pool predicts labels for £, producing output y_out. From the predicted output y_out and the true labels y, compute each base classifier's ability weight according to formula (3), where I(·) is an indicator function and θ_j is the weight coefficient of the class of the j-th sample; θ_j gives the minority class the larger coefficient, so that correctly classified minority-class samples contribute more to the weight;
3. After the ability weights are computed, sort them by value and take the top P% of the n base classifiers to form the base classifier set C' (a sketch follows).
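Formula (3) and the θ_j values are reproduced only as images in the original publication, so the weight computed below is a plausible reconstruction from the surrounding text: a θ-weighted count of correctly predicted neighbors, with θ favoring the minority class. The particular values θ = (1.0, 2.0) are assumptions.

import numpy as np

def dynamic_select(pool, X_nbrs, y_nbrs, top_percent=15, theta=(1.0, 2.0)):
    """Step 4: rank classifiers by ability weight on the k neighbours of the
    test sample and keep the top P%."""
    weights = []
    for clf in pool:
        y_out = clf.predict(X_nbrs)
        # Assumed form of formula (3): W = sum_j theta_j * I(y_out_j == y_j),
        # with theta indexed by class label (0 = majority, 1 = minority).
        weights.append(sum(theta[int(y)] * (pred == y)
                           for pred, y in zip(y_out, y_nbrs)))
    order = np.argsort(weights)[::-1]
    t = max(1, int(np.ceil(len(pool) * top_percent / 100.0)))
    return [pool[i] for i in order[:t]]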
Further, in Step 5, the classifier set C' obtained by selection gives the ensemble prediction output for the current sample to be classified. Specific steps:
1. Compute parameters R1 and R2 according to formulas (4) and (5), where t is the number of base classifiers in set C', P_i1 and P_i2 are the probabilities of the minority and majority class that the i-th classifier gives for the test sample, D_i1 and D_i2 are the average Euclidean distances from the test sample to the minority-class and majority-class training samples of the i-th base classifier, and α is an adaptive parameter that must be established for each classification algorithm;
Before computing any distance, the samples are normalized according to formula (6):
x̃_i = (x_i - x_min) / (x_max - x_min)    (6)
where x̃_i and x_i are the values after and before normalization, and x_max and x_min are the maximum and minimum values in the sample data;
2. Compare the values of R1 and R2: if R1 > R2, the current sample is classified as minority class; otherwise it is classified as majority class.
Repeat Steps 3, 4, and 5 until every sample in test set S_test has been classified (a sketch of the ensemble rule follows).
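Formulas (4) and (5) likewise appear only as images in the publication. From the variable definitions (P_i1, P_i2, D_i1, D_i2, α), one natural reading is that each classifier's class probability is discounted by the α-th power of the mean Euclidean distance to that class's training samples; the sketch below uses R_c = Σ_i P_ic / D_ic^α and should be treated as an assumed form, not the patent's exact formula. Formula (6), min-max normalization, is reconstructed directly from its variable definitions.

import numpy as np

def minmax(x, x_min, x_max):
    """Formula (6): min-max normalisation applied before any distance."""
    return (x - x_min) / (x_max - x_min)

def adaptive_ensemble_predict(selected, train_sets, x_q, alpha=1.0):
    """Step 5: selected[i] was trained on train_sets[i] = (X_i, y_i);
    class labels are 1 = minority, 0 = majority."""
    r1 = r2 = 0.0
    for clf, (X_i, y_i) in zip(selected, train_sets):
        proba = clf.predict_proba(x_q.reshape(1, -1))[0]
        classes = list(clf.classes_)
        p_min, p_maj = proba[classes.index(1)], proba[classes.index(0)]
        d_min = np.linalg.norm(X_i[y_i == 1] - x_q, axis=1).mean()
        d_maj = np.linalg.norm(X_i[y_i == 0] - x_q, axis=1).mean()
        r1 += p_min / d_min ** alpha   # R1: minority-class evidence (assumed form)
        r2 += p_maj / d_maj ** alpha   # R2: majority-class evidence (assumed form)
    return 1 if r1 > r2 else 0         # R1 > R2 -> minority class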
Through the above method, the present invention achieves the following:
(1) The subsets obtained with the random balance method are diverse, which guarantees that the base classifiers built on them are diverse.
(2) The added pre-selection method ensures that the subsequent dynamic selection algorithm can select base classifiers faster and better.
(3) The dynamic selection algorithm selects base classifiers with stronger ability for each sample to be classified, avoiding the decline in generalization performance caused by letting poorly performing base classifiers take part in the final decision output.
(4) The proposed ensemble rule combines the outputs of the base classifiers and considers the relationship between the training set and the test set, namely that a sample to be classified should preferentially be assigned to the class of its nearest samples. This ensemble rule effectively fuses multiple output values into the ensemble output and improves ensemble output accuracy.
Description of the drawings
Fig. 1 is a flowchart of the implementation of the invention.
Specific embodiment
With reference to Fig. 1, the flowchart of the steps of the invention, the implementation process is explained in detail. The embodiments of the present invention are implemented on the premise of the technical scheme of the present invention, and detailed implementation modes and specific operating processes are given, but the protection scope of the present invention is not limited to the following embodiments.
A Euclidean distance-based adaptive ensemble method for imbalanced data classification, comprising the generation of a candidate classifier pool, the dynamic selection of a set of base classifiers with stronger classification ability, and the adaptive ensemble output of the base classifiers, in the following steps:
(1) Preprocess the data to obtain a training set, a validation set, and a test set, and apply the random balance method to the training set to obtain m balanced subsets;
(2) On these m balanced subsets, use the same classification learning algorithm to obtain m homogeneous classifiers and build the candidate classifier pool;
(3) Pre-select the base classifiers in the candidate classifier pool, deleting classifiers that lack the ability to classify minority-class samples;
(4) Use the dynamic selection algorithm to pick out, from the classifier pool screened in step (3), the candidate sub-classifiers with the strongest ability to classify samples in the region surrounding the test sample;
(5) Use the distance-based adaptive ensemble rule to combine the selected base classifiers' predictions for the test sample into the output;
Diverse subsets are obtained with the random balance method: a value between the majority-class sample count and the minority-class sample count is assigned at random, the majority class is undersampled and the minority class is oversampled to that value to balance the data, and these steps are repeated until the desired number of subsets has been generated.
To reduce the number of candidate classifiers, improve the efficiency of the dynamic selection algorithm, and make it easier for the dynamic selection algorithm to pick stronger base classifiers, a pre-selection method deletes part of the base classifiers. Specifically, the k nearest neighbors of the current test sample are taken from the validation set and used as input to every sub-classifier in the candidate classifier pool, each of which predicts outputs for them; base classifiers that lack the ability to distinguish minority-class samples are deleted.
The classification ability of each base classifier is calculated from how it classifies the validation-set nearest neighbors of the test sample. Specifically, the k nearest neighbors of the current test sample are taken from the validation set, and each base classifier in the candidate pool predicts outputs for these k neighbors; the base classifiers that classify the minority class well while still maintaining overall accuracy are chosen. This differs from traditional dynamic selection algorithms, which are mostly designed to guarantee overall accuracy, so that on imbalanced samples the base classifiers they choose can be biased toward the majority class.
Each base classifier gives an output for the sample to be predicted. The ensemble rule considers not only the output of each base classifier but also the relationship between the sample to be classified and the training samples. In formulas (4) and (5), t is the number of base classifiers, P_i1 and P_i2 are the probabilities of class 1 and class 2 that the i-th classifier gives for the test sample, D_i1 and D_i2 are the average Euclidean distances from the test sample to the class-1 and class-2 training samples of the i-th base classifier, and α is an adaptive parameter that must be established for each classification algorithm.
If R1 > R2, the current sample is classified as class 1; otherwise it is classified as class 2.
This embodiment uses the ecoli046vs5 data set from KEEL, a public standard imbalanced-data repository. The ecoli046vs5 data set contains 203 samples in total, each with 7 attributes: 20 minority-class samples and 183 majority-class samples, for an imbalance ratio of 9.15. The specific imbalanced data classification process is as follows:
(1) Divide the original imbalanced learning sample set so that the numbers of samples in the training set, the validation set, and the test set are in the ratio 8:1:1, ensuring that the ratio of majority-class to minority-class samples in each divided set is consistent with that of the original imbalanced learning sample set.
(2) Random balancing on the training set proceeds as follows:
1. Generate a random number num_rand according to formula (1);
2. Oversample the minority-class samples in S_train according to formula (2) until their number reaches num_rand, and undersample the majority-class samples until their number reaches num_rand, obtaining one balanced subset;
3. Repeat steps 1 and 2 until 100 balanced subsets are obtained.
(3) On these 100 balanced subsets, use the decision tree algorithm to obtain 100 homogeneous classifiers, which build the candidate classifier pool;
(4) Execute the pre-selection method on the base classifiers obtained in step (3), as follows:
1. For the current sample x_q to be classified in test set S_test, compute its 7 nearest neighbors in validation set S_va. If the 7 nearest neighbors contain samples of different classes, record the current 7 neighbors as Ψ; if the 7 nearest neighbors all belong to the same class, skip to step (5);
2. Take Ψ, with its labels removed, as input; each base classifier h_i in the candidate classifier pool predicts labels for Ψ, producing output y_p;
3. Compare the predicted output y_p with the true labels y of Ψ; delete any base classifier that cannot simultaneously classify correctly at least one minority-class sample and one majority-class sample. After deletion, n base classifiers remain in the candidate pool.
(5) Dynamically select among the n base classifiers obtained in step (4), as follows:
1. For the current sample x_q to be classified in test set S_test, compute its 7 nearest neighbors in validation set S_va and denote the 7 samples as £;
2. Take £, with its labels removed, as input; each base classifier h_i in the candidate classifier pool predicts labels for £, producing output y_out. From the predicted output y_out and the true labels y, compute each base classifier's ability weight according to formula (3);
3. After the ability weights are computed, sort them by value and take the top 15% of the n base classifiers to form the base classifier set C'.
(6) To determine the α value in formulas (4) and (5), cross-validation over different α values is performed on the validation set; with decision trees the resulting α value is 1. Substitute this α into formulas (4) and (5), compute R1 and R2, and compare them: if R1 > R2, the current sample is classified as minority class, otherwise as majority class.
Repeat steps (4), (5), and (6) until every sample in test set S_test has been classified (an end-to-end sketch under the embodiment's settings follows).
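An end-to-end sketch tying together the hypothetical helpers from the earlier sketches under the embodiment's settings (m = 100, decision trees, k = 7, top 15%, α = 1); the data arrays X_maj_train, X_min_train, X_va, y_va, and X_test are assumptions, and all features are assumed to be min-max normalized per formula (6) beforehand.

from sklearn.neighbors import NearestNeighbors

subsets = random_balance_subsets(X_maj_train, X_min_train, m=100)
pool = build_pool(subsets)
subset_of = dict(zip(map(id, pool), subsets))  # map each classifier to its training subset

nn = NearestNeighbors(n_neighbors=7).fit(X_va)
predictions = []
for x_q in X_test:
    candidates = preselect(pool, X_va, y_va, x_q, k=7)
    idx = nn.kneighbors(x_q.reshape(1, -1), return_distance=False)[0]
    selected = dynamic_select(candidates, X_va[idx], y_va[idx], top_percent=15)
    sel_sets = [subset_of[id(clf)] for clf in selected]
    predictions.append(adaptive_ensemble_predict(selected, sel_sets, x_q, alpha=1.0))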
To better illustrate the validity of the algorithm, it is compared against the plain decision tree algorithm and against the decision tree algorithm applied after SMOTE processing, and AUC is used as the evaluation index to quantify the final output.
Table 1: Classification results of different methods on the ecoli046vs5 data set
As can be seen from Table 1, in the ecoli046vs5 imbalanced data classification experiment, the Euclidean distance-based adaptive ensemble method proposed in this application obtains an AUC value of 0.9192, an improvement in classification performance over the other typical processing methods. The experimental results show that the method effectively combines the respective advantages of the dynamic selection algorithm and the ensemble rule design, and can effectively improve both the prediction accuracy on imbalanced data and the generalization ability of the ensemble model.

Claims (5)

1. A Euclidean distance-based adaptive ensemble method for imbalanced data classification, characterized by comprising the following steps:
Step 1: preprocess the data to obtain diverse balanced subsets;
Step 2: on the m balanced subsets, use the same classification learning algorithm to obtain m homogeneous classifiers and build a candidate classifier pool;
Step 3: pre-select the base classifiers in the candidate classifier pool, deleting classifiers that lack the ability to classify minority-class samples;
Step 4: use a dynamic selection algorithm to pick out, from the classifier pool screened in Step 3, the candidate sub-classifiers with strong ability to classify samples in the region surrounding the test sample, forming a base classifier set;
Step 5: use a distance-based adaptive ensemble rule to combine the selected base classifier set's predictions for the test sample into the output.
2. The Euclidean distance-based adaptive ensemble method for imbalanced data classification according to claim 1, characterized in that, in Step 1, the data preprocessing includes splitting the data into a training set, a validation set, and a test set, and obtaining balanced subsets from the training set by random balancing; specific steps:
1. Divide the original data set into training set S_train, validation set S_va, and test set S_test in the quantity ratio a:b:c, ensuring that the ratio of majority-class to minority-class samples in each of the divided training, validation, and test sets is consistent with the ratio in the original data set;
2. Generate a random number num_rand according to formula (1):
num_rand = S_min + rand(0,1) * (S_max - S_min)    (1)
where S_min is the number of minority-class samples in training set S_train, rand(0,1) is a random number between 0 and 1, and S_max is the number of majority-class samples in S_train;
3. Randomly sample without replacement from the majority-class samples in S_train until the newly formed sample set contains num_rand samples; at the same time, oversample the minority class according to formula (2), adding each generated sample z to the minority-class samples, and repeat the oversampling until the number of minority-class samples reaches num_rand; merging the newly formed majority-class samples with the oversampled minority-class samples then yields one balanced subset;
z = β·p + (1-β)·q    (2)
where p and q are minority-class samples in S_train and β is a random number between 0 and 1;
4. Repeat steps 2 and 3 until m balanced subsets are obtained.
3. The Euclidean distance-based adaptive ensemble method for imbalanced data classification according to claim 1, characterized in that, in Step 3, the base classifiers in the candidate classifier pool need to be pre-selected; specific steps:
1. For the current sample x_q to be classified in test set S_test, compute its k nearest neighbors in validation set S_va; if the k nearest neighbors contain samples of different classes, record the current k neighbors as Ψ; if the k nearest neighbors all belong to the same class, skip pre-selection and go to Step 4;
2. Take Ψ, with its labels removed, as input; each base classifier h_i in the candidate classifier pool predicts labels for Ψ, producing output y_p;
3. Compare the predicted output y_p with the true labels y of Ψ; delete any base classifier that cannot simultaneously classify correctly at least one minority-class sample and one majority-class sample; after deletion, n base classifiers remain in the candidate pool.
4. The Euclidean distance-based adaptive ensemble method for imbalanced data classification according to claim 1, characterized in that, in Step 4, the pre-selected candidate classifiers need to be dynamically selected; specific steps:
1. For the current sample x_q to be classified in test set S_test, compute its k nearest neighbors in validation set S_va and denote the k samples as £;
2. Take £, with its labels removed, as input; each base classifier h_i in the candidate classifier pool predicts labels for £, producing output y_out; from the predicted output y_out and the true labels y, compute each base classifier's ability weight according to formula (3), where I(·) is an indicator function and θ_j is the weight coefficient of the class of the j-th sample, defined so that the minority class receives the larger coefficient;
3. After the ability weights are computed, sort them by value and take the top P% of the n base classifiers to form the base classifier set C'.
5. The Euclidean distance-based adaptive ensemble method for imbalanced data classification according to claim 4, characterized in that, in Step 5, the classifier set C' obtained by selection gives the ensemble prediction output for the current sample to be classified; specific steps:
1. Compute parameters R1 and R2 according to formulas (4) and (5), where t is the number of base classifiers in set C', P_i1 and P_i2 are the probabilities of the minority and majority class that the i-th classifier gives for the test sample, D_i1 and D_i2 are the average Euclidean distances from the test sample to the minority-class and majority-class training samples of the i-th base classifier, and α is an adaptive parameter;
Before computing any distance, the samples are normalized according to formula (6):
x̃_i = (x_i - x_min) / (x_max - x_min)    (6)
where x̃_i is the value after normalization, x_i is the value before normalization, and x_max and x_min are the maximum and minimum values in the sample data;
2. Compare the values of R1 and R2: if R1 > R2, the current sample is classified as minority class; otherwise it is classified as majority class.
CN201910832525.4A 2019-09-04 2019-09-04 Euclidean distance-based adaptive ensemble method for imbalanced data classification Pending CN110533116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910832525.4A CN110533116A (en) 2019-09-04 2019-09-04 Euclidean distance-based adaptive ensemble method for imbalanced data classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910832525.4A CN110533116A (en) 2019-09-04 2019-09-04 Euclidean distance-based adaptive ensemble method for imbalanced data classification

Publications (1)

Publication Number Publication Date
CN110533116A true CN110533116A (en) 2019-12-03

Family

ID=68666803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910832525.4A Pending CN110533116A (en) Euclidean distance-based adaptive ensemble method for imbalanced data classification

Country Status (1)

Country Link
CN (1) CN110533116A (en)


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080442A (en) * 2019-12-21 2020-04-28 湖南大学 Credit scoring model construction method, device, equipment and storage medium
CN111210343A (en) * 2020-02-21 2020-05-29 浙江工商大学 Credit card fraud detection method based on unbalanced stream data classification
CN111210343B (en) * 2020-02-21 2022-03-29 浙江工商大学 Credit card fraud detection method based on unbalanced stream data classification
CN112035719A (en) * 2020-09-01 2020-12-04 渤海大学 Class imbalance data classification method and system based on convex polyhedron classifier
CN112035719B (en) * 2020-09-01 2024-02-20 渤海大学 Category imbalance data classification method and system based on convex polyhedron classifier
CN113204481A (en) * 2021-04-21 2021-08-03 武汉大学 Class imbalance software defect prediction method based on data resampling
CN113204481B (en) * 2021-04-21 2022-03-04 武汉大学 Class imbalance software defect prediction method based on data resampling
CN113673573A (en) * 2021-07-22 2021-11-19 华南理工大学 Anomaly detection method based on self-adaptive integrated random fuzzy classification
CN113673573B (en) * 2021-07-22 2024-04-30 华南理工大学 Abnormality detection method based on self-adaptive integrated random fuzzy classification
CN114220026A (en) * 2021-12-30 2022-03-22 杭州电子科技大学 Sea surface small target detection method based on multi-classification idea
CN114548327A (en) * 2022-04-27 2022-05-27 湖南工商大学 Software defect prediction method, system, device and medium based on balanced subsets

Similar Documents

Publication Publication Date Title
CN110533116A (en) Euclidean distance-based adaptive ensemble method for imbalanced data classification
CN109492026B (en) Telecommunication fraud classification detection method based on improved active learning technology
CN106326913A (en) Money laundering account determination method and device
WO2019179403A1 (en) Fraud transaction detection method based on sequence width depth learning
Ahalya et al. Data clustering approaches survey and analysis
CN110147321A (en) Recognition method for defect-prone high-risk modules based on software networks
CN107766418A (en) Credit assessment method based on a fusion model, electronic device and storage medium
CN108363810A (en) Text classification method and device
CN106228554B (en) Fuzzy rough set coal dust image segmentation method based on multiple attribute reduction
CN108319987A (en) Combined filter-wrapper traffic feature selection method based on support vector machines
CN109886284B (en) Fraud detection method and system based on hierarchical clustering
CN107273387A (en) Ensemble classification for high-dimensional and imbalanced data
CN109739844A (en) Data classification method based on decaying weights
CN110147760A (en) Efficient method for feature extraction and identification of power quality disturbance images
CN112633337A (en) Unbalanced data processing method based on clustering and boundary points
CN110377605A (en) Sensitive attribute identification and classification grading method for structured data
CN112417176B (en) Method, equipment and medium for mining implicit association relation between enterprises based on graph characteristics
CN110134719A (en) Sensitive attribute identification and classification grading method for structured data
CN104850868A (en) Customer segmentation method based on k-means and neural network clustering
CN112001788A (en) Credit card default fraud identification method based on RF-DBSCAN algorithm
CN106934410A (en) Data classification method and system
CN109993042A (en) Face recognition method and device
CN110334773A (en) Machine-learning-based screening method for model input features
Dong Application of Big Data Mining Technology in Blockchain Computing
CN110516741A (en) Class-overlap imbalanced data classification method based on dynamic classifier selection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20191203