CN107688831A - Imbalanced data classification method based on clustering down-sampling - Google Patents
- Publication number: CN107688831A
- Application number: CN201710784810.4A
- Authority
- CN
- China
- Prior art keywords
- sample
- training set
- cluster
- majority class
- sampling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F18/00—Pattern recognition; G06F18/20—Analysing
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/23—Clustering techniques
- G06F18/2411—Classification techniques based on the proximity to a decision surface, e.g. support vector machines
Abstract
The invention discloses an imbalanced data classification method based on clustering down-sampling, comprising the following steps: clustering the majority-class samples of the training set with the clustering-by-fast-search-and-find-of-density-peaks algorithm, obtaining a clustering result in which the majority-class samples of the training set are divided into N clusters; combining each cluster of majority-class samples with the minority-class samples of the training set into a new sample set and classifying it with a support vector machine to obtain the support vectors of the majority class; merging the extracted support vectors of every cluster with the minority-class samples of the training set to form a new training set; and training a support vector machine on the new training set and evaluating its performance on the cross-validation set. The invention not only shortens the classifier's training time, but also raises the recognition rate of the minority class without harming the recognition rate of the majority class, improving the overall performance of the classifier.
Description
Technical field
The present invention relates to the field of pattern recognition, and more particularly to an imbalanced data classification method based on clustering down-sampling.
Background art
Classification is an important research topic in fields such as pattern recognition and machine learning, with wide application in daily life: handwritten digit recognition in banking systems, face recognition in security and surveillance systems, intrusion detection in network security, and so on. A number of relatively mature classification methods already exist, such as decision trees, K-nearest neighbours, neural networks, and support vector machines; among these, the support vector machine (SVM) has attracted particular attention for its complete theoretical foundation and good experimental results. These traditional methods are all built on the assumption of a balanced class distribution: their main goal is to improve overall classification performance, and they perform well on evenly distributed data sets. The data collected in real life, however, often exhibits unbalanced sample sizes between classes and noise interference, so traditional classifiers fall short of the expected results.
Imbalanced data sets are widespread in real life, for example defective-product detection on production lines, credit-card fraud detection, and disease diagnosis. In such data sets the class with more samples is called the majority class and the class with fewer samples the minority class, and the majority class is far larger than the minority class. In imbalanced classification problems, recognising the minority class is usually the focus: among the products on a production line, most are qualified and only a small fraction are defective, and if a traditional classification method is used, its recognition rate on defectives is so low that the purpose of defect detection cannot really be achieved. How to improve classifier performance on imbalanced problems, raising the minority-class recognition rate without harming majority-class accuracy, is therefore an urgent problem to be solved.
Research on imbalanced classification proceeds along two lines. The first works at the algorithm level, modifying existing algorithms to bias the classifier toward the minority class; a typical example is the cost-sensitive support vector machine, which improves minority-class accuracy by assigning minority samples a higher weight. The second works at the data level, preprocessing the imbalanced set with sampling techniques so that the minority and majority classes in the training set are roughly balanced.
Sampling techniques fall into up-sampling (over-sampling) and down-sampling (under-sampling). Up-sampling increases the number of minority samples by simple copying or by heuristic methods; typical examples are random over-sampling and the SMOTE (Synthetic Minority Over-sampling Technique) algorithm. SMOTE constructs new sample points by random interpolation between a given minority sample and its K nearest neighbours, which improves imbalanced classification to some extent. But neither random over-sampling nor SMOTE follows the intrinsic distribution of the data: when the generated samples deviate from the original distribution, noise is inevitably introduced. Over-sampling is also prone to over-fitting and increases algorithmic complexity, making it ill-suited to the current trend toward big data.
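The interpolation step of SMOTE described above can be sketched in a few lines of numpy. The function name `smote_sample` and its parameters are illustrative, not part of the original algorithm's specification:

```python
import numpy as np

def smote_sample(minority, k=5, n_new=1, rng=None):
    """Generate synthetic minority samples by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (the SMOTE construction)."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # distances from x to every minority point (including itself)
        d = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(d)[1:k + 1]      # skip x itself
        x_nn = minority[rng.choice(neighbours)]
        gap = rng.random()                        # random point on the segment
        synthetic.append(x + gap * (x_nn - x))
    return np.array(synthetic)
```

Because each synthetic point is a convex combination of two minority points, it stays inside the minority region only when that region is convex; this is exactly why the patent notes that SMOTE may introduce noise when the generated samples deviate from the true distribution.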
Down-sampling reduces the number of majority samples by deleting some of them; typical examples are random under-sampling and the OSS (One-Sided Selection) algorithm. OSS divides the majority samples into noise samples, boundary samples, redundant samples, and safe samples, and removes the noise and boundary points with the Tomek-links technique to reduce the majority sample count. Because sample points are removed, down-sampling reduces algorithmic complexity and shortens the training time. However, deleting majority samples risks losing representative majority-class information and shifting the separating surface.
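Random under-sampling, the simplest of the down-sampling techniques mentioned above, just discards majority points uniformly at random; a minimal numpy sketch (function name illustrative):

```python
import numpy as np

def random_undersample(majority, n_keep, rng=None):
    """Randomly keep n_keep majority samples (random under-sampling).
    Simple and fast, but may discard informative points near the class
    boundary, which is the drawback the invention aims to avoid."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(majority), size=n_keep, replace=False)
    return majority[idx]
```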
Summary of the invention
The main object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing an imbalanced data classification method based on clustering down-sampling, which raises the recognition rate of the minority class while preserving the classification accuracy of the majority class, so as to improve classification performance on imbalanced data sets.
The principle of the invention is as follows. The support vector machine is a classifier that depends entirely on its support vectors. Exploiting this key property, the invention proposes an imbalanced data classification method based on clustering down-sampling. First, the majority class is divided into clusters with the clustering-by-fast-search-and-find-of-density-peaks algorithm. Then each majority cluster is combined with the minority samples into a training set, an SVM is trained on each such set, and that cluster's support vectors are obtained; all support vectors of all clusters are retained and the non-support vectors are deleted, so that the retained majority points form a relatively balanced data set. Finally, an SVM is trained on the resulting data set for classification.
The present invention adopts the following technical scheme.
An imbalanced data classification method based on clustering down-sampling comprises the following steps:
(1) divide the imbalanced data set into a training set and a cross-validation set;
(2) extract the majority-class and minority-class samples from the training set;
(3) cluster the majority-class samples of the training set with the clustering-by-fast-search-and-find-of-density-peaks algorithm, obtaining a clustering result in which the majority samples are divided into N clusters;
(4) combine each cluster of majority samples with the minority samples of the training set into a new sample set, classify it with a support vector machine, and obtain the support vectors of the majority class;
(5) merge the extracted support vectors of every cluster with the minority samples of the training set to form a new training set;
(6) train a support vector machine on the new training set and evaluate its performance on the cross-validation set.
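Assuming scikit-learn, the six steps above can be sketched end to end. Note that plain k-means is used here as a stand-in for the density-peaks clustering of step (3), and the function name and parameters are illustrative, not the patent's exact implementation:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def cluster_downsample_fit(X, y, n_clusters=3, minority=1, C=1.0):
    """End-to-end sketch of steps (2)-(6): cluster the majority class,
    keep only each cluster's support vectors, and retrain an SVM on the
    reduced, roughly balanced training set."""
    Ma, Mi = X[y != minority], X[y == minority]              # step (2)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(Ma)          # step (3), k-means stand-in
    kept = []
    for c in range(n_clusters):                              # step (4): per-cluster SVM
        cluster = Ma[labels == c]
        Xc = np.vstack([cluster, Mi])
        yc = np.r_[np.zeros(len(cluster)), np.ones(len(Mi))]
        svm = SVC(C=C).fit(Xc, yc)
        kept.append(cluster[svm.support_[svm.support_ < len(cluster)]])
    Ma_new = np.vstack(kept)                                 # step (5): union of support vectors
    X_new = np.vstack([Ma_new, Mi])
    y_new = np.r_[np.zeros(len(Ma_new)), np.ones(len(Mi))]
    return SVC(C=C).fit(X_new, y_new)                        # step (6): final classifier
```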
Further, in step (1), the ratio of training set to cross-validation set can be chosen as needed; typically a 90/10 split in the spirit of ten-fold cross-validation is used, i.e. the data set is divided into ten parts, nine of which serve as the training set and one as the test set.
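A minimal numpy sketch of such a 90/10 split that preserves each class's proportion, and hence the imbalance factor, in both parts (function name illustrative):

```python
import numpy as np

def stratified_split(X, y, test_frac=0.1, rng=None):
    """Split into train/test parts while keeping the class ratio
    (and hence the imbalance factor) unchanged in both parts."""
    rng = np.random.default_rng(rng)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])
```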
Further, in step (3), the clustering algorithm is implemented as follows: 1) compute the local density ρ_i of each majority sample point x_i according to its definition; 2) sort the points by descending ρ_i; 3) in that order, compute each point's distance δ_i to its nearest higher-density point from the corresponding formula; 4) select the cluster centres from the decision graph of ρ_i versus δ_i, the centres being the sample points for which both values are relatively large; 5) once the centres are obtained, assign every remaining point to a cluster according to the centres. The local density is defined as ρ_i = Σ_j χ(d_ij - d_c), where χ(x) = 1 if x < 0 and χ(x) = 0 otherwise, d_ij is the distance from majority sample point x_i to the other points, and d_c is a distance threshold. The distance to the nearest higher-density point is defined as δ_i = min_{j: ρ_j > ρ_i} d_ij, i.e. the distance from x_i to its nearest neighbour among the sample points whose density is higher than that of x_i.
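The clustering steps and the two definitions above can be sketched in numpy as follows. This is a minimal sketch of the density-peaks procedure; the fallback for a densest point that was not selected as a centre is a defensive addition, not part of the patent's description:

```python
import numpy as np

def density_peaks(X, d_c, n_clusters):
    """Clustering by fast search and find of density peaks:
    rho_i = number of points within distance d_c of point i,
    delta_i = distance from point i to its nearest higher-density point.
    Cluster centres are the points with the largest rho_i * delta_i."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (d < d_c).sum(axis=1) - 1               # step 1: local density (self excluded)
    order = np.argsort(-rho)                      # step 2: sort by descending density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = d[order[0]].max()           # densest point: use its maximum distance
    for k in range(1, n):                         # step 3: delta_i
        i = order[k]
        higher = order[:k]                        # points of higher (or earlier equal) density
        j = higher[np.argmin(d[i, higher])]
        delta[i], nearest_higher[i] = d[i, j], j
    centres = np.argsort(-(rho * delta))[:n_clusters]   # step 4: decision graph
    labels = np.full(n, -1)
    labels[centres] = np.arange(n_clusters)
    for i in order:                               # step 5: assign remaining points
        if labels[i] == -1:
            # inherit the label of the nearest higher-density point
            labels[i] = labels[nearest_higher[i]] if nearest_higher[i] >= 0 else 0
    return labels, centres
```

As the text notes, the only heavy computation is the single distance matrix; no iterative refinement is needed.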
Further, in step (4), when obtaining the support vectors of each cluster of majority samples, the number of support vectors can be controlled by adjusting the SVM's penalty parameter C and kernel parameters. The support vectors play the decisive role in SVM classification and carry the important information of the majority class; retaining every cluster's support vectors therefore keeps the most informative majority samples, while the majority points that are not support vectors are discarded, achieving the goal of reducing the majority sample count.
Further, in step (5), preferably the union of the support vectors of all clusters should be close in size to the number of minority samples in the training set.
Further, in step (6), preferably the classification performance is assessed with the geometric mean accuracy G-mean and with F-measure, the harmonic mean of the minority class's precision and recall.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
(1) The complexity of the support vector machine is O(N³), where N is the number of training samples. The down-sampling method of the invention reduces the scale of the training samples, so compared with conventional up-sampling methods (such as random over-sampling and the SMOTE algorithm) it shortens the training time and is better suited to the current trend toward big data.
(2) The technical scheme of the invention exploits the decisive role of the support vectors in the SVM classifier: the majority class is divided into clusters by the clustering algorithm, and from each cluster the support vectors that decide the classification are extracted and retained as the informative samples. Compared with other down-sampling methods (such as random under-sampling and the OSS algorithm), it better preserves the information of the majority-class samples.
(3) In the present invention, all of the relatively few minority samples in the training set participate in the classification training, which guarantees the minority class's effect on classification, raises the minority samples' contribution, and enhances the classifier's overall performance.
Brief description of the drawings
Fig. 1 is a block diagram of the implementation of the method of the invention;
Fig. 2 is a schematic diagram of the relation between the original SVM separating surface, the ideal separating surface, the two classes of sample points, and the support vectors;
Fig. 3 is a schematic diagram of the relation between the SVM separating surface, the two classes of sample points, and the support vectors after the clustering-based down-sampling;
Fig. 4 compares the F-measure values of the data sets under different methods;
Fig. 5 compares the G-mean values of the data sets under different methods.
Embodiments
The present invention is described in further detail below with reference to embodiments and the accompanying drawings, but the embodiments of the invention are not limited thereto. In order not to obscure the invention, common techniques such as support vector machine theory are not described in detail.
The specific implementation steps of the imbalanced data classification method based on clustering down-sampling provided by the invention are as follows:
(1) Divide the imbalanced data set into a training set and a cross-validation set, denoted D = Tr ∪ Te, where D is the imbalanced data set, Tr is the training set, and Te is the cross-validation set. The ratio of training set to cross-validation set can be chosen as needed; typically a 90/10 split in the spirit of ten-fold cross-validation is used, i.e. the data set is divided into ten parts, nine of which serve as the training set and one as the test set.
(2) Extract the majority-class samples Ma and the minority-class samples Mi from the training set Tr.
(3) Cluster the majority-class samples Ma of the training set with the clustering-by-fast-search-and-find-of-density-peaks algorithm, obtaining a clustering result in which the majority samples are divided into N clusters.
The algorithm rests on two assumptions: 1) a cluster centre is surrounded by neighbour points of lower local density; and 2) it is at a relatively large distance from any point of higher density.
The local density is defined as ρ_i = Σ_j χ(d_ij - d_c), where χ(x) = 1 if x < 0 and χ(x) = 0 otherwise, d_ij is the distance from majority sample point x_i to the other points, and d_c is the distance threshold; in this example d_c is taken as 0.01.
The distance to the nearest higher-density point is defined as δ_i = min_{j: ρ_j > ρ_i} d_ij, i.e. the distance from x_i to its nearest neighbour among the sample points whose density is higher than that of x_i.
The implementation steps of the clustering algorithm in the present invention are as follows:
1) compute the local density ρ_i of every point according to its definition;
2) sort the points by descending ρ_i;
3) in that order, compute each point's distance δ_i from the nearest-higher-density-point formula;
4) select the cluster centres from the decision graph of ρ_i versus δ_i;
5) after the cluster centres are obtained, assign the remaining points to the clusters according to the centres.
The clustering algorithm involved in the present invention computes the distances only once and needs no iterative computation.
(4) Combine each cluster of majority samples in the training set with the minority samples of the training set into a new sample set, and classify it with a support vector machine to obtain the support vectors of the majority class. The support vectors play the decisive role in SVM classification and carry the important information of the majority class; retaining every cluster's support vectors keeps the most informative majority samples, while the majority points that are not support vectors are discarded, achieving the goal of reducing the majority sample count. When obtaining the support vectors of each cluster, their number can be controlled by adjusting the SVM's penalty parameter C and kernel parameters.
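Assuming scikit-learn's `SVC`, the per-cluster support-vector extraction of this step can be sketched as follows; the function name and parameters are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

def cluster_support_vectors(majority_clusters, minority, C=1.0, gamma='scale'):
    """For each majority cluster, train an SVM against the full minority
    set and keep only the support vectors that come from that cluster.
    C and gamma control how many support vectors survive."""
    kept = []
    for cluster in majority_clusters:
        X = np.vstack([cluster, minority])
        y = np.r_[np.ones(len(cluster)), np.zeros(len(minority))]
        svm = SVC(C=C, gamma=gamma).fit(X, y)
        # svm.support_ holds indices into X; indices below len(cluster)
        # point at majority samples from this cluster
        maj_sv = svm.support_vectors_[svm.support_ < len(cluster)]
        kept.append(maj_sv)
    return np.vstack(kept)
```

A smaller C softens the margin and typically produces more support vectors, which is one way to tune the retained majority set toward the minority-class size required in step (5).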
(5) Merge the extracted support vectors of every cluster with the minority samples of the training set into a new training set. By the nature of the support vector machine, the number of support vectors is smaller than the number of samples in each cluster. Preferably, the union of the support vectors of all clusters should be close in size to the number of minority samples in the training set.
(6) Train a support vector machine on the new training set and evaluate its performance on the cross-validation set. Preferably, the assessment uses the geometric mean accuracy G-mean and F-measure, the harmonic mean of the minority class's precision and recall. Both G-mean and F-measure are built on the confusion matrix. G-mean takes the recognition of both classes into account and evaluates the system's overall classification performance: the larger the G-mean, the better the system classifies overall. In an imbalanced system, F-measure evaluates the classification performance on the minority class: the larger the F-measure, the better the minority samples are classified.
The present embodiment is illustrated below with an actual scenario.
Two data sets with very different imbalance factors are chosen for the experiments, both from the UCI machine learning repository of the University of California, Irvine. One is Haberman's Survival Data Set, which contains the University of Chicago's hospital's judgements on the survival of breast-cancer patients who underwent surgery between 1958 and 1970, i.e. a two-class problem with 306 samples; each sample has 3 attributes: the patient's age at the time of surgery, the year of the operation, and the number of positive axillary nodes detected. The other data set is Letter Recognition, whose samples are black-and-white pixel images of the 26 capital letters of the English alphabet, i.e. 26 classes and 20000 samples in total, each letter converted into 16 numerical features, i.e. 16-dimensional attributes. The details of the two data sets are given in Table 1, where the imbalance factor is the ratio of majority to minority samples in the data set.
Table 1. The data sets
Data set | Samples | Attributes | Majority/Minority | Imbalance factor
---|---|---|---|---
Haberman | 306 | 3 | 225/81 | 2.78
Letter | 20000 | 16 | 19266/734 | 26.25
It should be noted that, to simplify the experiment, the Letter data set is converted into a two-class problem: the 734 samples of the letter Z form the minority class and the remaining letters are merged into the majority class.
In the embodiment the data set is partitioned randomly while ensuring that the imbalance factor is unchanged by the partition, the training set taking 90% of the whole sample set and the test set 10%.
In the embodiment, the method proposed by the invention is compared with direct SVM classification (SVM) and with SVM classification after random under-sampling (RUS+SVM); the experimental results are as follows:
Table 2. F-measure values of the data sets under different methods
Data set | SVM | RUS+SVM | Proposed method
---|---|---|---
Haberman | 0.627 | 0.612 | 0.635
Letter | 0.576 | 0.581 | 0.594
Table 3. G-mean values of the data sets under different methods
Data set | SVM | RUS+SVM | Proposed method
---|---|---|---
Haberman | 0.683 | 0.677 | 0.691
Letter | 0.607 | 0.615 | 0.627
The results show that the proposed method holds an advantage on data sets of different imbalance ratios: the improvements in F-measure and G-mean indicate that the method not only improves the overall classification performance on imbalanced data, but also raises the classification accuracy on the minority class.
Claims (6)
1. An imbalanced data classification method based on clustering down-sampling, characterised by comprising the following steps:
(1) dividing the imbalanced data set into a training set and a cross-validation set;
(2) extracting the majority-class and minority-class samples from the training set;
(3) clustering the majority-class samples of the training set with the clustering-by-fast-search-and-find-of-density-peaks algorithm, obtaining a clustering result in which the majority samples are divided into N clusters;
(4) combining each cluster of majority samples with the minority samples of the training set into a new sample set, classifying it with a support vector machine, and obtaining the support vectors of the majority class;
(5) merging the extracted support vectors of every cluster with the minority samples of the training set into a new training set;
(6) training a support vector machine on the new training set and evaluating its performance on the cross-validation set.
2. The imbalanced data classification method based on clustering down-sampling of claim 1, characterised in that in step (1) the ratio of training set to cross-validation set is chosen as needed, using a 90/10 split in the spirit of ten-fold cross-validation, i.e. dividing the data set into ten parts, nine of which serve as the training set and one as the test set.
3. The imbalanced data classification method based on clustering down-sampling of claim 1, characterised in that in step (3) the clustering algorithm is implemented as follows: 1) computing the local density ρ_i of each majority sample point x_i according to its definition; 2) sorting the points by descending ρ_i; 3) in that order, computing each point's distance δ_i to its nearest higher-density point; 4) selecting the cluster centres, taken to be sample points with relatively large values of both ρ_i and δ_i, from the decision graph of ρ_i versus δ_i; 5) assigning the remaining sample points to the clusters according to the centres; the local density being defined as ρ_i = Σ_j χ(d_ij - d_c), where χ(x) = 1 if x < 0 and χ(x) = 0 otherwise, d_ij is the distance from majority sample point x_i to the other points, and d_c is a distance threshold; and the distance to the nearest higher-density point being defined as δ_i = min_{j: ρ_j > ρ_i} d_ij.
4. The imbalanced data classification method based on clustering down-sampling of claim 1, characterised in that in step (4), when obtaining the support vectors of each cluster of majority samples, the number of support vectors is controlled by adjusting the SVM's penalty parameter C and kernel parameters; the support vectors play the decisive role in SVM classification and carry the important information of the majority class, so retaining every cluster's support vectors keeps the most informative majority samples while the majority points that are not support vectors are discarded, achieving the goal of reducing the majority sample count.
5. The imbalanced data classification method based on clustering down-sampling of claim 1, characterised in that in step (5) the union of the support vectors of all clusters is close in size to the number of minority samples in the training set.
6. The imbalanced data classification method based on clustering down-sampling of claim 1, characterised in that in step (6) the classification performance is assessed with the geometric mean accuracy G-mean and with F-measure, the harmonic mean of the minority class's precision and recall.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710784810.4A CN107688831A (en) | 2017-09-04 | 2017-09-04 | A kind of unbalanced data sorting technique based on cluster down-sampling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107688831A true CN107688831A (en) | 2018-02-13 |
Family
ID=61155779
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710784810.4A Pending CN107688831A (en) | 2017-09-04 | 2017-09-04 | A kind of unbalanced data sorting technique based on cluster down-sampling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107688831A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108629633A (en) * | 2018-05-09 | 2018-10-09 | 浪潮软件股份有限公司 | A kind of method and system for establishing user's portrait based on big data |
CN108875365A (en) * | 2018-04-22 | 2018-11-23 | 北京光宇之勋科技有限公司 | A kind of intrusion detection method and intrusion detection detection device |
CN109360206A (en) * | 2018-09-08 | 2019-02-19 | 华中农业大学 | Crop field spike of rice dividing method based on deep learning |
CN109490704A (en) * | 2018-10-16 | 2019-03-19 | 河海大学 | A kind of Fault Section Location of Distribution Network based on random forests algorithm |
CN109783586A (en) * | 2019-01-21 | 2019-05-21 | 福州大学 | Waterborne troops's comment detection system and method based on cluster resampling |
CN109871901A (en) * | 2019-03-07 | 2019-06-11 | 中南大学 | A kind of unbalanced data classification method based on mixing sampling and machine learning |
CN111080442A (en) * | 2019-12-21 | 2020-04-28 | 湖南大学 | Credit scoring model construction method, device, equipment and storage medium |
US20210158078A1 (en) * | 2018-09-03 | 2021-05-27 | Ping An Technology (Shenzhen) Co., Ltd. | Unbalanced sample data preprocessing method and device, and computer device |
CN113936185A (en) * | 2021-09-23 | 2022-01-14 | 杭州电子科技大学 | Software defect data self-adaptive oversampling method based on local density information |
US11954685B2 (en) | 2019-03-07 | 2024-04-09 | Sony Corporation | Method, apparatus and computer program for selecting a subset of training transactions from a plurality of training transactions |
Legal Events
Date | Code | Title
---|---|---
2018-02-13 | PB01 | Publication
| SE01 | Entry into force of request for substantive examination
| RJ01 | Rejection of invention patent application after publication