CN108229507A - Data classification method and device - Google Patents
Data classification method and device
- Publication number
- Publication number: CN108229507A (application CN201611149072.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- classifier
- sample data
- negative sample
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Abstract
The invention discloses a data classification method and device, relating to the field of data analysis. For data with imbalanced positive and negative samples, the negative sample data are divided into multiple sub-classes; the negative sample data of each sub-class are then combined with the positive sample data and divided into two classes, yielding multiple classifiers and, for each classifier, the closeness of the two classes of data it separates, from which the weight of each classifier is determined. Finally, a final classifier is determined from the weights and the individual classifiers. Classifiers whose positive and negative sample data lie closer together are assigned larger weights, so that in actual classification the minority samples are not treated as outliers of the majority samples and placed in the majority class. The method neither removes nor adds sample data, so it does not lose important information in the sample data and does not cause overfitting; moreover, the classification process takes into account the characteristics of the negative sample data and the classification performance of each classifier, which effectively improves the overall classification performance on the sample data.
Description
Technical field
The present invention relates to the field of data analysis, and in particular to a data classification method and device.
Background technology
In many real-world problems, the positive and negative sample data we can obtain are imbalanced. For example, among the products inspected daily by a quality inspector, the defect rate is far below the pass rate; in cancer screening, residents with cancer are far fewer than healthy residents. In general, minority samples of this kind are more significant for studying the data characteristics and are referred to as positive sample data, while the majority samples are referred to as negative sample data.
Traditional classification algorithms reduce the error rate by minimizing a loss function and do not take the data distribution into account, so they are often biased toward the majority class. In the worst case, instances of the minority class are treated as outliers of the majority class and ignored.
Existing methods for handling imbalanced positive and negative sample data are based on undersampling and oversampling, which balance the data set by reducing the majority-class data or increasing the minority-class data. However, undersampling discards many data points and can lose much important information about the majority class, while oversampling duplicates minority samples, which easily causes overfitting and increases computation time and storage overhead. Neither method classifies imbalanced positive and negative sample data well.
Summary of the invention
One object of the present invention is to provide a data classification method that improves the classification of data with imbalanced positive and negative samples.
According to one aspect of the present invention, a data classification method is provided, including: dividing sample data into positive sample data and negative sample data, where the ratio of the quantity of negative sample data to that of positive sample data exceeds a threshold; dividing the negative sample data into multiple sub-classes according to the similarity between the data points of the negative sample data; combining the negative sample data of each sub-class with the positive sample data into one group of training data, obtaining multiple groups of training data; training a support vector machine on each group of training data to obtain a classifier and the closeness of the two classes of data the classifier separates; determining the weight of each classifier according to the closeness of the two classes of data it separates, where the greater the closeness, the larger the weight of the classifier; determining a final classifier from the weight of each classifier and the classifiers themselves; and classifying test data with the final classifier.
In one embodiment, dividing the negative sample data into multiple sub-classes according to the similarity between the data points of the negative sample data includes: determining the number of sub-classes from the ratio of the quantity of negative sample data to that of positive sample data, and using a clustering algorithm to divide the negative sample data into the determined number of sub-classes according to the similarity between the data points.
In one embodiment, a classifier is the optimal separating hyperplane expression obtained by training the support vector machine, and the closeness of the two classes of data a classifier separates is the maximum split margin of that classifier.
In one embodiment, determining the final classifier from the weight of each classifier and the classifiers includes: computing a weighted sum of the optimal separating hyperplane expressions of the classifiers according to their weights, obtaining the optimal separating hyperplane expression of the final classifier.
In one embodiment, the weight of a classifier is the reciprocal of its maximum split margin.
According to another aspect of the present invention, a data classification device is provided, including: a positive/negative sample division module for dividing sample data into positive sample data and negative sample data, where the ratio of the quantity of negative sample data to that of positive sample data exceeds a threshold; a negative sample division module for dividing the negative sample data into multiple sub-classes according to the similarity between the data points of the negative sample data; a training data generation module for combining the negative sample data of each sub-class with the positive sample data into one group of training data, obtaining multiple groups of training data; a training module for training a support vector machine on each group of training data to obtain a classifier and the closeness of the two classes of data the classifier separates; a classifier weight determination module for determining the weight of each classifier according to the closeness of the two classes of data it separates, where the greater the closeness, the larger the weight of the classifier; a final classifier determination module for determining a final classifier from the weight of each classifier and the classifiers; and a data classification module for classifying test data with the final classifier.
In one embodiment, the negative sample division module determines the number of sub-classes from the ratio of the quantity of negative sample data to that of positive sample data, and uses a clustering algorithm to divide the negative sample data into the determined number of sub-classes according to the similarity between the data points.
In one embodiment, a classifier is the optimal separating hyperplane expression obtained by training the support vector machine, and the closeness of the two classes of data a classifier separates is the maximum split margin of that classifier.
In one embodiment, the final classifier determination module computes a weighted sum of the optimal separating hyperplane expressions of the classifiers according to their weights, obtaining the optimal separating hyperplane expression of the final classifier.
In one embodiment, the weight of a classifier is the reciprocal of its maximum split margin.
The present invention divides the negative sample data in imbalanced positive/negative data into multiple sub-classes, so that the negative sample data within each sub-class are few relative to the total quantity of negative sample data and each sub-class represents one type of negative sample data. The negative sample data of each sub-class are then combined with the positive sample data and divided into two classes, yielding multiple classifiers and, for each classifier, a weight based on the closeness of the two classes of data it separates. Finally, a final classifier is determined from the weights and the classifiers. Classifiers whose positive and negative sample data lie closer together are assigned larger weights, so that in actual classification the minority samples are not treated as outliers of the majority samples and placed in the majority class. The method neither removes nor adds sample data, so it does not lose important information in the sample data and does not cause overfitting; moreover, the classification process takes into account the characteristics of the negative sample data and the classification performance of each classifier, effectively improving the overall classification performance on the sample data.
Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a schematic flowchart of the data classification method of one embodiment of the present invention.
Fig. 2 shows a schematic diagram of the data classification method of another embodiment of the present invention.
Fig. 3 shows a schematic structural diagram of the data classification device of one embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present invention or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the prior art, when classifying data with imbalanced positive and negative samples, undersampling discards many data points and can lose much important information about the majority class, while oversampling duplicates minority samples, which easily causes overfitting and increases computation time and storage overhead. Since neither method classifies such data well, the present scheme is proposed.
The data classification method of the present invention is described below with reference to Fig. 1 and Fig. 2.
Fig. 1 is a flowchart of one embodiment of the data classification method of the present invention. As shown in Fig. 1, the method of this embodiment includes:
Step S102: divide the sample data into positive sample data and negative sample data.
In the field of data classification, the class of data that is more valuable for data analysis and classification is referred to as positive sample data. The positive sample data are usually few, so it is necessary to judge whether the ratio of the quantity of negative sample data to that of positive sample data exceeds a threshold; the method of this scheme is more effective when the quantity of negative samples reaches a certain proportion relative to the quantity of positive samples. For example, in credit card fraud detection, the collected sample data include data of normal transactions and data of fraud, which are used in the subsequent classifier training process. The data volume of normal transactions is generally far larger than that of fraud, and the fraud data are more valuable for identifying fraudulent behavior; therefore, the fraud data are taken as the positive sample data and the normal transaction data as the negative sample data.
As shown in Fig. 2, the shaded rectangle at the top represents the smaller volume of positive sample data, and the larger blank rectangle below represents the larger volume of negative sample data.
Step S104: divide the negative sample data into multiple sub-classes according to the similarity between the data points of the negative sample data.
Specifically, the number of sub-classes into which the negative sample data are divided is determined from the ratio of the quantity of negative sample data to that of positive sample data. For example, if the quantity of negative sample data is F and the quantity of positive sample data is T, the ratio is F/T, and rounding it gives K = [F/T], the number of sub-classes. Then, a clustering algorithm divides the negative sample data into K sub-classes according to the similarity between the data points. The clustering algorithm is, for example, the K-means algorithm, the K-medoids algorithm, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) or the EM algorithm (Expectation-Maximization algorithm).
A clustering algorithm classifies data based on the similarity between sample data, which can be measured, for example, by the distance or similarity between data points. Taking the K-means algorithm as an example, dividing the negative sample data into K sub-classes is briefly described as follows:
(1) Randomly select K = [F/T] objects from the negative sample data as initial cluster centers. (2) Assign each sample in the negative sample data set to the closest cluster according to the minimum-distance principle. (3) Recalculate the centers of the K clusters from the clustering result and take them as the new cluster centers. (4) Repeat steps (2) and (3) until the cluster centers no longer change or the change is smaller than a predetermined threshold. Since clustering algorithms are common, the other clustering algorithms are not described here.
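Steps (1) through (4) can be sketched in NumPy as follows. This is an illustrative re-implementation, not code from the patent; the function name, parameters, and stopping threshold are our own.

```python
import numpy as np

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Minimal K-means following steps (1)-(4): init, assign, recenter, repeat."""
    rng = np.random.default_rng(seed)
    # (1) randomly select k objects as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # (2) assign every sample to the closest center (minimum-distance principle)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # (3) recompute the k cluster centers from the assignment
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # (4) stop when the centers change by less than the threshold
        if np.linalg.norm(new_centers - centers) < tol:
            break
        centers = new_centers
    return labels, centers
```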
Using a clustering algorithm, the negative sample data can be divided into sub-classes with different characteristics. When the negative sample data of each sub-class are combined with the positive sample data and classified, the differences between them can be reflected effectively, and the resulting classifier can separate negative sample data of different characteristics from the positive sample data. If, instead, the negative sample data were divided into multiple sub-classes arbitrarily, the differences in data characteristics between sub-classes would not be obvious; when each sub-class was combined with the positive sample data and classified, the classifiers would differ little and the classification performance could not be improved effectively. Therefore, first dividing the negative sample data into multiple sub-classes with a clustering algorithm helps improve classification performance.
As shown in Fig. 2, the blank rectangle representing the larger volume of negative sample data is divided into multiple small rectangles.
Step S106: combine the negative sample data of each sub-class with the positive sample data into one group of training data, obtaining multiple groups of training data.
For example, if the negative sample data are divided into K sub-classes, the negative sample data of each sub-class are combined with the positive sample data as one group of training data, giving K groups of training data.
As shown in Fig. 2, each small rectangle of negative sample data is merged with the shaded rectangle of positive sample data, forming multiple rectangles that each contain both kinds of data.
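Forming the training groups described above might look like the following sketch (array shapes and sub-class labels are hypothetical; positives are labeled +1 and negatives -1 for the later SVM step):

```python
import numpy as np

# Hypothetical data: 5 positive samples and 12 negative samples in 3 sub-classes.
X_pos = np.zeros((5, 2))
X_neg = np.ones((12, 2))
sub_class = np.array([0] * 4 + [1] * 4 + [2] * 4)   # sub-class index from clustering

groups = []
for k in range(3):
    X_k = np.vstack([X_pos, X_neg[sub_class == k]])              # one training group
    y_k = np.array([1] * len(X_pos) + [-1] * int((sub_class == k).sum()))
    groups.append((X_k, y_k))
print(len(groups), groups[0][0].shape)   # 3 (9, 2): each group = positives + one sub-class
```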
Step S108: train a support vector machine on each group of training data to obtain a classifier and the closeness of the two classes of data the classifier separates.
An SVM (Support Vector Machine) can find a separating hyperplane in an n-dimensional space that divides the data points in the space into two classes. The hyperplane of a classifier is expressed as f(x) = W^T x + b, and setting f(x) = 0 gives the optimal separating hyperplane of the classifier. For example, training on the K groups of training data yields K classifiers. Classifier i is the optimal separating hyperplane expression obtained by SVM training, f_i(x) = W_i^T x + b_i, where x is a data point in the space, W_i and b_i are the parameters of the optimal separating hyperplane, 1 ≤ i ≤ K, and i denotes the i-th classifier. The closeness of the two classes of data a classifier separates can be represented by the maximum split margin L_i of that classifier, i.e. the distance between the support vectors on either side of the optimal separating hyperplane, 1 ≤ i ≤ K. The larger L_i is, the smaller the closeness of the two classes of data the SVM classifier separates.
As shown in Fig. 2, each group of data is divided into two classes by the SVM; the circles and crosses represent the two separated classes of data, and the line between them is the optimal separating hyperplane of the classifier.
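As a self-contained illustration of this step, the sketch below trains a linear classifier with a simple Pegasos-style subgradient method (an illustrative stand-in for a full SVM solver, which the patent does not detail) and derives the margin as L_i = 2/||W_i||; the function name and hyperparameters are our own.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, n_iter=2000, seed=0):
    """Pegasos-style subgradient descent for a linear SVM; labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for t in range(1, n_iter + 1):
        i = rng.integers(len(X))
        eta = 1.0 / (lam * t)                      # decaying step size
        if y[i] * (X[i] @ w + b) < 1:              # hinge-loss violation
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
            b += eta * y[i]
        else:
            w = (1 - eta * lam) * w                # regularization shrink only
    margin = 2.0 / np.linalg.norm(w)               # maximum split margin L_i
    return w, b, margin
```

Each of the K training groups would be passed through such a routine to obtain its (W_i, b_i) and L_i.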
Step S110: determine the weight of each classifier according to the closeness of the two classes of data it separates.
The greater the closeness of the two classes of data a classifier separates, the larger the weight of that classifier; that is, the smaller L_i is, the larger the weight of classifier i. A greater closeness means the two classes of data are harder to distinguish, and in actual classification such weakly distinguished classes also need to be separated accurately; therefore, such a classifier is given a larger weight. Specifically, 1/L_i can be set as the weight of classifier i; other weight-setting schemes can also be used as needed.
Step S112: determine the final classifier from the weight of each classifier and the classifiers.
Specifically, a weighted sum of the optimal separating hyperplane expressions of the classifiers is computed according to their weights, giving the optimal separating hyperplane expression of the final classifier.
As shown in Fig. 2, the optimal separating hyperplanes are summed with their weights, and the resulting final separating hyperplane separates the positive sample data from the negative sample data.
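With the weight of classifier i taken as 1/L_i and the final hyperplane taken as the weighted sum of the individual hyperplane expressions, the combination can be sketched as follows; the hyperplane parameters and margins are made-up numbers, and the patent does not say whether the weights are normalized, so the raw weighted sum is used here.

```python
import numpy as np

# Hypothetical per-classifier hyperplanes f_i(x) = W_i^T x + b_i and margins L_i.
Ws = np.array([[1.0, 0.5], [0.8, 1.2], [1.5, 0.2]])
bs = np.array([-0.3, 0.1, -0.5])
Ls = np.array([2.0, 0.5, 1.0])       # maximum split margin of each classifier

alphas = 1.0 / Ls                    # weight of classifier i: reciprocal of L_i
W_final = alphas @ Ws                # weighted sum of the W_i -> [3.6, 2.85]
b_final = alphas @ bs                # weighted sum of the b_i -> -0.45
```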
Step S114: classify the test data with the final classifier.
The final classifier obtained by training can classify new test data. For example, in credit card fraud detection, after the final classifier that separates fraud from normal behavior is obtained through steps S102 to S114, inputting new transaction data into the final classifier determines whether the new transaction is fraudulent.
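Applying the final classifier to new data then amounts to checking which side of the final hyperplane the point falls on; in the credit-card example, the positive side corresponds to fraud. The hyperplane parameters below are hypothetical:

```python
import numpy as np

def classify(x, W, b):
    """Apply the final classifier f(x) = W^T x + b; positive side = positive class (fraud)."""
    return "fraud" if W @ x + b > 0 else "normal"

# Hypothetical final hyperplane parameters, for illustration only.
W_final, b_final = np.array([3.6, 2.85]), -0.45
print(classify(np.array([1.0, 1.0]), W_final, b_final))    # fraud  (3.6 + 2.85 - 0.45 = 6.0 > 0)
print(classify(np.array([-1.0, 0.0]), W_final, b_final))   # normal (-3.6 - 0.45 < 0)
```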
The method of the above embodiment divides the negative sample data in imbalanced positive/negative data into multiple sub-classes, so that the negative sample data within each sub-class are few relative to the total quantity of negative sample data and each sub-class represents one type of negative sample data. The negative sample data of each sub-class are combined with the positive sample data and divided into two classes, yielding multiple classifiers and, for each classifier, a weight based on the closeness of the two classes of data it separates. Finally, a final classifier is determined from the weights and the classifiers; classifiers whose positive and negative sample data lie closer together are assigned larger weights, so that in actual classification even weakly distinguished classes are separated accurately and the minority samples are not treated as outliers of the majority samples and placed in the majority class. The method neither removes nor adds sample data, does not lose important information in the sample data, and does not cause overfitting; moreover, the classification process takes into account the characteristics of the negative sample data and the classification performance of each classifier, effectively improving the overall classification performance on the sample data.
The present invention also provides a data classification device, described below with reference to Fig. 3.
Fig. 3 is a structural diagram of one embodiment of the data classification device of the present invention. As shown in Fig. 3, the device 30 includes:
A positive/negative sample division module 302 for dividing sample data into positive sample data and negative sample data, where the ratio of the quantity of negative sample data to that of positive sample data exceeds a threshold.
A negative sample division module 304 for dividing the negative sample data into multiple sub-classes according to the similarity between the data points of the negative sample data.
Specifically, the negative sample division module 304 determines the number of sub-classes from the ratio of the quantity of negative sample data to that of positive sample data, and uses a clustering algorithm to divide the negative sample data into the determined number of sub-classes according to the similarity between the data points.
A training data generation module 306 for combining the negative sample data of each sub-class with the positive sample data into one group of training data, obtaining multiple groups of training data.
A training module 308 for training a support vector machine on each group of training data to obtain a classifier and the closeness of the two classes of data the classifier separates. A classifier is the optimal separating hyperplane expression obtained by training the support vector machine; the closeness of the two classes of data a classifier separates is the maximum split margin of that classifier.
A classifier weight determination module 310 for determining the weight of each classifier according to the closeness of the two classes of data it separates. The greater the closeness of the two classes of data a classifier separates, the larger the weight of that classifier; for example, the weight of a classifier is the reciprocal of its maximum split margin.
A final classifier determination module 312 for determining the final classifier from the weight of each classifier and the classifiers. Specifically, the final classifier determination module 312 computes a weighted sum of the optimal separating hyperplane expressions of the classifiers according to their weights, obtaining the optimal separating hyperplane expression of the final classifier.
A data classification module 314 for classifying test data with the final classifier.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments can be implemented in hardware, or by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (10)
1. A data classification method, comprising:
dividing sample data into positive sample data and negative sample data, wherein the ratio of the quantity of negative sample data to that of positive sample data exceeds a threshold;
dividing the negative sample data into multiple sub-classes according to the similarity between the data points of the negative sample data;
combining the negative sample data of each sub-class with the positive sample data into one group of training data, obtaining multiple groups of training data;
training a support vector machine on each group of training data to obtain a classifier and the closeness of the two classes of data the classifier separates;
determining the weight of each classifier according to the closeness of the two classes of data it separates, wherein the greater the closeness, the larger the weight of the classifier;
determining a final classifier from the weight of each classifier and the classifiers; and
classifying test data with the final classifier.
2. The method according to claim 1, wherein dividing the negative sample data into multiple sub-classes according to the similarity between the data points of the negative sample data comprises:
determining the number of sub-classes from the ratio of the quantity of the negative sample data to that of the positive sample data; and
using a clustering algorithm to divide the negative sample data into the determined number of sub-classes according to the similarity between the data points.
3. The method according to claim 1, wherein the classifier is the optimal separating hyperplane expression obtained by training the support vector machine, and the closeness of the two classes of data each classifier separates is the maximum split margin of that classifier.
4. The method according to claim 3, wherein determining the final classifier from the weight of each classifier and the classifiers comprises: computing a weighted sum of the optimal separating hyperplane expressions of the classifiers according to their weights, obtaining the optimal separating hyperplane expression of the final classifier.
5. The method according to claim 3, wherein the weight of the classifier is the reciprocal of its maximum split margin.
6. A data classification device, comprising:
a positive/negative sample division module for dividing sample data into positive sample data and negative sample data, wherein the ratio of the quantity of negative sample data to that of positive sample data exceeds a threshold;
a negative sample division module for dividing the negative sample data into multiple sub-classes according to the similarity between the data points of the negative sample data;
a training data generation module for combining the negative sample data of each sub-class with the positive sample data into one group of training data, obtaining multiple groups of training data;
a training module for training a support vector machine on each group of training data to obtain a classifier and the closeness of the two classes of data the classifier separates;
a classifier weight determination module for determining the weight of each classifier according to the closeness of the two classes of data it separates, wherein the greater the closeness, the larger the weight of the classifier;
a final classifier determination module for determining a final classifier from the weight of each classifier and the classifiers; and
a data classification module for classifying test data with the final classifier.
7. The device according to claim 6, wherein the negative sample division module determines the number of sub-classes from the ratio of the quantity of the negative sample data to that of the positive sample data, and uses a clustering algorithm to divide the negative sample data into the determined number of sub-classes according to the similarity between the data points.
8. The device according to claim 6, wherein the classifier is the optimal separating hyperplane expression obtained by training the support vector machine, and the closeness of the two classes of data each classifier separates is the maximum split margin of that classifier.
9. The device according to claim 8, wherein the final classifier determination module computes a weighted sum of the optimal separating hyperplane expressions of the classifiers according to their weights, obtaining the optimal separating hyperplane expression of the final classifier.
10. The device according to claim 8, wherein the weight of the classifier is the reciprocal of its maximum split margin.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611149072.8A CN108229507A (en) | 2016-12-14 | 2016-12-14 | Data classification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108229507A true CN108229507A (en) | 2018-06-29 |
Family
ID=62638197
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611149072.8A Pending CN108229507A (en) | 2016-12-14 | 2016-12-14 | Data classification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108229507A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7421417B2 (en) * | 2003-08-28 | 2008-09-02 | Wisconsin Alumni Research Foundation | Input feature and kernel selection for support vector machine classification |
CN101901345A (en) * | 2009-05-27 | 2010-12-01 | 复旦大学 | Classification method of differential proteomics |
CN103995821A (en) * | 2014-03-14 | 2014-08-20 | 盐城工学院 | Selective clustering integration method based on spectral clustering algorithm |
CN104573708A (en) * | 2014-12-19 | 2015-04-29 | 天津大学 | Ensemble-of-under-sampled extreme learning machine |
CN104809226A (en) * | 2015-05-07 | 2015-07-29 | 武汉大学 | Method for early classification of imbalanced multivariate time series data |
- 2016-12-14: Application CN201611149072.8A filed in China; published as CN108229507A (en); legal status: active, Pending
Non-Patent Citations (2)
Title |
---|
汪洪桥, 蔡艳宁, 王仕成, 付光远, 孙富春: "Multi-kernel Methods for Pattern Analysis and Their Applications" (《模式分析的多核方法及其应用》), National Defense Industry Press, 31 March 2014 *
陈瑞雪: "Research on Support Vector Machine Classification Methods for Imbalanced Data" (基于不平衡数据的支持向量机分类方法研究), China Master's Theses Full-text Database (electronic journal) *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109272056A (en) * | 2018-10-30 | 2019-01-25 | 成都信息工程大学 | Data balancing method based on pseudo negative samples and method for improving data classification performance |
CN109272056B (en) * | 2018-10-30 | 2021-09-21 | 成都信息工程大学 | Data balancing method based on pseudo negative samples and method for improving data classification performance |
CN109670971A (en) * | 2018-11-30 | 2019-04-23 | 平安医疗健康管理股份有限公司 | Method, apparatus, device and computer storage medium for identifying abnormal medical expenses |
CN109558543A (en) * | 2018-12-11 | 2019-04-02 | 拉扎斯网络科技(上海)有限公司 | Sample sampling method, sample sampling device, server and storage medium |
CN111666872A (en) * | 2020-06-04 | 2020-09-15 | 电子科技大学 | Efficient behavior identification method under data imbalance |
CN111666872B (en) * | 2020-06-04 | 2022-08-05 | 电子科技大学 | Efficient behavior identification method under data imbalance |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109952614B (en) | Biological particle classification system and method | |
CN103632168B (en) | Classifier integration method for machine learning | |
CN103136504B (en) | Face recognition method and device | |
CN108229507A (en) | Data classification method and device | |
CN107194803A (en) | Device for assessing borrower credit risk in P2P lending | |
CN108363810A (en) | Text classification method and device | |
CN107682109B (en) | Interference signal classification and identification method suitable for UAV communication systems | |
CN106326913A (en) | Money laundering account determination method and device | |
CN111062425B (en) | Unbalanced data set processing method based on the C-K-SMOTE algorithm | |
CN110533116A (en) | Euclidean-distance-based adaptive ensemble classification method for unbalanced data | |
CN109886284A (en) | Fraud detection method and system based on hierarchical clustering | |
CN111861103A (en) | Fresh tea leaf classification method based on multiple features and multiple classifiers | |
CN108629373A (en) | Image classification method, system, device and computer-readable storage medium | |
CN112633337A (en) | Unbalanced data processing method based on clustering and boundary points | |
CN103177266A (en) | Intelligent stock pest identification system | |
CN104850868A (en) | Customer segmentation method based on k-means and neural network clustering | |
CN110264454A (en) | Cervical cancer histopathological image diagnosis method based on multi-hidden-layer conditional random fields | |
CN106570076A (en) | Computer text classification system | |
CN110046593A (en) | Complex power quality disturbance recognition method based on segmented improved S-transform and random forest | |
CN109829498A (en) | Coarse classification method, apparatus, terminal device and storage medium based on clustering | |
CN104134073B (en) | Single-class remote sensing image classification method based on one-class normalization | |
CN109359680A (en) | Automatic blasted-rock identification and block-size feature extraction method and device | |
CN108875801A (en) | Load curve classification system based on smart grid | |
CN105760471B (en) | Two-class text classification method based on combined convex linear perceptron | |
CN110516741(en) | Overlapping unbalanced data classification method based on dynamic classifier selection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180629 |