CN108229507A - Data classification method and device - Google Patents


Info

Publication number
CN108229507A
CN108229507A (Application CN201611149072.8A)
Authority
CN
China
Prior art keywords
data
classifier
sample data
negative sample
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611149072.8A
Other languages
Chinese (zh)
Inventor
陈新河
李慧芳
赵静
詹文浩
张诺亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN201611149072.8A priority Critical patent/CN108229507A/en
Publication of CN108229507A publication Critical patent/CN108229507A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention discloses a data classification method and device, relating to the field of data analysis. For data with imbalanced positive and negative samples, the present invention divides the negative sample data into multiple sub-classes; the negative sample data of each sub-class are then combined with the positive sample data and the combined set is divided into two classes, yielding multiple classifiers together with the separation of the two classes of data divided by each classifier, from which the weight of each classifier is obtained. Finally, a final classifier is determined from the weights and the classifiers. A classifier whose positive and negative sample data lie closer together is assigned a larger weight, so that in actual classification the minority samples are not treated as outliers of the majority samples and assigned to the majority class. The method of the present invention neither removes nor adds sample data, so it does not lose important information of the sample data and does not cause over-fitting; and because the features of the negative sample data and the classification effect of each classifier are taken into account in the classification process, the overall classification effect on the sample data is effectively improved.

Description

Data classification method and device
Technical field
The present invention relates to the field of data analysis, and in particular to a data classification method and device.
Background technology
In many real-world problems, the positive and negative sample data that can be obtained are imbalanced. For example, among the products inspected daily by a quality inspector, the defect rate is well below the pass rate; in cancer screening, the number of residents with cancer is far smaller than the healthy population. In general, such minority samples are of greater significance for studying the characteristics of the data and are referred to as positive sample data, while the majority samples are referred to as negative sample data.
Traditional classification algorithms reduce the error rate by minimizing a loss function and do not take the data distribution into account, so they tend to be biased toward the majority class. In the worst case, instances of the minority class are treated as outliers of the majority class and ignored.
Existing methods for handling imbalanced positive and negative sample data are mainly under-sampling and over-sampling, which balance the data set by reducing the data of the majority class or increasing the data of the minority class. However, under-sampling discards many data points and can cause the majority class to lose much important information, while over-sampling adds repeated minority samples, which easily causes over-fitting and increases computation time and storage overhead. Neither method classifies data with imbalanced positive and negative samples well.
Summary of the invention
One object of the present invention is to propose a data classification method that improves the classification of data with imbalanced positive and negative samples.
According to one aspect of the present invention, a data classification method is provided, including: dividing sample data into positive sample data and negative sample data, wherein the ratio of the quantity of the negative sample data to the positive sample data is greater than a threshold; dividing the negative sample data into multiple sub-classes according to the similarity between data points of the negative sample data; combining the negative sample data of each sub-class with the positive sample data as one group of training data, obtaining multiple groups of training data; training a support vector machine on each group of training data, obtaining a classifier and the separation of the two classes of data divided by that classifier; determining the weight of each classifier according to the separation of the two classes of data it divides, wherein the smaller the separation of the two classes of data divided by a classifier, the larger the weight of that classifier; determining a final classifier according to the weight of each classifier and each classifier; and classifying test data using the final classifier.
In one embodiment, dividing the negative sample data into multiple sub-classes according to the similarity between data points of the negative sample data includes: determining the number of sub-classes into which the negative sample data are divided according to the ratio of the quantities of the negative and positive sample data; and dividing the negative sample data into the determined number of sub-classes using a clustering algorithm according to the similarity between data points of the negative sample data.
In one embodiment, a classifier is the optimal separating hyperplane expression obtained by support vector machine training, and the separation of the two classes of data divided by each classifier is the maximum margin of that classifier.
In one embodiment, determining the final classifier according to the weight of each classifier and each classifier includes: computing a weighted sum of the optimal separating hyperplane expressions of the classifiers according to their weights, obtaining the optimal separating hyperplane expression of the final classifier.
In one embodiment, the weight of a classifier is the reciprocal of the maximum margin of that classifier.
According to another aspect of the present invention, a data classification device is provided, including: a positive/negative sample division module for dividing sample data into positive sample data and negative sample data, wherein the ratio of the quantity of the negative sample data to the positive sample data is greater than a threshold; a negative sample division module for dividing the negative sample data into multiple sub-classes according to the similarity between data points of the negative sample data; a training data generation module for combining the negative sample data of each sub-class with the positive sample data as one group of training data, obtaining multiple groups of training data; a classifier training module for training a support vector machine on each group of training data, obtaining a classifier and the separation of the two classes of data divided by that classifier; a classifier weight determination module for determining the weight of each classifier according to the separation of the two classes of data it divides, wherein the smaller the separation of the two classes of data divided by a classifier, the larger the weight of that classifier; a final classifier determination module for determining a final classifier according to the weight of each classifier and each classifier; and a data classification module for classifying test data using the final classifier.
In one embodiment, the negative sample division module is configured to determine the number of sub-classes into which the negative sample data are divided according to the ratio of the quantities of the negative and positive sample data, and to divide the negative sample data into the determined number of sub-classes using a clustering algorithm according to the similarity between data points of the negative sample data.
In one embodiment, a classifier is the optimal separating hyperplane expression obtained by support vector machine training, and the separation of the two classes of data divided by each classifier is the maximum margin of that classifier.
In one embodiment, the final classifier determination module is configured to compute a weighted sum of the optimal separating hyperplane expressions of the classifiers according to their weights, obtaining the optimal separating hyperplane expression of the final classifier.
In one embodiment, the weight of a classifier is the reciprocal of the maximum margin of that classifier.
For the negative sample data in a set of data with imbalanced positive and negative samples, the present invention divides the negative sample data into multiple sub-classes, so that the negative sample data in each sub-class are few relative to the total quantity of negative sample data and each sub-class represents one type of negative sample data. The negative sample data of each sub-class are then combined with the positive sample data and divided into two classes, yielding multiple classifiers and the separation of the two classes of data divided by each classifier, from which the weight of each classifier is obtained. Finally, the final classifier is determined from the weights and the classifiers; a classifier whose positive and negative sample data lie closer together is given a larger weight, so that in actual classification the minority samples are not treated as outliers of the majority samples and assigned to the majority class. The method of the present invention neither removes nor adds sample data, does not lose important information of the sample data, and does not cause over-fitting; because the features of the negative sample data and the classification effect of each classifier are taken into account in the classification process, the overall classification effect on the sample data is effectively improved.
Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.
Description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 shows a schematic flow chart of the data classification method of one embodiment of the present invention.
Fig. 2 shows a schematic diagram of the data classification method of another embodiment of the present invention.
Fig. 3 shows a schematic structural diagram of the data classification device of one embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present invention or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In the prior art, when classifying data with imbalanced positive and negative samples, under-sampling discards many data points and causes the majority class to lose much important information, while over-sampling adds repeated minority samples, which easily causes over-fitting and increases computation time and storage overhead; neither method classifies such data well. This scheme is proposed to address that problem.
The data classification method of the present invention is described with reference to Fig. 1 and Fig. 2.
Fig. 1 is a flow chart of an embodiment of the data classification method of the present invention. As shown in Fig. 1, the method of this embodiment includes:
In step S102, the sample data are divided into positive sample data and negative sample data.
In the field of data classification, the class of data that is more valuable for data analysis and classification is referred to as positive sample data. Positive sample data are usually scarce, so it is necessary to judge whether the ratio of the quantity of negative sample data to positive sample data exceeds a threshold; the method of this scheme is more effective when the negative samples reach a certain proportion relative to the positive samples. For example, in credit card fraud detection, the collected sample data include both normal-transaction data and fraud data for the subsequent classifier model training process. The amount of normal-transaction data is generally far larger than the amount of fraud data, while the fraud data, which clearly identify fraudulent behavior, have more research value. Therefore, the fraud data are taken as positive sample data and the normal-transaction data as negative sample data.
As shown in Fig. 2, the small shaded rectangle at the top represents the positive sample data with the smaller data volume, and the larger blank rectangle below represents the negative sample data with the larger data volume.
In step S104, the negative sample data are divided into multiple sub-classes according to the similarity between data points of the negative sample data.
Specifically, the number of sub-classes into which the negative sample data are divided is determined from the ratio of the quantities of the negative and positive sample data. For example, if the quantity of negative sample data is F and the quantity of positive sample data is T, the ratio of the quantities is F/T, and rounding gives K = [F/T], the number of sub-classes into which the negative sample data are divided. Then, a clustering algorithm divides the negative sample data into K sub-classes according to the similarity between data points of the negative sample data; the clustering algorithm is, for example, the K-means algorithm, the K-medoids algorithm, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), or the EM algorithm (Expectation-Maximization Algorithm), among others.
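By way of illustration only (the following sketch is not part of the patent disclosure), the sub-class count K = [F/T] and the clustering of step S104 might look as follows in Python, assuming scikit-learn's KMeans; the function and variable names are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

def split_negative_into_subclasses(X_neg, n_pos, random_state=0):
    """Divide the negative samples into K = [F / T] sub-classes by clustering."""
    F = len(X_neg)                 # quantity of negative sample data
    K = max(1, F // n_pos)         # number of sub-classes, K = [F / T]
    labels = KMeans(n_clusters=K, n_init=10,
                    random_state=random_state).fit_predict(X_neg)
    return K, labels
```

With, say, 60 negative and 10 positive samples, this yields K = 6 sub-classes, each negative point receiving a sub-class label in 0..5.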
A clustering algorithm classifies data based on the similarity between samples; the similarity of sample data can be measured, for example, by the distance between data points or by a similarity metric. Taking the K-means algorithm as an example, dividing the negative sample data into K sub-classes is briefly described as follows:
(1) K = [F/T] objects are randomly selected from the negative sample data as the initial cluster centers; (2) each sample in the negative sample data set is assigned to the closest cluster according to the minimum-distance principle; (3) the K cluster centers are recalculated from the clustering result and taken as the new cluster centers; (4) steps (2) and (3) are repeated until the cluster centers no longer change or the change is smaller than a predetermined threshold. Since clustering algorithms are common, the other clustering algorithms are not described here.
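The four enumerated steps can be sketched directly in NumPy (illustrative only, not part of the disclosure; the tolerance, iteration cap, and empty-cluster guard are assumptions not stated in the text):

```python
import numpy as np

def kmeans_subclasses(X, K, tol=1e-6, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # (1) randomly select K objects from the data as initial cluster centers
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(max_iter):
        # (2) assign each sample to the closest center (minimum-distance principle)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # (3) recalculate the K cluster centers from the clustering result
        new_centers = centers.copy()
        for k in range(K):
            members = X[assign == k]
            if len(members):
                new_centers[k] = members.mean(axis=0)
        # (4) stop when the centers change less than a predetermined threshold
        if np.linalg.norm(new_centers - centers) < tol:
            break
        centers = new_centers
    return assign, centers
```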
Using a clustering algorithm, the negative sample data can be divided into sub-classes with different characteristics, so that when each sub-class is combined with the positive sample data for classification, the difference between the negative sample data of that sub-class and the positive sample data is effectively reflected, and the resulting classifier can separate negative sample data of that characteristic from the positive sample data. If the negative sample data were divided into sub-classes arbitrarily, the data characteristics of the sub-classes would not differ noticeably, so that when each sub-class is combined with the positive sample data for classification, the resulting classifiers would differ little and the classification effect could not be effectively improved. Therefore, first dividing the negative sample data into multiple sub-classes with a clustering algorithm helps improve the classification effect.
As shown in Fig. 2, the blank rectangle representing the negative sample data with the larger data volume is divided into multiple small rectangles.
In step S106, the negative sample data of each sub-class are combined with the positive sample data as one group of training data, obtaining multiple groups of training data.
For example, the negative sample data are divided into K sub-classes, and the negative sample data of each sub-class are combined with the positive sample data as one group of training data, giving K groups of training data.
As shown in Fig. 2, each small rectangle of negative sample data is merged with the shaded rectangle of positive sample data, forming multiple rectangles that contain both kinds of data.
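Forming the K groups of training data of step S106 might be sketched as follows (illustrative only, not part of the disclosure; the label convention y = +1 for positive and y = -1 for negative samples is an assumption):

```python
import numpy as np

def build_training_groups(X_pos, X_neg, sub_labels, K):
    """Combine each negative sub-class with all positive samples into one group."""
    groups = []
    for k in range(K):
        X_neg_k = X_neg[sub_labels == k]
        X_k = np.vstack([X_pos, X_neg_k])
        y_k = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_neg_k))])
        groups.append((X_k, y_k))
    return groups
```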
In step S108, a support vector machine is trained on each group of training data, obtaining a classifier and the separation of the two classes of data divided by that classifier.
An SVM (Support Vector Machine) can find a separating hyperplane in n-dimensional space that divides the data points of the space into two classes. The hyperplane of a classifier is expressed as f(x) = W^T x + b, and setting f(x) = 0 gives the optimal separating hyperplane of the classifier. For example, training the K groups of training data gives K classifiers, and classifier i is the optimal separating hyperplane expression obtained by SVM training, f_i(x) = W_i^T x + b_i, where x is a data point in the space, W_i and b_i are the parameters of the optimal separating hyperplane, and 1 ≤ i ≤ K indexes the i-th classifier. The separation of the two classes of data divided by a classifier can be represented by the maximum margin L_i of that classifier, i.e., the distance between the support vectors on either side of the optimal separating hyperplane, 1 ≤ i ≤ K. The larger L_i is, the more widely separated the two classes of data divided by the SVM classifier are.
As shown in Fig. 2, each group of data is divided into two classes by an SVM; the circles and crosses represent the two separated classes, and the line between them is the optimal separating hyperplane of the classifier.
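Training one SVM of step S108 and reading off W_i, b_i, and the maximum margin L_i = 2 / ||W_i|| might be sketched as follows, assuming scikit-learn's SVC with a linear kernel (illustrative only, not part of the disclosure):

```python
import numpy as np
from sklearn.svm import SVC

def train_group(X_k, y_k, C=1e6):
    """Train one linear SVM; return its hyperplane (W_i, b_i) and margin L_i."""
    clf = SVC(kernel="linear", C=C).fit(X_k, y_k)
    W = clf.coef_.ravel()            # parameters of the optimal separating hyperplane
    b = clf.intercept_[0]
    L = 2.0 / np.linalg.norm(W)      # maximum margin: distance between support vectors
    return W, b, L
```

On a 1-D toy set with classes at x = -1 and x = +1 and a large C (near hard-margin), the hyperplane is x = 0 and the margin L is 2, the gap between the closest points of the two classes.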
In step S110, the weight of each classifier is determined according to the separation of the two classes of data divided by that classifier.
Here, the smaller the separation of the two classes of data divided by a classifier, i.e., the smaller L_i is, the larger the weight of classifier i. A small separation between the two classes means they are hard to distinguish, and in actual classification such closely spaced classes must also be distinguished accurately; therefore, such a classifier is given a larger weight. Specifically, 1/L_i can be set as the weight of classifier i; other weighting schemes can also be used as required.
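The weighting rule w_i = 1/L_i described above can be sketched as follows (illustrative only, not part of the disclosure; names are hypothetical):

```python
import numpy as np

def classifier_weights(margins):
    """Weight of classifier i is the reciprocal of its maximum margin L_i,
    so a smaller margin (more closely spaced classes) gives a larger weight."""
    return 1.0 / np.asarray(margins, dtype=float)
```

Margins of 0.5, 1.0, and 2.0 yield weights 2.0, 1.0, and 0.5, so the classifier whose two classes lie closest contributes most to the combination.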
In step S112, the final classifier is determined according to the weight of each classifier and each classifier.
Specifically, a weighted sum of the optimal separating hyperplane expressions of the classifiers is computed according to their weights, obtaining the optimal separating hyperplane expression of the final classifier.
As shown in Fig. 2, the optimal separating hyperplanes are combined by a weighted sum, and the resulting final separating hyperplane separates the positive sample data from the negative sample data in the sample data.
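The weighted summation of step S112 might be sketched as follows (illustrative only, not part of the disclosure; the convention that f(x) > 0 denotes the positive class is an assumption):

```python
import numpy as np

def combine_hyperplanes(Ws, bs, weights):
    """Weighted sum of K separating hyperplanes (W_i, b_i) into a final one."""
    W_final = sum(a * np.asarray(W, dtype=float) for a, W in zip(weights, Ws))
    b_final = float(np.dot(weights, bs))
    return W_final, b_final

def classify(W, b, X):
    """Classify test points with the final hyperplane f(x) = W^T x + b."""
    return np.sign(np.asarray(X) @ W + b)
```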
In step S114, test data are classified using the final classifier.
The final classifier obtained by training can classify new test data. For example, in credit card fraud detection, a final classifier that separates fraudulent behavior from normal behavior is obtained through steps S102 to S114; when new transaction data arrive, inputting them into the final classifier determines whether the new transaction data represent fraudulent behavior.
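Putting steps S102 to S114 together on toy imbalanced data (illustrative only, not part of the disclosure; scikit-learn is assumed and all names are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=0.0, scale=0.4, size=(10, 2))                # minority (positive)
X_neg = np.vstack([rng.normal(loc=c, scale=0.4, size=(20, 2))
                   for c in ((4.0, 0.0), (0.0, 4.0), (4.0, 4.0))])  # majority (negative)

# S104: divide the negative data into K = [F / T] sub-classes
K = len(X_neg) // len(X_pos)
sub = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(X_neg)

# S106-S110: one linear SVM per sub-class, weight w_i = 1 / L_i
Ws, bs, weights = [], [], []
for k in range(K):
    X_k = np.vstack([X_pos, X_neg[sub == k]])
    y_k = np.concatenate([np.ones(len(X_pos)), -np.ones((sub == k).sum())])
    clf = SVC(kernel="linear", C=10.0).fit(X_k, y_k)
    W, b = clf.coef_.ravel(), clf.intercept_[0]
    L = 2.0 / np.linalg.norm(W)        # maximum margin of classifier k
    Ws.append(W); bs.append(b); weights.append(1.0 / L)

# S112: weighted sum of the K separating hyperplanes
W_f = sum(a * W for a, W in zip(weights, Ws))
b_f = float(np.dot(weights, bs))

# S114: classify test data with the final classifier
def predict(X):
    return np.sign(np.asarray(X) @ W_f + b_f)
```

A point near the positive cluster at the origin lands on the positive side of the final hyperplane, while a point deep inside a negative cluster lands on the negative side.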
In the method of the above embodiment, the negative sample data in a set of data with imbalanced positive and negative samples are divided into multiple sub-classes, so that the negative sample data in each sub-class are few relative to the total quantity of negative sample data and each sub-class represents one type of negative sample data. The negative sample data of each sub-class are then combined with the positive sample data and divided into two classes, yielding multiple classifiers and the separation of the two classes of data divided by each classifier, from which the weight of each classifier is obtained. Finally, the final classifier is determined from the weights and the classifiers; a classifier whose positive and negative sample data lie closer together is given a larger weight, so that in actual classification even closely spaced classes are distinguished accurately and the minority samples are not treated as outliers of the majority samples and assigned to the majority class. The method of the present invention neither removes nor adds sample data, does not lose important information of the sample data, and does not cause over-fitting; because the features of the negative sample data and the classification effect of each classifier are taken into account in the classification process, the overall classification effect on the sample data is effectively improved.
The present invention further provides a data classification device, which is described with reference to Fig. 3.
Fig. 3 is a structural diagram of an embodiment of the data classification device of the present invention. As shown in Fig. 3, the device 30 includes:
a positive/negative sample division module 302 for dividing sample data into positive sample data and negative sample data, wherein the ratio of the quantity of the negative sample data to the positive sample data is greater than a threshold;
a negative sample division module 304 for dividing the negative sample data into multiple sub-classes according to the similarity between data points of the negative sample data.
Specifically, the negative sample division module 304 is configured to determine the number of sub-classes into which the negative sample data are divided according to the ratio of the quantities of the negative and positive sample data, and to divide the negative sample data into the determined number of sub-classes using a clustering algorithm according to the similarity between data points of the negative sample data.
a training data generation module 306 for combining the negative sample data of each sub-class with the positive sample data as one group of training data, obtaining multiple groups of training data;
a classifier training module 308 for training a support vector machine on each group of training data, obtaining a classifier and the separation of the two classes of data divided by that classifier.
Here, the classifier is the optimal separating hyperplane expression obtained by support vector machine training, and the separation of the two classes of data divided by each classifier is the maximum margin of that classifier.
a classifier weight determination module 310 for determining the weight of each classifier according to the separation of the two classes of data divided by each classifier.
Here, the smaller the separation of the two classes of data divided by a classifier, the larger the weight of that classifier; for example, the weight of a classifier is the reciprocal of the maximum margin of that classifier.
a final classifier determination module 312 for determining the final classifier according to the weight of each classifier and each classifier.
Specifically, the final classifier determination module 312 is configured to compute a weighted sum of the optimal separating hyperplane expressions of the classifiers according to their weights, obtaining the optimal separating hyperplane expression of the final classifier.
a data classification module 314 for classifying test data using the final classifier.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented in hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A data classification method, characterized by comprising:
dividing sample data into positive sample data and negative sample data, wherein the ratio of the quantity of the negative sample data to the positive sample data is greater than a threshold;
dividing the negative sample data into multiple sub-classes according to the similarity between data points of the negative sample data;
combining the negative sample data of each sub-class with the positive sample data as one group of training data, obtaining multiple groups of training data;
training a support vector machine on each group of training data, obtaining a classifier and the separation of the two classes of data divided by the classifier;
determining the weight of each classifier according to the separation of the two classes of data divided by each classifier, wherein the smaller the separation of the two classes of data divided by a classifier, the larger the weight of that classifier;
determining a final classifier according to the weight of each classifier and each classifier; and
classifying test data using the final classifier.
2. The method according to claim 1, characterized in that
dividing the negative sample data into multiple sub-classes according to the similarity between data points of the negative sample data comprises:
determining the number of sub-classes into which the negative sample data are divided according to the ratio of the quantities of the negative sample data and the positive sample data; and
dividing the negative sample data into the determined number of sub-classes using a clustering algorithm according to the similarity between data points of the negative sample data.
3. The method according to claim 1, characterized in that
the classifier is the optimal separating hyperplane expression obtained by support vector machine training; and
the separation of the two classes of data divided by each classifier is the maximum margin of that classifier.
4. The method according to claim 3, characterized in that
determining a final classifier according to the weight of each classifier and each classifier comprises:
computing a weighted sum of the optimal separating hyperplane expressions of the classifiers according to their weights, obtaining the optimal separating hyperplane expression of the final classifier.
5. The method according to claim 3, characterized in that
the weight of the classifier is the reciprocal of the maximum margin of the classifier.
6. A data classification device, characterized by comprising:
a positive/negative sample division module for dividing sample data into positive sample data and negative sample data, wherein the ratio of the quantity of the negative sample data to the positive sample data is greater than a threshold;
a negative sample division module for dividing the negative sample data into multiple sub-classes according to the similarity between data points of the negative sample data;
a training data generation module for combining the negative sample data of each sub-class with the positive sample data as one group of training data, obtaining multiple groups of training data;
a classifier training module for training a support vector machine on each group of training data, obtaining a classifier and the separation of the two classes of data divided by the classifier;
a classifier weight determination module for determining the weight of each classifier according to the separation of the two classes of data divided by each classifier, wherein the smaller the separation of the two classes of data divided by a classifier, the larger the weight of that classifier;
a final classifier determination module for determining a final classifier according to the weight of each classifier and each classifier; and
a data classification module for classifying test data using the final classifier.
7. The device according to claim 6, characterized in that
the negative sample division module is configured to determine the number of sub-classes into which the negative sample data are divided according to the ratio of the quantities of the negative sample data and the positive sample data, and to divide the negative sample data into the determined number of sub-classes using a clustering algorithm according to the similarity between data points of the negative sample data.
8. The device according to claim 6, characterized in that
the classifier is the optimal separating hyperplane expression obtained by support vector machine training; and
the separation of the two classes of data divided by each classifier is the maximum margin of that classifier.
9. The device according to claim 8, characterized in that
the final classifier determination module is configured to compute a weighted sum of the optimal separating hyperplane expressions of the classifiers according to their weights, obtaining the optimal separating hyperplane expression of the final classifier.
10. The device according to claim 8, wherein:
The weight of a classifier is the reciprocal of the maximum separation margin of that classifier.
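Combining claims 8 and 10: with margin 2/‖w‖, the weight 1/margin works out to ‖w‖/2, so classifiers whose two classes sit closer together contribute more to the final classifier. A one-line sketch (function name hypothetical):

```python
import numpy as np

def classifier_weight(w):
    # Claim 10: weight = 1 / (maximum separation margin) = 1 / (2 / ||w||) = ||w|| / 2.
    return np.linalg.norm(w) / 2.0
```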
CN201611149072.8A 2016-12-14 2016-12-14 Data classification method and device Pending CN108229507A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611149072.8A CN108229507A (en) 2016-12-14 2016-12-14 Data classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611149072.8A CN108229507A (en) 2016-12-14 2016-12-14 Data classification method and device

Publications (1)

Publication Number Publication Date
CN108229507A true CN108229507A (en) 2018-06-29

Family

ID=62638197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611149072.8A Pending CN108229507A (en) 2016-12-14 2016-12-14 Data classification method and device

Country Status (1)

Country Link
CN (1) CN108229507A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7421417B2 (en) * 2003-08-28 2008-09-02 Wisconsin Alumni Research Foundation Input feature and kernel selection for support vector machine classification
CN101901345A (en) * 2009-05-27 2010-12-01 复旦大学 Classification method of differential proteomics
CN103995821A (en) * 2014-03-14 2014-08-20 盐城工学院 Selective clustering integration method based on spectral clustering algorithm
CN104573708A (en) * 2014-12-19 2015-04-29 天津大学 Ensemble-of-under-sampled extreme learning machine
CN104809226A (en) * 2015-05-07 2015-07-29 武汉大学 Method for early classifying imbalance multi-variable time sequence data


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
汪洪桥, 蔡艳宁, 王仕成, 付光远, 孙富春: "Multiple Kernel Methods for Pattern Analysis and Their Applications", 31 March 2014, National Defense Industry Press *
陈瑞雪: "Research on Support Vector Machine Classification Methods for Imbalanced Data", China Master's Theses Full-text Database (electronic journal) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109272056A (en) * 2018-10-30 2019-01-25 成都信息工程大学 The method of data balancing method and raising data classification performance based on pseudo- negative sample
CN109272056B (en) * 2018-10-30 2021-09-21 成都信息工程大学 Data balancing method based on pseudo negative sample and method for improving data classification performance
CN109670971A (en) * 2018-11-30 2019-04-23 平安医疗健康管理股份有限公司 Judgment method, device, equipment and the computer storage medium of abnormal medical expenditure
CN109558543A (en) * 2018-12-11 2019-04-02 拉扎斯网络科技(上海)有限公司 A kind of specimen sample method, specimen sample device, server and storage medium
CN111666872A (en) * 2020-06-04 2020-09-15 电子科技大学 Efficient behavior identification method under data imbalance
CN111666872B (en) * 2020-06-04 2022-08-05 电子科技大学 Efficient behavior identification method under data imbalance

Similar Documents

Publication Publication Date Title
CN109952614B (en) Biological particle classification system and method
CN103632168B (en) Classifier integration method for machine learning
CN103136504B (en) Face identification method and device
CN108229507A (en) Data classification method and device
CN107194803A Device for assessing borrower credit risk in P2P online lending
CN108363810A Text classification method and device
CN107682109B Interference signal classification and identification method for UAV communication systems
CN106326913A (en) Money laundering account determination method and device
CN111062425B (en) Unbalanced data set processing method based on C-K-SMOTE algorithm
CN110533116A Adaptive ensemble method for imbalanced data classification based on Euclidean distance
CN109886284A (en) Fraud detection method and system based on hierarchical clustering
CN111861103A (en) Fresh tea leaf classification method based on multiple features and multiple classifiers
CN108629373A (en) A kind of image classification method, system, equipment and computer readable storage medium
CN112633337A (en) Unbalanced data processing method based on clustering and boundary points
CN103177266A (en) Intelligent stock pest identification system
CN104850868A Customer segmentation method based on k-means and neural network clustering
CN110264454A Cervical cancer histopathological image diagnosis method based on multi-hidden-layer conditional random fields
CN106570076A (en) Computer text classification system
CN110046593A Complex power quality disturbance recognition method based on segmentally improved S-transform and random forest
CN109829498A Clustering-based coarse classification method, apparatus, terminal device and storage medium
CN104134073B One-class classification method for remote sensing images based on one-class normalization
CN109359680A Method and device for automatic identification of blasted rock and extraction of fragment-size features
CN108875801A Smart-grid-based load curve classification system
CN105760471B Two-class text classification method based on a combined convex linear perceptron
CN110516741A Dynamic-classifier-selection-based classification method for class-overlapped imbalanced data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180629