CN105975992A - Unbalanced data classification method based on adaptive upsampling - Google Patents


Info

Publication number
CN105975992A
CN105975992A (application CN201610331709.9A)
Authority
CN
China
Prior art date
Legal status
Pending
Application number
CN201610331709.9A
Other languages
Chinese (zh)
Inventor
吕卫
李喆
褚晶辉
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201610331709.9A
Publication of CN105975992A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2148: Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade

Abstract

The invention relates to an unbalanced data classification method based on adaptive upsampling. The method comprises the following steps: calculating the total number of new positive samples to be generated; calculating a probability density distribution over the positive samples, using Euclidean distance as the metric; determining, for each positive sample, the number of new samples to generate from it; generating the new positive samples and adding the newly generated positive sample points to the original unbalanced training set so that the positive and negative samples are equal in number, thereby obtaining a new balanced training set containing n_n positive samples and n_n negative samples; and training on the newly generated balanced training set with the Adaboost algorithm, obtaining the final classification model after T iterations. The invention improves classification performance on unbalanced datasets.

Description

An unbalanced-dataset classification method based on adaptive upsampling
Technical Field
The present invention relates to the technical field of pattern recognition, and specifically to a classifier for unbalanced datasets.
Background technology
With the rapid development of data mining, pattern recognition and machine learning, data classification is applied in, and plays a significant role in, many fields, including image retrieval, medical detection and diagnosis, lie detection, text classification and crude-oil leakage detection. However, classical classification algorithms such as support vector machines, artificial neural networks and linear discriminant analysis all assume at design time that the classes in the training set contain roughly equal numbers of samples. In practice, in the fields above, the number of abnormal samples (positive samples) is often far smaller than the number of normal samples (negative samples). To obtain a higher overall accuracy, a classical classifier then pays more attention to the negative class: the classification boundary shifts toward the positive samples, a large number of positive samples are misclassified as negative, and classification performance on the positive class ultimately degrades. Since in most cases the abnormal samples are the more valuable ones for decision-making, classification algorithms for unbalanced datasets that raise positive-class accuracy have become a research hotspot.
In recent years, researchers have proposed a variety of classification methods for unbalanced datasets. According to the object they act on, these methods fall into two broad classes: data-level methods and algorithm-level methods.
Data-level methods change the data distribution by resampling so that the numbers of positive and negative samples become roughly equal, thereby balancing the data. Either downsampling the negative samples or upsampling the positive samples can achieve this. The patent "Protein-nucleotide binding site prediction method based on supervised upsampling learning" (CN104077499A) adopts upsampling, increasing the number of positive samples to obtain a balanced dataset for training a support vector machine. However, because this kind of method simply duplicates positive samples and adds them back to the original dataset, each positive sample is in effect trained on repeatedly; overfitting easily occurs, and classifier performance ultimately declines. The patent "Automatic traffic incident detection method for unbalanced datasets based on sub-sampling" (CN103927874A) uses downsampling: a subset of samples is drawn at random from the negative class and combined with all the positive samples to form the training set. But because a large number of negative samples are discarded, the method cannot guarantee that the extracted negative subset represents the original sample set well, so the training effect is still not ideal.
Algorithm-level methods address the imbalance problem by improving the classification algorithm rather than by changing the data distribution. Adaboost is one of the classical algorithm-level methods. It cascades multiple classifiers and continually increases the weights of misclassified samples, raising the cost of misclassifying such samples again and thereby improving classification accuracy. However, because the traditional Adaboost algorithm does not itself pay special attention to the positive samples, its effect on unbalanced data is still not ideal.
The above analysis shows that although both data-level and algorithm-level methods can alleviate the impact of data imbalance on classification, each kind of method has its limitations.
Summary of the invention
The object of the present invention is to overcome the deficiencies of existing methods and propose an unbalanced-dataset classification algorithm based on adaptive upsampling, so as to improve classification performance on unbalanced datasets. The technical scheme is as follows:
An unbalanced-dataset classification method based on adaptive upsampling. Let the original unbalanced dataset contain n_p positive samples and n_n negative samples. The method comprises the following steps:
(1) Compute the imbalance ratio IR of the dataset from n_p and n_n, and from IR compute the total number G of new positive samples to generate;
(2) Using Euclidean distance as the metric, for each positive sample i find its K nearest neighbours in the unbalanced dataset and count the proportion of negative samples among these K neighbours, denoted p_i. Sum the p_i values over all positive samples and normalize; denote the normalized value r_i, so that the r_i values of all positive samples sum to 1, i.e. the r_i form a probability density distribution. r_i is called the probability of positive sample i;
(3) For each positive sample i, determine the number g_i of new samples to generate from it, according to G and the probability r_i obtained in step (2);
(4) For each positive sample i, randomly select g_i of the K nearest neighbours obtained in step (2), pair each with sample i, and pick a random point on the line segment joining each pair to obtain a new positive sample. When generation is complete, G new positive sample points have been produced; add them to the original unbalanced training set so that the numbers of positive and negative samples are equal, yielding a new balanced training set with n_n positive and n_n negative samples;
(5) Let T be the number of Adaboost iterations. Train on the newly generated balanced training set with the Adaboost algorithm; after T iterations the final classification model is obtained.
The present invention targets unbalanced datasets with an algorithm that combines a data-level method and an algorithm-level method, improving and optimizing the upsampling algorithm: upsampling is applied mainly to positive sample points near the positive-negative boundary, while positive samples far from the boundary are left unprocessed, so as to obtain a better classification effect on unbalanced datasets. Combining the advantages of adaptive upsampling with those of the Adaboost algorithm ensures that the new positive samples generated by upsampling are concentrated near the boundary, while the classifier ensemble performs boosting to improve overall classifier performance. Experimental comparison shows that the present invention has a clear advantage on multiple classifier evaluation indices.
Brief description of the drawings
Fig. 1 is a flow chart of the Adaboost boosting algorithm.
Fig. 2 is a flow chart of the present invention.
Detailed description of the invention
The present invention is inspired by the adaptive upsampling algorithm and by the Adaboost algorithm shown in Fig. 1, and combines the two into an integrated classifier. The present invention is described in further detail below with reference to the accompanying drawings.
(1) Obtain the test and training data: the present invention uses the vehicle-type recognition database from the KEEL repository, which contains 846 samples in total. The positive samples are the van data, 199 in total, i.e. n_p = 199. The negative samples comprise the data of three vehicle types, bus, Opel car and Saab car, 647 in total, i.e. n_n = 647. The database contains 18 feature dimensions, including torque, turning radius and maximum braking distance. The imbalance ratio is computed by formula (1):
IR = n_n / n_p    (1)
For this experiment the imbalance ratio is 647/199 ≈ 3.25.
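Formula (1) is plain arithmetic; a quick check with the sample counts quoted above (an illustrative sketch, not part of the patent):

```python
# Sample counts from the KEEL "vehicle" dataset as used in the embodiment.
n_p = 199   # positive samples (vans)
n_n = 647   # negative samples (bus, Opel, Saab)

IR = n_n / n_p              # formula (1): imbalance ratio
print(round(IR, 2))         # 3.25
```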
(2) The number of new positive samples to generate is computed by formula (2):
G = (n_n - n_p) × β    (2)
where β is a constant between 0 and 1. When β = 1, the numbers of positive and negative samples after upsampling are exactly equal and the dataset is fully balanced; the present invention takes β = 1. It follows that the number of new positive samples to generate is 448. The positive samples are then adaptively upsampled according to this value so that the positive and negative sample counts balance. Specifically: for each positive sample, with Euclidean distance as the metric, the proportion p_i of negative samples among its K nearest sample points is computed:
p_i = k_i / K,  i = 1, …, n_p    (3)
where k_i is the number of negative samples among the K nearest neighbours of positive sample i. To judge reliably whether each positive sample lies near the positive-negative boundary, K should be large; but as K grows, the amount of computation also grows substantially. To keep computational complexity low, the present invention makes a compromise between these two demands and takes K = 5. All p_i are then normalized so that they form a probability density distribution, and the number of new positive samples that each positive sample should generate is computed:
g_i = (p_i / Σ_{j=1}^{n_p} p_j) × G    (4)
Formula (4) shows that sample points near the boundary, whose neighbourhoods contain more negative samples, will be used to generate more positive samples, while sample points far from the boundary, whose neighbourhoods consist entirely of positive samples, will not be used to generate any. Then, for each positive sample, g_i of its K nearest sample points are selected at random, and new positive samples are generated by the method of formula (5):
new_i = x_i + λ(x_ni - x_i)    (5)
where new_i is the newly generated sample point, λ is a random number between 0 and 1, and x_ni is the randomly selected neighbouring sample point. For each positive sample this process is performed g_i times. After sample generation completes, the newly generated sample points are added to the original unbalanced training set, yielding the new balanced training set. This adaptive upsampling method ensures that the newly generated training set no longer suffers from imbalance, and that the new samples lie mainly in the boundary region where positive and negative samples are hardest to distinguish.
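The generation procedure of formulas (2)-(5) can be sketched in numpy. This is a minimal illustrative implementation, not the patented embodiment: the function name is ours, g_i is rounded down to an integer (the patent does not say how fractional counts are handled), and the interpolation partner is drawn from the same K-neighbour set used for p_i, which here may include negative samples.

```python
import numpy as np

def adaptive_upsample(X_pos, X_neg, K=5, beta=1.0, rng=None):
    """ADASYN-style adaptive upsampling following steps (1)-(4).

    The fraction of negatives among each positive sample's K nearest
    neighbours (formula (3)) decides how many synthetic points it spawns
    (formula (4)); new points lie on segments joining the sample to
    randomly chosen neighbours (formula (5))."""
    rng = np.random.default_rng(rng)
    X_all = np.vstack([X_pos, X_neg])
    is_neg = np.array([False] * len(X_pos) + [True] * len(X_neg))

    G = int(round((len(X_neg) - len(X_pos)) * beta))      # formula (2)

    p = np.empty(len(X_pos))
    neighbours = []
    for i, x in enumerate(X_pos):
        d = np.linalg.norm(X_all - x, axis=1)
        d[i] = np.inf                                     # exclude the point itself
        nn_idx = np.argsort(d)[:K]
        neighbours.append(nn_idx)
        p[i] = is_neg[nn_idx].mean()                      # formula (3)

    if p.sum() == 0:                                      # no positives near the boundary
        return np.empty((0, X_pos.shape[1]))
    r = p / p.sum()                                       # normalised density
    g = np.floor(r * G).astype(int)                       # formula (4), rounded down

    new_points = []
    for i, x in enumerate(X_pos):
        for _ in range(g[i]):
            xn = X_all[rng.choice(neighbours[i])]         # random neighbour partner
            lam = rng.random()
            new_points.append(x + lam * (xn - x))         # formula (5)
    return np.array(new_points) if new_points else np.empty((0, X_pos.shape[1]))
```

With β = 1 and floor rounding the number of generated points is at most G; a production implementation would redistribute the remainder so that the classes balance exactly.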
As can be seen from Figs. 1 and 2, if simple random upsampling were used instead, every positive sample point would merely be duplicated, and the newly generated sample points would coincide exactly with the original positive sample points distributed over the whole positive sample space. Adaptive upsampling, by contrast, generates positive samples that differ from the original sample points, and the newly generated positive samples all lie near the boundary.
(3) The present invention uses five-fold cross-validation to train and test on the unbalanced dataset. Both training and testing use the Adaboost classification algorithm with a C4.5 decision tree as the base classifier, where the minimum leaf node count of the C4.5 decision tree is set to 2, the confidence level to 0.25, and the tree is pruned after training. All data are normalized before entering the classifier, i.e. the minimum value of the data is 0 and the maximum is 1. Positive samples are labelled +1 and negative samples -1.
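The normalization described above is the standard min-max scaling; a short sketch under the assumption (not stated in the patent) that the scaling is applied per feature column:

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature column to [0, 1], as required before classification."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (X - lo) / span
```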
The balanced positive and negative samples are divided into training and test sets by five-fold cross-validation; the training set then contains about 518 positive and 518 negative samples, i.e. m = 1036 training samples in total. Taking the Adaboost iteration count T = 10, training proceeds as follows:
1. Denote the weight of each sample in round t by D_t(i), where t takes integer values from 1 to T and indicates the current iteration round, and i is the sample index. Initialize the weights to D_1(i) = 1/m, i = 1, …, m.
2. Train the classifier h_t on the weighted training set, and after training compute its training error rate
ε_t = Σ_{i=1}^{m} D_t(i)·[y_i ≠ h_t(x_i)]    (6)
where t = 1, …, T is the current iteration round, ε_t is the training error rate of round t, D_t(i) is the weight of each sample in that round, y_i is the class label of sample x_i, taking the value +1 or -1, and h_t(x_i) is the label assigned to x_i by the trained classifier; the bracket [·] equals 1 when its condition holds and 0 otherwise.
3. Let α_t be the weight in the final vote of the classifier obtained after round t. From the training error rate of each round, the weight of the classifier generated in that round is
α_t = (1/2) ln((1 - ε_t) / ε_t)    (7)
Meanwhile, the weight of each sample in the next round of iteration is updated to
D_{t+1}(i) = D_t(i)·exp[-α_t y_i h_t(x_i)] / Z_t    (8)
where Z_t is the sum of the updated, unnormalized sample weights in the current round, used to normalize the weights so that they again form a distribution.
4. Perform steps 2 and 3 a total of T times to complete the whole iteration and weight-update process, finishing classifier training. For a test sample x to be classified, the classification result is
H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )    (9)
Formula (7) shows that the weight of each sub-classifier is determined by its classification error rate: a classifier with a lower error rate obtains a higher weight in the voting process of formula (9). Moreover, for a single sample, formula (8) shows that if the sample's original label differs from the classification result, the exponent is greater than 0 and the exponential factor is greater than 1, so the sample's weight in the next round of iteration increases; otherwise, the sample's weight in the next round decreases.
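The training loop of steps 1-4 and formulas (6)-(9) can be sketched as follows. This is an illustrative numpy version only: one-level decision stumps stand in for the patent's pruned C4.5 base classifier, and the error-clipping constant is our addition to avoid log(0) when a stump classifies perfectly.

```python
import numpy as np

def train_adaboost(X, y, T=10):
    """Adaboost with decision stumps as base learners; labels y are +/-1.
    Implements the weight and vote updates of formulas (6)-(8)."""
    m = len(y)
    D = np.full(m, 1.0 / m)                               # D_1(i) = 1/m
    stumps, alphas = [], []
    for _ in range(T):
        # exhaustively pick the stump minimising the weighted error
        best = None
        for f in range(X.shape[1]):
            for thr in np.unique(X[:, f]):
                for sign in (1, -1):
                    pred = np.where(X[:, f] <= thr, sign, -sign)
                    err = D[pred != y].sum()              # formula (6)
                    if best is None or err < best[0]:
                        best = (err, f, thr, sign, pred)
        err, f, thr, sign, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)             # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)             # formula (7)
        D = D * np.exp(-alpha * y * pred)                 # formula (8), numerator
        D /= D.sum()                                      # divide by Z_t
        stumps.append((f, thr, sign))
        alphas.append(alpha)
    return stumps, alphas

def predict_adaboost(stumps, alphas, X):
    """Weighted vote of formula (9): H(x) = sign(sum_t alpha_t h_t(x))."""
    score = np.zeros(len(X))
    for (f, thr, sign), alpha in zip(stumps, alphas):
        score += alpha * np.where(X[:, f] <= thr, sign, -sign)
    return np.where(score >= 0, 1, -1)
```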
The test-set samples are fed into the trained classifier to obtain the final classification results of the test samples, as shown in Fig. 2.
Table 1 compares the test results obtained by classifying the unbalanced dataset directly with a C4.5 decision tree, by classifying with C4.5 after random upsampling of the positive samples, and by the method of the present invention. Classifier performance is evaluated with indices including sensitivity, specificity and their geometric mean:
Table 1. Classification results and comparison (the best result under each index is marked in bold)
The data in Table 1 show that although classifying directly with the C4.5 decision tree achieves the highest specificity, its sensitivity is the lowest, demonstrating that the data imbalance significantly harms classification performance: the boundary region of the positive class is encroached upon, and a large number of positive samples are misclassified as negative. After simple random upsampling this problem is alleviated, but the gap between sensitivity and specificity remains large. The present invention achieves good sensitivity and specificity simultaneously, and their geometric mean is the highest among the compared methods, demonstrating that the present invention gives the best trade-off between sensitivity and specificity.
In summary, the present invention obtains a good classification effect on unbalanced datasets and effectively eliminates the negative influence of the data imbalance problem on classification.

Claims (1)

1. An unbalanced-dataset classification method based on adaptive upsampling, wherein the original unbalanced dataset contains n_p positive samples and n_n negative samples, the method comprising the following steps:
(1) computing the imbalance ratio IR of the dataset from n_p and n_n, and from IR the total number G of new positive samples to generate;
(2) using Euclidean distance as the metric, finding for each positive sample i its K nearest neighbours in the unbalanced dataset and counting the proportion of negative samples among these K neighbours, denoted p_i; summing the p_i values over all positive samples and normalizing, the normalized value being denoted r_i, so that the r_i values of all positive samples sum to 1, i.e. the r_i form a probability density distribution, r_i being called the probability of positive sample i;
(3) for each positive sample i, determining from G and the probability r_i obtained in step (2) the number g_i of new samples to generate from it;
(4) for each positive sample i, randomly selecting g_i of the K nearest neighbours obtained in step (2), pairing each with sample i, and randomly picking a point on the line segment joining each pair to obtain a new positive sample; after generation completes, G new positive sample points having been produced, adding them to the original unbalanced training set so that the numbers of positive and negative samples are equal, yielding a new balanced training set with n_n positive and n_n negative samples;
(5) letting T be the number of Adaboost iterations, training on the newly generated balanced training set with the Adaboost algorithm, and obtaining the final classification model after T iterations.
CN201610331709.9A 2016-05-18 2016-05-18 Unbalanced data classification method based on adaptive upsampling Pending CN105975992A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610331709.9A CN105975992A (en) 2016-05-18 2016-05-18 Unbalanced data classification method based on adaptive upsampling


Publications (1)

Publication Number Publication Date
CN105975992A true CN105975992A (en) 2016-09-28

Family

ID=56955297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610331709.9A Pending CN105975992A (en) 2016-05-18 2016-05-18 Unbalanced data classification method based on adaptive upsampling

Country Status (1)

Country Link
CN (1) CN105975992A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927874A (en) * 2014-04-29 2014-07-16 东南大学 Automatic incident detection method based on under-sampling and used for unbalanced data set
CN104573708A (en) * 2014-12-19 2015-04-29 天津大学 Ensemble-of-under-sampled extreme learning machine
CN104951809A (en) * 2015-07-14 2015-09-30 西安电子科技大学 Unbalanced data classification method based on unbalanced classification indexes and integrated learning
CN105373806A (en) * 2015-10-19 2016-03-02 河海大学 Outlier detection method based on uncertain data set


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAIBO HE et al.: "ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning", 2008 IEEE International Joint Conference on Neural Networks *
LIU Yuxia et al.: "A new oversampling algorithm, DB_SMOTE", Computer Engineering and Applications *
TAO Xinmin et al.: "A survey of classification algorithms for imbalanced data", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133223A (en) * 2016-12-01 2018-06-08 富士通株式会社 The device and method for determining convolutional neural networks CNN models
CN108133223B (en) * 2016-12-01 2020-06-26 富士通株式会社 Device and method for determining convolutional neural network CNN model
CN108629413A (en) * 2017-03-15 2018-10-09 阿里巴巴集团控股有限公司 Neural network model training, trading activity Risk Identification Method and device
CN108629413B (en) * 2017-03-15 2020-06-16 创新先进技术有限公司 Neural network model training and transaction behavior risk identification method and device
CN107273916A (en) * 2017-05-22 2017-10-20 上海大学 The unknown Information Hiding & Detecting method of steganographic algorithm
CN107273916B (en) * 2017-05-22 2020-10-16 上海大学 Information hiding detection method for unknown steganography algorithm
CN110163226A (en) * 2018-02-12 2019-08-23 北京京东尚科信息技术有限公司 Equilibrating data set generation method and apparatus and classification method and device
CN108334455A (en) * 2018-03-05 2018-07-27 清华大学 The Software Defects Predict Methods and system of cost-sensitive hypergraph study based on search
CN108334455B (en) * 2018-03-05 2020-06-26 清华大学 Software defect prediction method and system based on search cost-sensitive hypergraph learning
CN108776711A (en) * 2018-03-07 2018-11-09 中国电力科学研究院有限公司 A kind of electrical power system transient sample data extracting method and system
CN108733633A (en) * 2018-05-18 2018-11-02 北京科技大学 A kind of the unbalanced data homing method and device of sample distribution adjustment
CN109086412A (en) * 2018-08-03 2018-12-25 北京邮电大学 A kind of unbalanced data classification method based on adaptive weighted Bagging-GBDT
CN110998648A (en) * 2018-08-09 2020-04-10 北京嘀嘀无限科技发展有限公司 System and method for distributing orders
CN109614967A (en) * 2018-10-10 2019-04-12 浙江大学 A kind of detection method of license plate based on negative sample data value resampling
CN109614967B (en) * 2018-10-10 2020-07-17 浙江大学 License plate detection method based on negative sample data value resampling
WO2020082734A1 (en) * 2018-10-24 2020-04-30 平安科技(深圳)有限公司 Text emotion recognition method and apparatus, electronic device, and computer non-volatile readable storage medium
CN109327464A (en) * 2018-11-15 2019-02-12 中国人民解放军战略支援部队信息工程大学 Class imbalance processing method and processing device in a kind of network invasion monitoring
CN109740750A (en) * 2018-12-17 2019-05-10 北京深极智能科技有限公司 Method of data capture and device
CN109756494A (en) * 2018-12-29 2019-05-14 中国银联股份有限公司 A kind of negative sample transform method and device
CN109756494B (en) * 2018-12-29 2021-04-16 中国银联股份有限公司 Negative sample transformation method and device
CN109862392B (en) * 2019-03-20 2021-04-13 济南大学 Method, system, device and medium for identifying video traffic of internet game
CN109862392A (en) * 2019-03-20 2019-06-07 济南大学 Recognition methods, system, equipment and the medium of internet gaming video flow
CN111062806A (en) * 2019-12-13 2020-04-24 合肥工业大学 Personal finance credit risk evaluation method, system and storage medium
CN111062806B (en) * 2019-12-13 2022-05-10 合肥工业大学 Personal finance credit risk evaluation method, system and storage medium
CN111652268A (en) * 2020-04-22 2020-09-11 浙江盈狐云数据科技有限公司 Unbalanced stream data classification method based on resampling mechanism
CN111598189A (en) * 2020-07-20 2020-08-28 北京瑞莱智慧科技有限公司 Generative model training method, data generation method, device, medium, and apparatus
CN111598189B (en) * 2020-07-20 2020-10-30 北京瑞莱智慧科技有限公司 Generative model training method, data generation method, device, medium, and apparatus

Similar Documents

Publication Publication Date Title
CN105975992A (en) Unbalanced data classification method based on adaptive upsampling
CN107563435A (en) Higher-dimension unbalanced data sorting technique based on SVM
CN103632168B (en) Classifier integration method for machine learning
CN103728551B (en) A kind of analog-circuit fault diagnosis method based on cascade integrated classifier
CN105844287B (en) A kind of the domain adaptive approach and system of classification of remote-sensing images
CN101944174B (en) Identification method of characters of licence plate
CN108764366A (en) Feature selecting and cluster for lack of balance data integrate two sorting techniques
CN104598885B (en) The detection of word label and localization method in street view image
CN114241273B (en) Multi-modal image processing method and system based on Transformer network and hypersphere space learning
CN108985327B (en) Terrain matching area self-organization optimization classification method based on factor analysis
CN106202952A (en) A kind of Parkinson disease diagnostic method based on machine learning
CN104834940A (en) Medical image inspection disease classification method based on support vector machine (SVM)
CN104881671B (en) A kind of high score remote sensing image Local Feature Extraction based on 2D Gabor
CN109214460A (en) Method for diagnosing fault of power transformer based on Relative Transformation Yu nuclear entropy constituent analysis
CN106845717A (en) A kind of energy efficiency evaluation method based on multi-model convergence strategy
CN103020122A (en) Transfer learning method based on semi-supervised clustering
CN105426919A (en) Significant guidance and unsupervised feature learning based image classification method
CN108460421A (en) The sorting technique of unbalanced data
CN106682606A (en) Face recognizing method and safety verification apparatus
CN102156871A (en) Image classification method based on category correlated codebook and classifier voting strategy
CN106845387A (en) Pedestrian detection method based on self study
CN110059716A Construction of a CNN-LSTM-SVM network model and MOOC dropout prediction method
CN103886030B (en) Cost-sensitive decision-making tree based physical information fusion system data classification method
CN110363230A (en) Stacking integrated sewage handling failure diagnostic method based on weighting base classifier
CN110009030A (en) Sewage treatment method for diagnosing faults based on stacking meta learning strategy

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160928