CN105975992A — Unbalanced data classification method based on adaptive upsampling
 Publication number
 CN105975992A (application CN201610331709.9A)
 Authority
 CN
 China
 Prior art keywords
 sample, positive sample, negative sample
 Prior art date
 Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
 Pending
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
 G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
 G06K9/62—Methods or arrangements for recognition using electronic means
 G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
 G06K9/6256—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging, boosting
 G06K9/6257—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging, boosting characterised by the organisation or the structure of the process, e.g. boosting cascade

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
 G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
 G06K9/62—Methods or arrangements for recognition using electronic means
 G06K9/6267—Classification techniques
Abstract
The invention relates to an unbalanced data classification method based on adaptive upsampling. The method includes the following steps: calculating the total number of positive samples to be newly generated; calculating a probability density distribution over the positive samples, using the Euclidean distance as the metric; determining, for each positive sample, the number of new samples to generate from it; generating the new positive samples and adding the newly generated positive sample points to the original unbalanced training set so that the positive and negative samples are equal in number, i.e., obtaining a new balanced training set containing n_n positive samples and n_n negative samples; and training on the newly generated balanced training set with the Adaboost algorithm, obtaining the final classification model after T iterations. The invention improves classification performance on unbalanced datasets.
Description
Technical Field
The present invention relates to pattern recognition technology, and in particular to a classifier for unbalanced datasets.
Background Technology
With the rapid development of data mining, pattern recognition, and machine learning techniques, data classification has been applied to, and plays a significant role in, many fields such as image retrieval, medical examination and diagnosis, lie detection, text classification, and crude-oil leakage detection. However, classical classification algorithms such as the support vector machine, artificial neural networks, and linear discriminant analysis all assume at design time that the classes of the training dataset contain roughly the same number of samples. In practice, in the fields mentioned above, the number of abnormal samples (positive samples) is often far smaller than that of normal samples (negative samples). In that case, to obtain a higher overall accuracy, a classical classifier pays more attention to the negative class, and the classification boundary shifts toward the positive samples, so that many positive samples are misclassified as negative, ultimately degrading classification performance on the positive class. Since in most cases the abnormal samples are the more valuable ones for decision making, classification algorithms for unbalanced datasets, which raise the classification accuracy on positive samples, have become a research hotspot.
In recent years, researchers have proposed many classification methods for unbalanced datasets. According to the object they act on, these methods fall into two broad classes: data-level methods and algorithm-level methods.
Data-level methods change the data distribution mainly by resampling, making the numbers of positive and negative samples roughly equal and thereby balancing the data. Both downsampling the negative samples and upsampling the positive samples can achieve this. The patent "Protein-nucleotide binding site prediction method based on supervised upsampling learning" (CN104077499A) adopts an upsampling method, increasing the number of positive samples to obtain a balanced dataset for training a support vector machine. However, because this kind of method simply duplicates the positive samples and adds them back into the original dataset, each positive sample is in effect trained on repeatedly, overfitting easily occurs, and classifier performance ultimately declines. The patent "Automatic traffic incident detection method for unbalanced datasets based on undersampling" (CN103927874A) uses a downsampling method, randomly drawing a subset of the negative samples and combining it with all positive samples to form the training set for the classifier. But since a large number of negative samples are discarded, the method cannot guarantee that the extracted negative subset represents the original sample set well, so the training effect is still not ideal.
Algorithm-level methods solve the imbalanced classification problem mainly by improving the classification algorithm rather than by changing the data distribution. Adaboost is one of the classical algorithm-level methods. It cascades multiple classifiers and continually increases the weights of misclassified samples, raising the cost of misclassifying such samples again and thus improving classification accuracy. However, because the traditional Adaboost algorithm itself pays no particular attention to the positive samples, its effect is still not ideal.
As the analysis above shows, although both data-level methods and algorithm-level methods can alleviate the impact of data imbalance on classification, both classes of methods have limitations.
Summary of the Invention
The object of the present invention is to overcome the deficiencies of existing methods and to propose an unbalanced-dataset classification algorithm based on adaptive upsampling, so as to improve classification performance on unbalanced datasets. The technical scheme is as follows:
An unbalanced-dataset classification method based on adaptive upsampling. Let the number of positive samples in the original unbalanced dataset be n_p and the number of negative samples be n_n. The method comprises the following steps:
(1) From n_p and n_n, calculate the imbalance ratio IR of the unbalanced dataset, and from IR calculate the total number G of positive samples to be newly generated.
(2) Using the Euclidean distance as the metric, for each positive sample i, find its K nearest neighbours in the unbalanced dataset and compute the proportion of negative samples among those K nearest samples, denoted p_i. Sum and normalize the p_i values obtained for all positive samples; the value obtained after normalization is denoted r_i, so that the r_i values of all positive samples sum to 1, i.e., the r_i form a probability density distribution, and r_i is called the probability of positive sample i.
(3) For each positive sample i, determine the number g_i of new samples to be generated from it, according to the total G and the probability r_i obtained in step (2).
(4) For each positive sample i, randomly select g_i of the K nearest neighbours obtained in step (2), pair each with sample i, and pick a random point on the line segment joining each pair to obtain a newly generated positive sample. When the generation process completes, G new positive sample points have been generated; add them to the original unbalanced training set so that the positive and negative sample counts are equal, yielding a new balanced training set with n_n positive samples and n_n negative samples.
(5) Let the number of iterations of the Adaboost algorithm be T; train on the newly generated balanced training set with the Adaboost algorithm, obtaining the final classification model after T iterations.
For unbalanced datasets, the present invention combines a data-level method with an algorithm-level method and improves and optimizes the upsampling algorithm: upsampling is applied mainly to positive sample points near the positive-negative boundary, while positive samples far from the boundary are left unprocessed, so as to obtain a better classification effect on unbalanced datasets. Combining the advantages of the adaptive upsampling algorithm and the Adaboost algorithm ensures that the new positive samples generated by upsampling are concentrated near the boundary, while the combined classifier performs boosted learning, improving overall classifier performance. Experimental comparison shows that the invention has a clear advantage on multiple classifier evaluation indices.
Brief Description of the Drawings
Fig. 1 is the flow chart of the Adaboost boosting algorithm.
Fig. 2 is the flow chart of the present invention.
Detailed Description of the Invention
The present invention is inspired by the adaptive upsampling algorithm and by the Adaboost algorithm shown in Fig. 1, and combines the two into one integrated classifier. The invention is described in further detail below with reference to the drawings.
(1) Obtain test and training data: the invention uses the vehicle-type recognition database from the KEEL repository, containing 846 samples in total. The positive samples are the van data, 199 in total, i.e., n_p = 199. The negative samples comprise the data of three vehicle types (bus, Opel car, Saab car), 647 in total, i.e., n_n = 647. The database contains 18 features in all, such as torque, turning radius, and maximum braking distance. The imbalance ratio is calculated by formula (1),
IR = n_n / n_p    (1)
which for this experiment gives IR = 3.25.
(2) The number of positive samples to be generated is calculated by formula (2),
G = (n_n − n_p) × β    (2)
where β is a constant between 0 and 1. When β = 1, the numbers of positive and negative samples after upsampling are exactly equal and the dataset reaches full balance; the present invention takes β = 1, so 448 new positive samples need to be generated. The positive samples are then adaptively upsampled according to this value, bringing the positive and negative sample counts into balance. Specifically: for each positive sample i, using the Euclidean distance as the metric, compute the proportion of negative samples among its K nearest sample points,
p_i = k_i / K,  i = 1, …, n_p    (3)
where k_i is the number of negative samples among the K nearest neighbours. To judge accurately whether each positive sample lies near the positive-negative boundary, K should take a large value; but as K grows, the amount of computation also grows substantially. To keep the computational complexity low, the invention compromises between these two demands and takes K = 5. Subsequently, all p_i are normalized so that they form a probability density distribution, and the number of new positive samples each positive sample i should generate is computed as
r_i = p_i / Σ_j p_j,  g_i = r_i × G    (4)
From formula (4), boundary sample points with more negative samples among their neighbours will be used to generate more positive samples, while sample points far from the boundary, whose neighbours are all positive samples, are not used to generate positive samples. Subsequently, for each positive sample, g_i of its K nearest sample points are randomly selected, and a new positive sample is generated from each by formula (5):
new_i = x_i + λ(x_ni − x_i)    (5)
where new_i is the newly generated sample point, λ is a random number between 0 and 1, and x_ni is the randomly selected neighbouring sample point. For each positive sample this process is carried out g_i times. After sample generation completes, the newly generated sample points are added to the original unbalanced training set, yielding the new balanced training set. This adaptive upsampling method ensures that the newly generated training set has no imbalance problem and that the newly generated samples lie mainly in the boundary region where positive and negative samples are hardest to distinguish.
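A minimal NumPy sketch of the adaptive upsampling procedure described above (formulas (2)–(5)). The function name `adaptive_upsample` and its signature are illustrative assumptions, not from the patent; because the per-sample quotas g_i are rounded to integers, the total number of generated samples can deviate slightly from G.

```python
import numpy as np

def adaptive_upsample(X_pos, X_neg, K=5, beta=1.0, rng=None):
    """Generate about G = (n_n - n_p) * beta synthetic positive samples,
    placing more near positives whose neighbourhoods contain many negatives."""
    rng = np.random.default_rng(rng)
    n_p, n_n = len(X_pos), len(X_neg)
    G = int(round((n_n - n_p) * beta))            # formula (2)
    X_all = np.vstack([X_pos, X_neg])
    is_neg = np.array([False] * n_p + [True] * n_n)

    p = np.empty(n_p)
    neighbours = []
    for i, x in enumerate(X_pos):
        d = np.linalg.norm(X_all - x, axis=1)
        d[i] = np.inf                             # exclude the point itself
        idx = np.argsort(d)[:K]                   # K nearest neighbours
        neighbours.append(idx)
        p[i] = is_neg[idx].mean()                 # formula (3)
    # Normalize to a probability density; uniform fallback if all p_i are 0.
    r = p / p.sum() if p.sum() > 0 else np.full(n_p, 1.0 / n_p)
    g = np.round(r * G).astype(int)               # formula (4): per-sample quota

    new_samples = []
    for i, x in enumerate(X_pos):
        for _ in range(g[i]):
            nb = X_all[rng.choice(neighbours[i])] # random neighbour
            lam = rng.random()
            new_samples.append(x + lam * (nb - x))  # formula (5): interpolate
    return np.array(new_samples).reshape(-1, X_pos.shape[1])
```

For instance, with 10 positive and 30 negative samples and β = 1, the sketch generates roughly G = 20 synthetic positives, all lying on line segments between existing points.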
As can be seen from Figs. 1 and 2, if simple random upsampling were used instead, duplicating all positive sample points, the newly generated sample points would coincide exactly with the original positive sample points distributed over the whole positive sample space. Adaptive upsampling, by contrast, generates positive samples distinct from the original sample points, and the newly generated positive samples all lie near the boundary.
(3) The invention uses five-fold cross-validation to train and test on the unbalanced dataset. Both training and testing use the Adaboost classification algorithm with the C4.5 decision tree as base classifier, where the minimum leaf size of the C4.5 tree is set to 2, the confidence factor to 0.25, and the tree is pruned after training completes. All data are normalized before entering the classifier, i.e., the data minimum is 0 and the maximum is 1. Positive sample data are labelled +1 and negative sample data are labelled −1.
Five-fold cross-validation divides the balanced positive and negative samples into a training set and a test set, the training set then comprising 518 positive and 518 negative samples; the number of samples used in training, written 2n_n below, is 1036. Taking the number of Adaboost iterations T = 10, training proceeds as follows:
1. Denote the weight of each sample in round t by D_t(i), where t takes integer values from 1 to T and indicates the current iteration round, and i is the sample index. Initialize the weights to D_1(i) = 1/(2n_n), i = 1, …, 2n_n.
2. Train the classifier h_t on the weighted training set, and after training compute its training error rate
ε_t = Σ_i D_t(i) · I(h_t(x_i) ≠ y_i)    (6)
where t = 1, …, T is the current iteration round, ε_t is the training error rate of round t, D_t(i) is the weight of sample i in that round, y_i ∈ {+1, −1} is the class label of sample x_i, and h_t(x_i) is the label assigned to x_i by the trained classifier.
3. Let α_t be the weight, in the final vote, of the classifier obtained after round t. From each round's training error rate, the weight of the classifier generated in that round is
α_t = (1/2) ln((1 − ε_t)/ε_t)    (7)
Meanwhile, the weight of each sample in the next round of iteration is updated to
D_{t+1}(i) = D_t(i) · exp(−α_t y_i h_t(x_i)) / Z_t    (8)
where Z_t is the sum of the sample weights in the current iteration round, used to normalize the weights.
4. Execute steps 2 and 3 a total of T times to complete the whole iteration and weight-update process and thus the classifier training. For a test sample x to be classified, the classification result is
H(x) = sign(Σ_{t=1}^{T} α_t h_t(x))    (9)
From formula (7), the weight of each sub-classifier is determined by its classification error rate: a classifier with a lower error rate obtains a higher weight in the vote of formula (9). Moreover, for a single sample, formula (8) shows that if the sample's original label differs from the classification result then y_i h_t(x_i) = −1, the exponent is greater than 0 and the exponential factor exceeds 1, so the sample's weight increases in the next round of iteration; otherwise, the sample's weight decreases in the next round.
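The four training steps above can be sketched as follows, with labels in {+1, −1} as in the text. For self-containment the weak learner here is an exhaustive decision stump rather than the C4.5 tree the patent specifies; all names are illustrative.

```python
import numpy as np

def stump_train(X, y, w):
    """Weighted decision stump: pick the (feature, threshold, polarity)
    that minimises the weighted error."""
    best = (0, 0.0, 1, np.inf)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(X[:, f] <= thr, pol, -pol)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (f, thr, pol, err)
    f, thr, pol, _ = best
    return lambda Z: np.where(Z[:, f] <= thr, pol, -pol)

def adaboost_train(X, y, T=10):
    n = len(X)
    w = np.full(n, 1.0 / n)                     # step 1: uniform initial weights
    models, alphas = [], []
    for _ in range(T):
        h = stump_train(X, y, w)                # step 2: train weak classifier
        pred = h(X)
        eps = w[pred != y].sum()                # formula (6): weighted error
        eps = min(max(eps, 1e-10), 1 - 1e-10)   # keep formula (7) finite
        alpha = 0.5 * np.log((1 - eps) / eps)   # formula (7): voting weight
        w = w * np.exp(-alpha * y * pred)       # formula (8): re-weight samples
        w /= w.sum()                            # Z_t normalisation
        models.append(h)
        alphas.append(alpha)

    def H(Z):                                   # formula (9): weighted vote
        return np.sign(sum(a * h(Z) for a, h in zip(alphas, models)))
    return H
```

Because formula (7) diverges when ε_t = 0, the sketch clamps the error rate away from 0 and 1, a common practical safeguard.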
The test-set samples are input into the trained classifier to obtain the final classification result of each test sample, as shown in Fig. 2.
Table 1 compares the test results obtained by classifying the unbalanced dataset directly with a C4.5 decision tree, by classifying with C4.5 after simple random upsampling of the positive samples, and by the method used in the present invention. Classifier performance is evaluated with the following indices: sensitivity, specificity, and their geometric mean.
Table 1. Classification results and comparison (the best result under each index is marked in bold)
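Assuming the standard definitions of the indices named in the comparison (the original formula images are not reproduced): sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), and the geometric mean (G-mean) is √(sensitivity × specificity). A minimal helper, with +1 denoting the positive class and −1 the negative class:

```python
import math

def evaluate(y_true, y_pred):
    """Return (sensitivity, specificity, G-mean) for +1/-1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0   # true-positive rate
    spec = tn / (tn + fp) if tn + fp else 0.0   # true-negative rate
    return sens, spec, math.sqrt(sens * spec)   # G-mean balances the two
```

The G-mean is the index the comparison below emphasises, since it penalises a classifier that sacrifices one class for the other.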
From the data in Table 1 it can be seen that although direct classification with a C4.5 decision tree achieves the highest specificity, its sensitivity is the lowest, showing that the data imbalance significantly harms classification performance: the boundary region of the positive class is encroached upon, and a large number of positive samples are misclassified as negative. After simple random upsampling this problem is alleviated, but the gap between sensitivity and specificity remains large. The present invention obtains good sensitivity and specificity simultaneously, and their geometric mean is the highest among the compared methods, showing that the invention achieves the best compromise between sensitivity and specificity.
In summary, the present invention obtains a good classification effect on unbalanced datasets and effectively eliminates the negative influence that the data imbalance problem exerts on classification.
Claims (1)
1. An unbalanced-dataset classification method based on adaptive upsampling. Let the number of positive samples in the original unbalanced dataset be n_p and the number of negative samples be n_n. The method comprises the following steps:
(1) from n_p and n_n, calculating the imbalance ratio IR of the unbalanced dataset, and from IR calculating the total number G of positive samples to be newly generated;
(2) using the Euclidean distance as the metric, for each positive sample i, finding its K nearest neighbours in the unbalanced dataset and computing the proportion of negative samples among those K nearest samples, denoted p_i; summing and normalizing the p_i values obtained for all positive samples, the value obtained after normalization being denoted r_i, so that the r_i values of all positive samples sum to 1, i.e., the r_i form a probability density distribution, and r_i is called the probability of positive sample i;
(3) for each positive sample i, determining the number g_i of new samples to be generated from it, according to the total G and the probability r_i obtained in step (2);
(4) for each positive sample i, randomly selecting g_i of the K nearest neighbours obtained in step (2), pairing each with sample i, and picking a random point on the line segment joining each pair to obtain a newly generated positive sample; after the generation process completes, G new positive sample points have been generated, which are added to the original unbalanced training set so that the positive and negative sample counts are equal, yielding a new balanced training set with n_n positive samples and n_n negative samples;
(5) letting the number of iterations of the Adaboost algorithm be T, training on the newly generated balanced training set with the Adaboost algorithm, and obtaining the final classification model after T iterations.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN201610331709.9A CN105975992A (en)  20160518  20160518  Unbalanced data classification method based on adaptive upsampling 
Publications (1)
Publication Number  Publication Date 

CN105975992A true CN105975992A (en)  20160928 
Family
ID=56955297
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN201610331709.9A Pending CN105975992A (en)  20160518  20160518  Unbalanced data classification method based on adaptive upsampling 
Country Status (1)
Country  Link 

CN (1)  CN105975992A (en) 
Citations (4)
Publication number  Priority date  Publication date  Assignee  Title 

CN103927874A (en) *  20140429  20140716  东南大学  Automatic incident detection method based on undersampling and used for unbalanced data set 
CN104573708A (en) *  20141219  20150429  天津大学  Ensembleofundersampled extreme learning machine 
CN104951809A (en) *  20150714  20150930  西安电子科技大学  Unbalanced data classification method based on unbalanced classification indexes and integrated learning 
CN105373806A (en) *  20151019  20160302  河海大学  Outlier detection method based on uncertain data set 

NonPatent Citations (3)
Title 

HAIBO HE et al.: "ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning", 2008 IEEE International Joint Conference on Neural Networks *
LIU Yuxia et al.: "A new oversampling algorithm: DB_SMOTE", Computer Engineering and Applications *
TAO Xinmin et al.: "A survey of classification algorithms for unbalanced data", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) *
Cited By (22)
Publication number  Priority date  Publication date  Assignee  Title 

CN108133223A (en) *  20161201  20180608  富士通株式会社  The device and method for determining convolutional neural networks CNN models 
CN108133223B (en) *  20161201  20200626  富士通株式会社  Device and method for determining convolutional neural network CNN model 
CN108629413A (en) *  20170315  20181009  阿里巴巴集团控股有限公司  Neural network model training, trading activity Risk Identification Method and device 
CN108629413B (en) *  20170315  20200616  创新先进技术有限公司  Neural network model training and transaction behavior risk identification method and device 
CN107273916B (en) *  20170522  20201016  上海大学  Information hiding detection method for unknown steganography algorithm 
CN107273916A (en) *  20170522  20171020  上海大学  The unknown Information Hiding ＆ Detecting method of steganographic algorithm 
CN108334455B (en) *  20180305  20200626  清华大学  Software defect prediction method and system based on search costsensitive hypergraph learning 
CN108334455A (en) *  20180305  20180727  清华大学  The Software Defects Predict Methods and system of costsensitive hypergraph study based on search 
CN108733633A (en) *  20180518  20181102  北京科技大学  A kind of the unbalanced data homing method and device of sample distribution adjustment 
CN109086412A (en) *  20180803  20181225  北京邮电大学  A kind of unbalanced data classification method based on adaptive weighted BaggingGBDT 
CN109614967B (en) *  20181010  20200717  浙江大学  License plate detection method based on negative sample data value resampling 
CN109614967A (en) *  20181010  20190412  浙江大学  A kind of detection method of license plate based on negative sample data value resampling 
WO2020082734A1 (en) *  20181024  20200430  平安科技（深圳）有限公司  Text emotion recognition method and apparatus, electronic device, and computer nonvolatile readable storage medium 
CN109327464A (en) *  20181115  20190212  中国人民解放军战略支援部队信息工程大学  Class imbalance processing method and processing device in a kind of network invasion monitoring 
CN109740750A (en) *  20181217  20190510  北京深极智能科技有限公司  Method of data capture and device 
CN109756494A (en) *  20181229  20190514  中国银联股份有限公司  A kind of negative sample transform method and device 
CN109756494B (en) *  20181229  20210416  中国银联股份有限公司  Negative sample transformation method and device 
CN109862392A (en) *  20190320  20190607  济南大学  Recognition methods, system, equipment and the medium of internet gaming video flow 
CN109862392B (en) *  20190320  20210413  济南大学  Method, system, device and medium for identifying video traffic of internet game 
CN111062806A (en) *  20191213  20200424  合肥工业大学  Personal finance credit risk evaluation method, system and storage medium 
CN111598189A (en) *  20200720  20200828  北京瑞莱智慧科技有限公司  Generative model training method, data generation method, device, medium, and apparatus 
CN111598189B (en) *  20200720  20201030  北京瑞莱智慧科技有限公司  Generative model training method, data generation method, device, medium, and apparatus 
Similar Documents
Publication  Publication Date  Title 

CN105975992A (en)  Unbalanced data classification method based on adaptive upsampling  
Zhuang  LadderNet: Multipath networks based on UNet for medical image segmentation  
CN105300693B (en)  A kind of Method for Bearing Fault Diagnosis based on transfer learning  
CN103632168B (en)  Classifier integration method for machine learning  
CN101944174B (en)  Identification method of characters of licence plate  
CN103728551B (en)  A kind of analogcircuit fault diagnosis method based on cascade integrated classifier  
CN105844287B (en)  A kind of the domain adaptive approach and system of classification of remotesensing images  
CN105426919A (en)  Significant guidance and unsupervised feature learning based image classification method  
CN104834940A (en)  Medical image inspection disease classification method based on support vector machine (SVM)  
CN107563435A (en)  Higherdimension unbalanced data sorting technique based on SVM  
CN108764366A (en)  Feature selecting and cluster for lack of balance data integrate two sorting techniques  
CN103886030B (en)  Costsensitive decisionmaking tree based physical information fusion system data classification method  
CN104881671B (en)  A kind of high score remote sensing image Local Feature Extraction based on 2D Gabor  
CN104598885B (en)  The detection of word label and localization method in street view image  
CN102156871A (en)  Image classification method based on category correlated codebook and classifier voting strategy  
CN106845717A (en)  A kind of energy efficiency evaluation method based on multimodel convergence strategy  
CN107133640A (en)  Image classification method based on topography's block description and Fei Sheer vectors  
CN108460421A (en)  The sorting technique of unbalanced data  
CN107092884A (en)  A kind of quick coarsefine cascade pedestrian detection method  
CN106202952A (en)  A kind of Parkinson disease diagnostic method based on machine learning  
CN105975611A (en)  Selfadaptive combined downsampling reinforcing learning machine  
CN105930792A (en)  Human action classification method based on video local feature dictionary  
CN106682606A (en)  Face recognizing method and safety verification apparatus  
CN105930872A (en)  Bus driving state classification method based on classsimilar binary tree support vector machine  
CN108154924A (en)  Alzheimer's disease tagsort method and system based on support vector machines 
Legal Events
Date  Code  Title  Description 

PB01  Publication  
C06  Publication  
SE01  Entry into force of request for substantive examination  
C10  Entry into substantive examination  
RJ01  Rejection of invention patent application after publication 
Application publication date: 20160928 

RJ01  Rejection of invention patent application after publication 