CN109472302A - An AdaBoost-based support vector machine ensemble learning method - Google Patents
An AdaBoost-based support vector machine ensemble learning method
- Publication number: CN109472302A
- Application number: CN201811264179.6A
- Authority: CN
- Country: China
- Prior art keywords: adaboost, classifier, sample, support vector, weight
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/285—Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
Abstract
The present invention relates to an AdaBoost-based support vector machine ensemble learning method. To address the relatively low accuracy of existing support vector machine learning methods on class-imbalanced classification problems, an AdaBoost-based support vector machine ensemble learning method is provided: weak classifiers are constructed with the weighted support vector machine (W-SVM), and the weak classifiers are combined into a strong classifier under the AdaBoost algorithm framework. The method can mine the sample distribution information in depth and thereby markedly improves prediction accuracy, making it an effective tool for handling class-imbalance problems.
Description
Technical field
The invention belongs to the fields of data mining and machine learning, relates to data mining and data processing methods, and in particular to an AdaBoost-based support vector machine ensemble learning method.
Background art
The support vector machine (SVM) is a classical kernel learning model built on the principle of structural risk minimization and one of the most widely used supervised learning algorithms. Its basic idea is to map the training data into a high-dimensional Hilbert feature space through a nonlinear mapping, and then to construct a maximum-margin separating hyperplane in that space to perform linear classification. However, a classifier obtained by training a single support vector machine often suffers from shortcomings such as relatively low prediction accuracy on challenging problems. To improve practical performance, we propose, within the basic framework of ensemble learning, an AdaBoost-based support vector machine ensemble learning method that mines the distribution information of the samples in depth and lifts the prediction accuracy of the model. Compared with the classical SVM and the weighted SVM (W-SVM), the proposed support vector machine ensemble algorithm can handle classification under class imbalance and can dynamically adapt the sample weight distribution according to the performance of the classifiers, thereby improving prediction accuracy.
Summary of the invention
The object of the invention is to address the relatively low accuracy of existing support vector machine learning methods on class-imbalanced classification problems by providing an AdaBoost-based support vector machine ensemble learning method: W-SVM is used to construct weak classifiers, and the weak classifiers are combined into a strong classifier under the AdaBoost algorithm framework. The method can mine the sample distribution information in depth and thereby markedly improves prediction accuracy.
According to an embodiment of the present invention, an AdaBoost-based support vector machine ensemble learning method is provided, comprising the following steps:
(1) initialize the sample weights and select W-SVM to construct weak classifiers that classify the class-imbalanced problem;
(2) use the AdaBoost algorithm to dynamically adjust the weights of the training samples, and determine the weight α_k of each weak classifier according to the accuracy of the weak classifier f_k(x);
(3) determine the number of weak classifiers T by cross-validation and combine the weak classifiers into a strong classifier.
In the learning method according to an embodiment of the present invention, in step (1), the sample weights are initialized and W-SVM is selected to construct the weak classifiers, which perform a binary classification task on the class-imbalanced problem. The specific steps are as follows: initialize the sample weights w_{1,i} = 1/n. On the training sample set, select the Gaussian kernel as the model kernel function, i.e. k(x_i, x_j) = exp(-‖x_i - x_j‖² / d), and train the W-SVM model to obtain the decision function f_k(x). The label of a test sample x is predicted with the decision function:
y = sign(f_k(x))    (1)
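Step (1) can be sketched as follows, assuming scikit-learn's `SVC` as the W-SVM solver: per-sample weights are passed via `sample_weight`, and the RBF kernel plays the role of k(x_i, x_j) = exp(-‖x_i - x_j‖²/d) (with `gamma` corresponding to 1/d). The data below are synthetic; this is an illustration, not the patent's reference implementation.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy imbalanced data: 80 majority (+1) vs 20 minority (-1) samples.
X = np.vstack([rng.normal(0, 1, (80, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([1] * 80 + [-1] * 20)
n = len(y)

w = np.full(n, 1.0 / n)                # initial sample weights w_{1,i} = 1/n
clf = SVC(kernel="rbf", gamma=1.0)     # Gaussian kernel, gamma ~ 1/d
clf.fit(X, y, sample_weight=w * n)     # weighted-SVM weak classifier
y_pred = clf.predict(X)                # y = sign(f_k(x))
```

Passing the AdaBoost sample weights through `sample_weight` is what makes each round's SVM a different weak learner on the same data.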
In the learning method according to an embodiment of the present invention, in step (2), the AdaBoost algorithm is used to dynamically adjust the training-sample weights w_{k,i}, and the weight α_k of each weak classifier is determined according to its classification accuracy. The specific steps are as follows: the weighted error rate of the k-th weak classifier f_k(x) on the training set is
e_k = Σ_{i=1}^{n} w_{k,i} I(f_k(x_i) ≠ y_i)    (2)
that is, the error rate e_k of f_k(x) on the training set is the weighted sum of the samples misclassified by f_k(x). The weight of the weak classifier f_k(x) in the ensemble is
α_k = (1/2) ln((1 - e_k)/e_k)    (3)
The training-sample weights are then updated; the sample weights used when learning the (k+1)-th weak classifier are
w_{k+1,i} = (w_{k,i}/Z_k) exp(-α_k y_i f_k(x_i))    (4)
where Z_k = Σ_{i=1}^{n} w_{k,i} exp(-α_k y_i f_k(x_i)) is the normalization factor.
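The update rules of step (2) correspond to the standard AdaBoost formulas, which can be written out directly; the clipping of e_k below is an added numerical safeguard, not part of the patent's equations.

```python
import numpy as np

def adaboost_update(w, y_true, y_pred):
    """One AdaBoost round: weighted error e_k, classifier weight alpha_k,
    and the renormalized sample weights w_{k+1,i} of equations (2)-(4)."""
    miss = (y_pred != y_true).astype(float)
    e_k = float(np.sum(w * miss))                 # eq. (2): weighted error
    e_k = float(np.clip(e_k, 1e-10, 1 - 1e-10))  # guard against log(0)
    alpha_k = 0.5 * np.log((1 - e_k) / e_k)      # eq. (3)
    w_new = w * np.exp(-alpha_k * y_true * y_pred)  # misclassified -> up-weighted
    w_new = w_new / w_new.sum()                  # divide by Z_k, eq. (4)
    return e_k, alpha_k, w_new

w = np.full(4, 0.25)
y_true = np.array([1, 1, -1, -1])
y_pred = np.array([1, -1, -1, -1])               # one mistake at index 1
e, a, w2 = adaboost_update(w, y_true, y_pred)
```

With one of four equally weighted samples misclassified, e = 0.25, α = ½ ln 3, and the mistaken sample ends up carrying half of the total weight after normalization.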
In the learning method according to an embodiment of the present invention, in step (3), the weak classifiers are combined into a strong classifier as follows: according to the ensemble strategy, AdaBoost uses weighted averaging and combines the weak classifiers according to their weights α_k, giving
F(x) = Σ_{k=1}^{T} α_k f_k(x)    (5)
Applying the sign function then yields the strong classifier
H(x) = sign(Σ_{k=1}^{T} α_k f_k(x))    (6)
where T is the number of iterations, determined by cross-validation.
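The weighted vote of equations (5)-(6) is a one-liner once the fitted weak classifiers and their α_k are available. The `Stump` class below is a made-up stand-in for any fitted object exposing a `predict()` that returns ±1 labels:

```python
import numpy as np

def strong_predict(classifiers, alphas, X):
    """H(x) = sign(sum_k alpha_k * f_k(x)) -- equations (5) and (6)."""
    F = sum(a * clf.predict(X) for clf, a in zip(classifiers, alphas))
    return np.sign(F)

class Stump:
    """Toy weak classifier: sign of one coordinate minus a threshold."""
    def __init__(self, dim, thr):
        self.dim, self.thr = dim, thr
    def predict(self, X):
        return np.where(X[:, self.dim] > self.thr, 1, -1)

X = np.array([[0.0, 2.0], [3.0, 2.0]])
weak = [Stump(0, 1.0), Stump(0, 2.5)]
alphas = [1.0, 0.5]
pred = strong_predict(weak, alphas, X)  # first sample -> -1, second -> +1
```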
The present invention relates to an AdaBoost-based support vector machine ensemble learning method. Built on the AdaBoost algorithm framework, the method dynamically adjusts the sample weights and thereby mines the sample distribution information in depth. The W-SVM model is introduced to construct weak classifiers that classify the class-imbalanced problem. Finally, the ensemble strategy combines multiple W-SVM weak classifiers into a strong classifier, improving the classification accuracy of the prediction model.
Detailed description of the invention
Figure 1 is a schematic diagram of the AdaBoost-based support vector machine ensemble learning method in an embodiment of the present invention.
Figure 2 shows the time series of the blast furnace temperature indicator [Si] and the blast volume in an embodiment of the present invention.
Figure 3 is a schematic diagram of the sample distribution of normal and abnormal furnace conditions in an embodiment of the present invention.
Specific embodiment
The specific steps of the present invention are explained below with reference to the drawings.
Embodiment 1: the blast furnace temperature ([Si]) state forecasting problem of the Laigang No. 1 blast furnace (BF(a)) and the Baogang No. 7 blast furnace (BF(b)) is taken as an example. Fig. 2 gives the time series of the blast furnace temperature [Si] and the blast volume. As shown in Fig. 2, [Si] and the blast volume differ significantly in scale. Large-scale variables would mask the influence that small-scale variables exert on the model and thus severely degrade its prediction accuracy. For this reason, the sampled data are first normalized, e.g. with the min-max formula x̄_i = (x_i - x_min)/(x_max - x_min), so that all input variables lie on the same scale. The training and test sample sets are then determined. On the training set, the furnace temperature [Si] is clustered with the K-means algorithm into three states: low, high, and normal. The low- and high-temperature states are merged into an abnormal state, so [Si] is divided into two major classes, normal and abnormal. Fig. 3 shows the distribution of normal- and abnormal-state samples of BF(a) and BF(b) output by the K-means clustering algorithm; the ratio of normal to abnormal samples is about 4:1, a typical class-imbalanced classification problem.
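The preprocessing and labeling just described can be sketched as below, under two assumptions labeled in the comments: that the normalization is min-max scaling, and that "normal" corresponds to the middle of the three K-means centers. The [Si] values are synthetic, not furnace data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Synthetic [Si] readings (assumption: roughly 4:1 normal-to-abnormal).
si = np.concatenate([rng.normal(0.45, 0.03, 80),   # normal furnace state
                     rng.normal(0.25, 0.03, 10),   # low temperature
                     rng.normal(0.70, 0.03, 10)])  # high temperature

# Assumed min-max normalization: x' = (x - min) / (max - min).
si_scaled = (si - si.min()) / (si.max() - si.min())

# K-means into three states, then merge low and high into "abnormal".
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(si_scaled.reshape(-1, 1))
centers = km.cluster_centers_.ravel()
normal_cluster = np.argsort(centers)[1]                 # middle center = "normal"
labels = np.where(km.labels_ == normal_cluster, 1, -1)  # +1 normal, -1 abnormal
```

The resulting ±1 labels feed directly into the W-SVM/AdaBoost training steps as the binary, class-imbalanced target.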
The AdaBoost-based support vector machine ensemble learning method provided by the invention comprises the following steps:
Step 1: initialize the sample weights w_{1,i} = 1/n and select W-SVM to construct the weak classifiers, which perform the classification task on the class-imbalanced problem described above. On the training sample set, select the Gaussian kernel as the model kernel function, i.e. k(x_i, x_j) = exp(-‖x_i - x_j‖² / d), and train the W-SVM model to obtain the decision function f_k(x). The label of a test sample x is predicted with the decision function:
y = sign(f_k(x))    (1)
Step 2: use the AdaBoost algorithm to dynamically adjust the training-sample weights w_{k,i}, and determine the weight α_k of each weak classifier according to the accuracy of the weak classifier f_k(x). The weighted error rate of the k-th weak classifier f_k(x) on the training set is
e_k = Σ_{i=1}^{n} w_{k,i} I(f_k(x_i) ≠ y_i)    (2)
that is, the error rate e_k of f_k(x) on the training set is the weighted sum of the samples misclassified by f_k(x). The weight of f_k(x) in the ensemble is
α_k = (1/2) ln((1 - e_k)/e_k)    (3)
The training-sample weights are then updated; the sample weights used when learning the (k+1)-th weak classifier are
w_{k+1,i} = (w_{k,i}/Z_k) exp(-α_k y_i f_k(x_i))    (4)
where Z_k = Σ_{i=1}^{n} w_{k,i} exp(-α_k y_i f_k(x_i)) is the normalization factor.
Step 3: combine the weak classifiers into a strong classifier. According to the ensemble strategy, AdaBoost uses weighted averaging and combines the weak classifiers according to their weights α_k, giving F(x) = Σ_{k=1}^{T} α_k f_k(x) (5). Applying the sign function then yields the strong classifier H(x) = sign(F(x)) (6), where T is the number of iterations. Using 5-fold cross-validation, the number of iterations T is determined to be 14 for BF(a) and 7 for BF(b).
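Steps 1-3, including the cross-validated choice of T, can be assembled into one end-to-end sketch. scikit-learn is assumed for the SVM and the fold splitting, the data are synthetic, and the candidate T values are illustrative; this is not the patent's reference implementation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold

def fit_adaboost_svm(X, y, T, gamma=1.0):
    """Train T Gaussian-kernel weighted-SVM weak classifiers (steps 1-2)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                       # w_{1,i} = 1/n
    clfs, alphas = [], []
    for _ in range(T):
        clf = SVC(kernel="rbf", gamma=gamma)
        clf.fit(X, y, sample_weight=w * n)        # W-SVM weak classifier
        pred = clf.predict(X)
        e = float(np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10))
        a = 0.5 * np.log((1 - e) / e)             # eq. (3)
        w = w * np.exp(-a * y * pred)             # eq. (4) numerator
        w = w / w.sum()                           # normalize by Z_k
        clfs.append(clf)
        alphas.append(a)
    return clfs, alphas

def predict_strong(clfs, alphas, X):
    """H(x) = sign(sum_k alpha_k f_k(x)) -- step 3, eqs. (5)-(6)."""
    return np.sign(sum(a * c.predict(X) for c, a in zip(clfs, alphas)))

def choose_T(X, y, candidates=(1, 3, 5)):
    """Pick T by 5-fold cross-validated accuracy, as in the embodiment."""
    best_T, best_acc = candidates[0], -1.0
    for T in candidates:
        accs = []
        for tr, te in StratifiedKFold(5, shuffle=True, random_state=0).split(X, y):
            clfs, alphas = fit_adaboost_svm(X[tr], y[tr], T)
            accs.append(np.mean(predict_strong(clfs, alphas, X[te]) == y[te]))
        if np.mean(accs) > best_acc:
            best_T, best_acc = T, float(np.mean(accs))
    return best_T, best_acc

rng = np.random.default_rng(2)
# Synthetic 4:1 imbalanced problem standing in for the furnace data.
X = np.vstack([rng.normal(0, 0.5, (80, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([1] * 80 + [-1] * 20)
T, acc = choose_T(X, y)
```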
The classification performance of the ensemble learning classifier (AdaBoostSVM) of formula (6) is compared with that of SVM and W-SVM on the test set; the results are shown in Table 1.
Table 1. Blast furnace temperature [Si] state classification results
The above embodiment is intended to explain the present invention rather than to limit it; any modifications and changes made to the present invention within the spirit of the invention and the protection scope of the claims fall within the protection scope of the present invention.
Claims (4)
1. An AdaBoost-based support vector machine ensemble learning method, characterized by comprising the following steps:
(1) initialize the sample weights and select a weighted SVM (W-SVM) to construct weak classifiers that classify the class-imbalanced problem;
(2) use the AdaBoost algorithm to dynamically adjust the weights of the training samples, and determine the weight α_k of each weak classifier according to the accuracy of the weak classifier f_k(x);
(3) determine the number of weak classifiers T by cross-validation and combine the T weak classifiers into a strong classifier.
2. The AdaBoost-based support vector machine ensemble learning method according to claim 1, characterized in that: in step (1), the sample weights are initialized and W-SVM is selected to construct the weak classifiers, which perform a binary classification task on the class-imbalanced problem. The specific steps are as follows: initialize the sample weights w_{1,i} = 1/n; on the training sample set, select the Gaussian kernel as the model kernel function, i.e. k(x_i, x_j) = exp(-‖x_i - x_j‖² / d); train the W-SVM model to obtain the decision function f_k(x), and predict the label of a test sample x with the decision function:
y = sign(f_k(x))    (1)
3. The AdaBoost-based support vector machine ensemble learning method according to claim 1, characterized in that: in step (2), the AdaBoost algorithm is used to dynamically adjust the training-sample weights w_{k,i}, and the weight α_k of each weak classifier is determined according to its classification accuracy. The specific steps are as follows: the weighted error rate of the k-th weak classifier f_k(x) on the training set is
e_k = Σ_{i=1}^{n} w_{k,i} I(f_k(x_i) ≠ y_i)    (2)
that is, the error rate e_k of f_k(x) on the training set is the weighted sum of the samples misclassified by f_k(x). The weight of f_k(x) in the ensemble is
α_k = (1/2) ln((1 - e_k)/e_k)    (3)
The training-sample weights are then updated; the sample weights used when learning the (k+1)-th weak classifier are
w_{k+1,i} = (w_{k,i}/Z_k) exp(-α_k y_i f_k(x_i))    (4)
where Z_k = Σ_{i=1}^{n} w_{k,i} exp(-α_k y_i f_k(x_i)) is the normalization factor.
4. The AdaBoost-based support vector machine ensemble learning method according to claim 1, characterized in that: in step (3), the weak classifiers are combined into a strong classifier as follows: according to the ensemble strategy, AdaBoost uses weighted averaging and combines the weak classifiers according to their weights α_k, giving
F(x) = Σ_{k=1}^{T} α_k f_k(x)    (5)
Applying the sign function then yields the strong classifier
H(x) = sign(Σ_{k=1}^{T} α_k f_k(x))    (6)
where T is the number of iterations, determined by cross-validation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811264179.6A CN109472302A (en) | 2018-10-29 | 2018-10-29 | A kind of support vector machine ensembles learning method based on AdaBoost |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109472302A true CN109472302A (en) | 2019-03-15 |
Family
ID=65666566
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811264179.6A Pending CN109472302A (en) | 2018-10-29 | 2018-10-29 | A kind of support vector machine ensembles learning method based on AdaBoost |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109472302A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120148160A1 (en) * | 2010-07-08 | 2012-06-14 | Honeywell International Inc. | Landmark localization for facial imagery |
CN102799899A (en) * | 2012-06-29 | 2012-11-28 | 北京理工大学 | Special audio event layered and generalized identification method based on SVM (Support Vector Machine) and GMM (Gaussian Mixture Model) |
CN106650773A (en) * | 2016-10-11 | 2017-05-10 | 酒泉职业技术学院 | SVM-AdaBoost algorithm-based pedestrian detection method |
CN107256392A (en) * | 2017-06-05 | 2017-10-17 | 南京邮电大学 | A kind of comprehensive Emotion identification method of joint image, voice |
CN107292233A (en) * | 2017-05-16 | 2017-10-24 | 开易(北京)科技有限公司 | Tracking, the system of pedestrian detection and feature based in auxiliary driving based on part |
CN107333294A (en) * | 2017-07-31 | 2017-11-07 | 南昌航空大学 | A kind of combination AdaBoost and SVMs link quality prediction method |
CN107992895A (en) * | 2017-10-19 | 2018-05-04 | 电子科技大学 | A kind of Boosting support vector machines learning method |
CN108040337A (en) * | 2018-01-02 | 2018-05-15 | 重庆邮电大学 | Based on improvement AdaBoost wireless sense network intrusion detection methods |
Non-Patent Citations (1)
Title |
---|
汪小我 (Wang Xiaowo): "Computational analysis of microRNA-related problems" (《microRNA相关问题的计算分析》), China Doctoral Dissertations Full-text Database, Basic Sciences * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717529A (en) * | 2019-09-25 | 2020-01-21 | 南京旷云科技有限公司 | Data sampling method and device |
CN110717529B (en) * | 2019-09-25 | 2022-09-30 | 南京旷云科技有限公司 | Data sampling method and device |
CN110991500A (en) * | 2019-11-19 | 2020-04-10 | 天津师范大学 | Small sample multi-classification method based on nested integrated depth support vector machine |
CN110929301A (en) * | 2019-11-20 | 2020-03-27 | 海宁利伊电子科技有限公司 | Hardware Trojan horse detection method based on lifting algorithm |
CN110929301B (en) * | 2019-11-20 | 2022-07-26 | 海宁利伊电子科技有限公司 | Hardware Trojan horse detection method based on lifting algorithm |
CN111723949A (en) * | 2020-06-24 | 2020-09-29 | 中国石油大学(华东) | Porosity prediction method based on selective ensemble learning |
CN112790775A (en) * | 2021-01-22 | 2021-05-14 | 中国地质大学(武汉) | High-frequency oscillation rhythm detection method and device based on integrated classification |
CN113723622A (en) * | 2021-08-10 | 2021-11-30 | 中国科学院计算机网络信息中心 | Tobacco leaf sensory quality prediction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
CB03 | Change of inventor or designer information | | |

Inventor after: Gradual order; Gu Xiao; Chen Hongyi; Liang Xijun
Inventor before: Chen Hongyi; Lei Hejie; Liang Xijun; Gradual order

RJ01 | Rejection of invention patent application after publication | | |

Application publication date: 20190315