CN109472302A - An AdaBoost-based support vector machine ensemble learning method - Google Patents

An AdaBoost-based support vector machine ensemble learning method

Info

Publication number
CN109472302A
Authority
CN
China
Prior art keywords
adaboost
classifier
sample
support vector
weight
Prior art date
Legal status
Pending
Application number
CN201811264179.6A
Other languages
Chinese (zh)
Inventor
Chen Hongyi (陈宏义)
Lei Hejie (雷鹤杰)
Liang Xijun (梁锡军)
Jian Ling (渐令)
Current Assignee
China University of Petroleum (East China)
Original Assignee
China University of Petroleum (East China)
Priority date
2018-10-29
Filing date
2018-10-29
Publication date
2019-03-15
Application filed by China University of Petroleum (East China)
Priority to CN201811264179.6A
Publication of CN109472302A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system

Abstract

The invention relates to an AdaBoost-based support vector machine ensemble learning method. To address the relatively low accuracy of existing support vector machine learning methods on class-imbalanced classification problems, the method constructs weak classifiers with a weighted support vector machine (W-SVM) and integrates them into a strong classifier within the AdaBoost framework. The method exploits the sample distribution information more fully and thereby significantly improves prediction accuracy, making it an effective tool for class-imbalanced problems.

Description

An AdaBoost-based support vector machine ensemble learning method
Technical field
The invention belongs to the fields of data mining and machine learning and relates to data mining and data processing methods; specifically, it relates to an AdaBoost-based support vector machine ensemble learning method.
Background art
The support vector machine (SVM) is a typical kernel learning model built on the principle of structural risk minimization and one of the most widely used supervised learning algorithms. Its basic idea is to map the training data, through a nonlinear mapping, into a high-dimensional Hilbert feature space, and then construct the maximum-margin separating hyperplane in that space to perform linear classification. However, a classifier obtained by training a single SVM often has shortcomings, such as relatively low prediction accuracy, when handling complex problems. To improve practical performance, we propose, within the basic framework of ensemble learning, an AdaBoost-based SVM ensemble learning method that exploits the distribution information of the samples more fully and improves model prediction accuracy. Compared with the classical SVM and the weighted SVM (W-SVM), the SVM ensemble algorithm of the invention can handle classification problems with imbalanced classes and can promptly adjust the sample weight distribution according to the performance of each classifier, thereby improving prediction accuracy.
Summary of the invention
The object of the invention is to address the relatively low accuracy of existing SVM learning methods on class-imbalanced classification problems by providing an AdaBoost-based SVM ensemble learning method that constructs weak classifiers with W-SVM and integrates them into a strong classifier within the AdaBoost framework. The method exploits the sample distribution information more fully and thereby significantly improves prediction accuracy.
An embodiment of the invention provides an AdaBoost-based support vector machine ensemble learning method comprising the following steps:
(1) Initialize the sample weights and use W-SVM to construct a weak classifier that classifies the class-imbalanced problem;
(2) Use the AdaBoost algorithm to dynamically adjust the weights of the training samples, and determine the weight $\alpha_k$ of each weak classifier from the accuracy of weak classifier $f_k(x)$;
(3) Determine the number of weak classifiers $T$ by cross-validation and integrate the weak classifiers into a strong classifier.
In the learning method according to an embodiment of the invention, step (1) initializes the sample weights and uses W-SVM to construct a weak classifier that performs binary classification on the class-imbalanced problem. The specific steps are: initialize the sample weights; on the training sample set, choose the Gaussian kernel $k(x_i,x_j)=\exp(-\|x_i-x_j\|^2/d)$ as the model kernel function; train the W-SVM model to obtain the decision function $f_k(x)$; and use the decision function to predict the label of a test sample $x$:
Y=sign (fk(x)) (1)
In the learning method according to an embodiment of the invention, step (2) uses the AdaBoost algorithm to dynamically adjust the training sample weights $w_{k,i}$ and determines the weight $\alpha_k$ of each weak classifier from its classification accuracy. The specific steps are as follows. Compute the weighted error rate $e_k$ of the $k$-th weak classifier $f_k(x)$ on the training set.
That is, the error rate $e_k$ of $f_k(x)$ on the training set is the sum of the weights of the samples that $f_k(x)$ misclassifies.
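Written out, the definition just stated reads as follows. The symbols $N$ (number of training samples), $y_i \in \{-1,+1\}$ (true label of $x_i$) and the indicator $I(\cdot)$ are introduced here for illustration and are not taken from the patent:

```latex
e_k = \sum_{i=1}^{N} w_{k,i}\, I\big(\mathrm{sign}(f_k(x_i)) \neq y_i\big)
```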
Compute the weight $\alpha_k$ of weak classifier $f_k(x)$ in the ensemble classifier.
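The patent's own expression for $\alpha_k$ is not legible in this version. Standard discrete AdaBoost, whose framework the method adopts, uses the weight below; it is shown here as an assumption, not as the patent's confirmed formula. It is positive whenever $e_k < 1/2$ and grows as the error rate falls, so more accurate weak classifiers get a larger say:

```latex
\alpha_k = \frac{1}{2}\ln\frac{1 - e_k}{e_k}
```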
Update the training sample weights to obtain the sample weight coefficients $w_{k+1,i}$ used to learn the $(k+1)$-th weak classifier; the updated weights are divided by a normalization factor so that they sum to one.
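One reweighting round of step (2) can be sketched as below, assuming the standard discrete-AdaBoost forms $\alpha_k = \tfrac12\ln\frac{1-e_k}{e_k}$ and $w_{k+1,i} = w_{k,i}\,e^{-\alpha_k y_i h_k(x_i)}/Z_k$, where $h_k(x_i)=\mathrm{sign}(f_k(x_i))$ and $Z_k$ is the normalization factor. The function name `adaboost_round` and the clamping of $e_k$ are illustrative additions, not from the patent.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One discrete-AdaBoost reweighting step (assumed standard form).

    Labels and predictions are in {-1, +1}.
    Returns (e_k, alpha_k, normalized weights for round k + 1).
    """
    # e_k: total weight of the samples the weak classifier misclassifies
    e = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    e = min(max(e, 1e-12), 1 - 1e-12)  # guard against log(0) and division by zero
    alpha = 0.5 * math.log((1 - e) / e)
    # t * p = -1 on errors, so misclassified samples are scaled up by e^alpha
    raw = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(raw)  # normalization factor Z_k
    return e, alpha, [r / z for r in raw]
```

With four equally weighted samples and one error, $e_k = 0.25$ and $\alpha_k = \tfrac12\ln 3 \approx 0.549$; after the update the misclassified sample carries weight 0.5 and each correct one 1/6, so the next W-SVM concentrates on the sample the previous one got wrong.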
In the learning method according to an embodiment of the invention, step (3) integrates the weak classifiers into a strong classifier as follows. Following its combination strategy, AdaBoost uses weighted averaging: the weak classifiers are combined according to their weights $\alpha_k$.
Applying the sign function to this weighted combination yields the strong classifier,
where $T$ is the number of iterations, determined by cross-validation.
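The combination formula itself is not legible in this version. A minimal sketch of the weighted vote in step (3), assuming, as in discrete AdaBoost, that each weak classifier contributes its $\pm 1$ label $\mathrm{sign}(f_k(x))$ and the strong classifier is $\mathrm{sign}\big(\sum_{k=1}^{T}\alpha_k\,\mathrm{sign}(f_k(x))\big)$. The function name and the tie-break are illustrative choices.

```python
def ensemble_predict(alphas, weak_preds):
    """Strong classifier sign(sum_k alpha_k * h_k(x)) for each sample.

    weak_preds[k][i] is the {-1, +1} label weak classifier k gives sample i.
    A zero score is mapped to +1 (an arbitrary tie-break).
    """
    n_samples = len(weak_preds[0])
    scores = [sum(a * preds[i] for a, preds in zip(alphas, weak_preds))
              for i in range(n_samples)]
    return [1 if s >= 0 else -1 for s in scores]
```

For example, with $\alpha = (0.8, 0.3, 0.4)$, a sample on which only the first classifier votes +1 is still labeled +1, because $0.8$ outweighs $0.3 + 0.4 = 0.7$.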
The invention relates to an AdaBoost-based support vector machine ensemble learning method. Built on the AdaBoost framework, the method adjusts the sample weights dynamically and thereby exploits the sample distribution information more fully. W-SVM models are introduced as weak classifiers to classify class-imbalanced problems. Finally, the combination strategy integrates multiple W-SVM weak classifiers into a strong classifier, improving the classification accuracy of the prediction model.
Detailed description of the invention
Fig. 1 is a schematic diagram of the AdaBoost-based support vector machine ensemble learning method in an embodiment of the invention.
Fig. 2 shows the time series of blast furnace temperature [Si] and blast volume in an embodiment of the invention.
Fig. 3 is a schematic diagram of the sample distributions of normal and abnormal furnace conditions in an embodiment of the invention.
Detailed description of the embodiments
The specific steps of the invention are explained below with reference to the drawings.
Embodiment 1: take as an example the prediction of the blast furnace temperature ([Si]) state of Laigang No. 1 blast furnace (BF (a)) and Baogang No. 7 blast furnace (BF (b)). Fig. 2 gives the time series of blast furnace temperature [Si] and blast volume. As Fig. 2 shows, [Si] and blast volume differ markedly in scale. A large-scale variable masks the influence of a small-scale variable on the model and thus seriously degrades prediction accuracy. Therefore, the sampled data are first normalized so that all input variables are on the same scale. The training and test sample sets are then determined, and on the training set the furnace temperature [Si] is clustered with the K-means algorithm into three states: low temperature, high temperature, and normal. The low- and high-temperature states are merged into an abnormal state, so [Si] falls into two classes, normal and abnormal. Fig. 3 shows the normal and abnormal sample distributions of BF (a) and BF (b) produced by K-means clustering. The ratio of normal to abnormal samples is about 4:1, a typical class-imbalanced classification problem.
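The preprocessing and labeling just described can be sketched as follows. The patent's normalization formula is not legible in this version, so min-max scaling to [0, 1] is an assumption, as is the centroid initialization of the one-dimensional K-means. The mapping of the middle [Si] cluster to "normal" (+1) and the low and high clusters to "abnormal" (-1) follows the text; the function names are hypothetical.

```python
def min_max_normalize(values):
    """Rescale one input variable to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0          # constant column: avoid division by zero
    return [(v - lo) / span for v in values]


def kmeans_1d(values, k=3, iters=100):
    """Lloyd's K-means on scalars (k >= 2); centroids start at spread-out order statistics."""
    s = sorted(values)
    centers = [s[(j * (len(s) - 1)) // (k - 1)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        new = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:           # centroids unchanged: converged
            break
        centers = new
    return centers


def label_furnace_state(si_values):
    """+1 = normal (middle [Si] cluster), -1 = abnormal (low or high cluster)."""
    centers = sorted(kmeans_1d(si_values, k=3))
    labels = []
    for v in si_values:
        nearest = min(range(3), key=lambda c: abs(v - centers[c]))
        labels.append(1 if nearest == 1 else -1)
    return labels
```

Merging the two outer clusters is what produces the imbalance: the middle cluster typically holds most samples, matching the roughly 4:1 ratio reported above.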
The AdaBoost-based support vector machine ensemble learning method provided by the invention comprises the following steps:
Step 1: initialize the sample weights and use W-SVM to construct a weak classifier that performs the classification task on the class-imbalanced problem above. On the training sample set, choose the Gaussian kernel $k(x_i,x_j)=\exp(-\|x_i-x_j\|^2/d)$ as the model kernel function, train the W-SVM model to obtain the decision function $f_k(x)$, and use the decision function to predict the label of a test sample $x$:
Y=sign (fk(x)) (1)
Step 2: use the AdaBoost algorithm to dynamically adjust the training sample weights $w_{k,i}$, and determine the weight $\alpha_k$ of each weak classifier from the accuracy of weak classifier $f_k(x)$. Compute the weighted error rate $e_k$ of the $k$-th weak classifier $f_k(x)$ on the training set.
That is, the error rate $e_k$ of $f_k(x)$ on the training set is the sum of the weights of the samples that $f_k(x)$ misclassifies.
Compute the weight $\alpha_k$ of weak classifier $f_k(x)$ in the ensemble classifier.
Update the training sample weights to obtain the sample weight coefficients $w_{k+1,i}$ used to learn the $(k+1)$-th weak classifier; the updated weights are divided by a normalization factor so that they sum to one.
Step 3: integrate the weak classifiers into a strong classifier. Following its combination strategy, AdaBoost uses weighted averaging: the weak classifiers are combined according to their weights $\alpha_k$.
Applying the sign function to this weighted combination yields the strong classifier,
where $T$ is the number of iterations. Five-fold cross-validation gives $T = 14$ for BF (a) and $T = 7$ for BF (b).
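The text states only that $T$ is chosen by five-fold cross-validation. A minimal sketch of the selection step, assuming that for each fold a boosting run of `max_T` rounds has already been trained on the other four folds, with its $\alpha_k$ values and the $\pm 1$ weak-classifier predictions on the held-out fold recorded. Because AdaBoost builds the ensemble sequentially, a $T$-round ensemble is simply the first $T$ rounds of that run, so one run per fold suffices. The function names and data layout are hypothetical.

```python
def cv_error_by_T(folds, max_T):
    """Mean validation error of the first T boosting rounds, for T = 1..max_T.

    Each fold is (alphas, weak_preds, y_val): the alpha_k of a boosting run
    trained on the other folds, weak_preds[k][i] in {-1, +1} for held-out
    sample i, and the held-out labels y_val.
    """
    errors = []
    for T in range(1, max_T + 1):
        fold_errors = []
        for alphas, weak_preds, y_val in folds:
            wrong = 0
            for i, label in enumerate(y_val):
                score = sum(alphas[k] * weak_preds[k][i] for k in range(T))
                wrong += (1 if score >= 0 else -1) != label
            fold_errors.append(wrong / len(y_val))
        errors.append(sum(fold_errors) / len(fold_errors))
    return errors


def select_T(folds, max_T):
    """Smallest T with the lowest cross-validated error."""
    errors = cv_error_by_T(folds, max_T)
    best = min(range(max_T), key=lambda t: errors[t])  # min keeps the first tie
    return best + 1, errors
```

Preferring the smallest $T$ among ties keeps the ensemble as compact as the validation data allow.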
Table 1 compares the classification performance on the test set of the ensemble classifier (AdaBoostSVM, Eq. (6)) with that of SVM and W-SVM.
Table 1. Classification results for blast furnace temperature [Si] states
The above embodiment serves to explain the invention rather than to limit it. Any modification or change made to the invention within its spirit and within the scope of protection of the claims falls within the scope of protection of the invention.

Claims (4)

1. An AdaBoost-based support vector machine ensemble learning method, characterized in that it comprises the following steps:
(1) initialize the sample weights and use a weighted SVM (W-SVM) to construct a weak classifier that classifies the class-imbalanced problem;
(2) use the AdaBoost algorithm to dynamically adjust the weights of the training samples, and determine the weight $\alpha_k$ of each weak classifier from the accuracy of weak classifier $f_k(x)$;
(3) determine the number of weak classifiers $T$ by cross-validation, and integrate the $T$ weak classifiers into a strong classifier.
2. The AdaBoost-based support vector machine ensemble learning method according to claim 1, characterized in that, in step (1), the sample weights are initialized and W-SVM is used to construct a weak classifier that performs binary classification on the class-imbalanced problem. The specific steps are: initialize the sample weights; on the training sample set, choose the Gaussian kernel $k(x_i,x_j)=\exp(-\|x_i-x_j\|^2/d)$ as the model kernel function; train the W-SVM model to obtain the decision function $f_k(x)$; and use the decision function to predict the label of a test sample $x$:
Y=sign (fk(x)) (1)
3. The AdaBoost-based support vector machine ensemble learning method according to claim 1, characterized in that, in step (2), the AdaBoost algorithm is used to dynamically adjust the training sample weights $w_{k,i}$, and the weight $\alpha_k$ of each weak classifier is determined from its classification accuracy. The specific steps are: compute the weighted error rate $e_k$ of the $k$-th weak classifier $f_k(x)$ on the training set;
that is, the error rate $e_k$ of $f_k(x)$ on the training set is the sum of the weights of the samples that $f_k(x)$ misclassifies.
Compute the weight $\alpha_k$ of weak classifier $f_k(x)$ in the ensemble classifier.
Update the training sample weights to obtain the sample weights $w_{k+1,i}$ used to learn the $(k+1)$-th weak classifier; the updated weights are divided by a normalization factor so that they sum to one.
4. The AdaBoost-based support vector machine ensemble learning method according to claim 1, characterized in that, in step (3), the weak classifiers are integrated into a strong classifier as follows: following its combination strategy, AdaBoost uses weighted averaging, combining the weak classifiers according to their weights $\alpha_k$;
applying the sign function to this weighted combination yields the strong classifier,
where $T$ is the number of iterations, determined by cross-validation.
CN201811264179.6A 2018-10-29 2018-10-29 An AdaBoost-based support vector machine ensemble learning method Pending CN109472302A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811264179.6A CN109472302A (en) 2018-10-29 2018-10-29 An AdaBoost-based support vector machine ensemble learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811264179.6A CN109472302A (en) 2018-10-29 2018-10-29 An AdaBoost-based support vector machine ensemble learning method

Publications (1)

Publication Number Publication Date
CN109472302A (en) 2019-03-15

Family

ID=65666566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811264179.6A Pending CN109472302A (en) 2018-10-29 2018-10-29 An AdaBoost-based support vector machine ensemble learning method

Country Status (1)

Country Link
CN (1) CN109472302A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717529A (en) * 2019-09-25 2020-01-21 南京旷云科技有限公司 Data sampling method and device
CN110929301A (en) * 2019-11-20 2020-03-27 海宁利伊电子科技有限公司 Hardware Trojan horse detection method based on lifting algorithm
CN110991500A (en) * 2019-11-19 2020-04-10 天津师范大学 Small sample multi-classification method based on nested integrated depth support vector machine
CN111723949A (en) * 2020-06-24 2020-09-29 中国石油大学(华东) Porosity prediction method based on selective ensemble learning
CN112790775A (en) * 2021-01-22 2021-05-14 中国地质大学(武汉) High-frequency oscillation rhythm detection method and device based on integrated classification
CN113723622A (en) * 2021-08-10 2021-11-30 中国科学院计算机网络信息中心 Tobacco leaf sensory quality prediction method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120148160A1 (en) * 2010-07-08 2012-06-14 Honeywell International Inc. Landmark localization for facial imagery
CN102799899A (en) * 2012-06-29 2012-11-28 北京理工大学 Special audio event layered and generalized identification method based on SVM (Support Vector Machine) and GMM (Gaussian Mixture Model)
CN106650773A (en) * 2016-10-11 2017-05-10 酒泉职业技术学院 SVM-AdaBoost algorithm-based pedestrian detection method
CN107256392A (en) * 2017-06-05 2017-10-17 南京邮电大学 A kind of comprehensive Emotion identification method of joint image, voice
CN107292233A (en) * 2017-05-16 2017-10-24 开易(北京)科技有限公司 Tracking, the system of pedestrian detection and feature based in auxiliary driving based on part
CN107333294A (en) * 2017-07-31 2017-11-07 南昌航空大学 A kind of combination AdaBoost and SVMs link quality prediction method
CN107992895A (en) * 2017-10-19 2018-05-04 电子科技大学 A kind of Boosting support vector machines learning method
CN108040337A (en) * 2018-01-02 2018-05-15 重庆邮电大学 Based on improvement AdaBoost wireless sense network intrusion detection methods

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Xiaowo (汪小我): "Computational Analysis of microRNA-Related Problems", China Doctoral Dissertations Full-text Database, Basic Sciences *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717529A (en) * 2019-09-25 2020-01-21 南京旷云科技有限公司 Data sampling method and device
CN110717529B (en) * 2019-09-25 2022-09-30 南京旷云科技有限公司 Data sampling method and device
CN110991500A (en) * 2019-11-19 2020-04-10 天津师范大学 Small sample multi-classification method based on nested integrated depth support vector machine
CN110929301A (en) * 2019-11-20 2020-03-27 海宁利伊电子科技有限公司 Hardware Trojan horse detection method based on lifting algorithm
CN110929301B (en) * 2019-11-20 2022-07-26 海宁利伊电子科技有限公司 Hardware Trojan horse detection method based on lifting algorithm
CN111723949A (en) * 2020-06-24 2020-09-29 中国石油大学(华东) Porosity prediction method based on selective ensemble learning
CN112790775A (en) * 2021-01-22 2021-05-14 中国地质大学(武汉) High-frequency oscillation rhythm detection method and device based on integrated classification
CN113723622A (en) * 2021-08-10 2021-11-30 中国科学院计算机网络信息中心 Tobacco leaf sensory quality prediction method

Similar Documents

Publication Publication Date Title
CN109472302A (en) An AdaBoost-based support vector machine ensemble learning method
CN107688825B (en) Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method
Wang et al. A cluster-based competitive particle swarm optimizer with a sparse truncation operator for multi-objective optimization
Cheng et al. An innovative hybrid multi-objective particle swarm optimization with or without constraints handling
Zeng et al. Accurately clustering single-cell RNA-seq data by capturing structural relations between cells through graph convolutional network
CN110210560A (en) Increment training method, classification method and the device of sorter network, equipment and medium
CN107943856A (en) A kind of file classification method and system based on expansion marker samples
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
De Amorim Constrained clustering with minkowski weighted k-means
Cai et al. Imbalanced evolving self-organizing learning
Xiang et al. A many-objective particle swarm optimizer with leaders selected from historical solutions by using scalar projections
CN104091038A (en) Method for weighting multiple example studying features based on master space classifying criterion
CN110516339A (en) Sealing structure reliability estimation method under multi-invalidation mode based on Adaboost algorithm
CN109993229A (en) A kind of serious unbalanced data classification method
CN112149760A (en) Heterogeneous inner hyperplane-based fuzzy support vector machine design method
CN110490306A (en) A kind of neural metwork training and object identifying method, device and electronic equipment
CN111191685A (en) Method for dynamically weighting loss function
CN105512675A (en) Memory multi-point crossover gravitational search-based feature selection method
Kumar et al. Particle swarm optimization: a study of variants and their applications
CN107273922A (en) A kind of screening sample and weighing computation method learnt towards multi-source instance migration
CN106971201A (en) Multi-tag sorting technique based on integrated study
CN103902706A (en) Method for classifying and predicting big data on basis of SVM (support vector machine)
Zhang et al. Learning biased SVM with weighted within-class scatter for imbalanced classification
CN109726738A (en) Data classification method based on transfer learning Yu attribute entropy weighted fuzzy clustering
Liu et al. A improved NSGA-II algorithm based on sub-regional search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Jian Ling

Inventor after: Gu Xiao

Inventor after: Chen Hongyi

Inventor after: Liang Xijun

Inventor before: Chen Hongyi

Inventor before: Lei Hejie

Inventor before: Liang Xijun

Inventor before: Jian Ling

RJ01 Rejection of invention patent application after publication

Application publication date: 20190315
