CN102722726A - Multi-class support vector machine classification method based on dynamic binary tree - Google Patents

Multi-class support vector machine classification method based on dynamic binary tree

Info

Publication number
CN102722726A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101815509A
Other languages
Chinese (zh)
Other versions
CN102722726B (en)
Inventor
韦磊
朱红
程春玲
王亚石
隋宗见
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Nanjing Power Supply Co of Jiangsu Electric Power Co
Original Assignee
Nanjing Post and Telecommunication University
Nanjing Power Supply Co of Jiangsu Electric Power Co
Application filed by Nanjing Post and Telecommunication University and Nanjing Power Supply Co of Jiangsu Electric Power Co
Priority to CN201210181550.9A
Publication of CN102722726A
Application granted
Publication of CN102722726B
Legal status: Active

Abstract

The invention discloses a multi-class support vector machine (SVM) classification method based on a dynamic binary tree, belonging to the technical field of data mining. The method uses a plurality of binary SVMs to form a multi-class SVM classifier with a binary tree structure. During classification, the binary tree structure is dynamically adjusted according to the classification results of the binary SVMs: binary SVMs with a higher classification success rate are moved toward the root of the tree. This raises the classification success rate at the early stage and reduces the number of binary SVMs that a single sample passes through, so classification speed is effectively increased while classification accuracy is maintained. The invention further discloses a network alarm prediction method, a peer-to-peer (P2P) traffic classification method, an image semantic classification method, a network attack detection method and a webpage classification method that adopt this multi-class classification method.

Description

An SVM multi-class classification method based on a dynamic binary tree
Technical field
The present invention relates to an SVM (Support Vector Machine) multi-class classification method, in particular to an SVM multi-class classification method based on a dynamic binary tree, and belongs to the technical field of data mining.
Background art
The SVM was originally designed for binary (two-class) classification problems, whereas multi-class problems are more common in practice. How to extend the good performance of the SVM to multi-class classification has become a hot topic in current SVM research.
Existing SVM multi-class methods fall into two categories. The first solves one large quadratic programming problem over all training samples, separating all classes simultaneously. This approach is simple in theory, but the computational complexity of solving such a large multi-class quadratic program rises sharply, so training takes a long time. The second constructs and combines multiple binary classification problems to perform multi-class classification. It includes the one-versus-rest method, the one-versus-one method, the decision binary tree method and the decision directed acyclic graph method.
The one-versus-rest (OVR) method is one of the most widely used at present. It constructs K binary classifiers (assuming K classes in total), where the i-th SVM is trained with the samples of class i as positive samples and all remaining samples as negative samples.
During classification, the input sample passes through all K SVMs, yielding K output values. If a +1 appears, its corresponding class is taken as the class of the input sample; if no +1 is output, the input vector is deemed to belong to none of the K classes, and the classification fails.
The advantage of the OVR method is that only K binary classifiers need to be trained, so the number of resulting classification functions (K) is small, and training is fast when the sample size is small.
The OVR method has two shortcomings: 1. Each binary SVM is trained on all samples, which requires solving K quadratic programming problems that each contain all variables. Because SVM training slows sharply as the number of training samples grows, the OVR method takes a long time to train. 2. Classification also slows sharply as the sample size grows: by the decision rule, every sample must pass through every SVM, which greatly reduces classification speed.
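The OVR decision rule described above can be sketched in a few lines. This is only an illustration: `ovr_predict` is a hypothetical helper name, and the interval tests stand in for trained binary SVMs, which the source does not specify in code.

```python
from typing import Callable, Optional, Sequence

# A binary classifier maps a feature vector to +1 (member of its class) or -1.
BinaryClassifier = Callable[[Sequence[float]], int]


def ovr_predict(classifiers: Sequence[BinaryClassifier],
                sample: Sequence[float]) -> Optional[int]:
    """Return the index of the first classifier that outputs +1, else None."""
    # OVR evaluates all K classifiers for every sample, which is the cost
    # the text identifies as the second shortcoming.
    outputs = [classify(sample) for classify in classifiers]
    for index, output in enumerate(outputs):
        if output == 1:
            return index
    return None  # no +1 output: the sample belongs to none of the K classes


# Toy stand-ins for trained SVMs (assumption): interval tests on one feature.
toy_classifiers = [
    lambda x: 1 if x[0] < 1.0 else -1,
    lambda x: 1 if 1.0 <= x[0] < 2.0 else -1,
    lambda x: 1 if 2.0 <= x[0] < 3.0 else -1,
]

print(ovr_predict(toy_classifiers, [1.5]))  # class index 1
print(ovr_predict(toy_classifiers, [5.0]))  # None: classification fails
```

Note that the list comprehension always performs K evaluations, regardless of which classifier outputs +1.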
Summary of the invention
The technical problem to be solved by the invention is to overcome the drawback of conventional SVM-based classification methods, whose classification speed drops sharply as the sample size grows, and to provide an SVM multi-class classification method based on a dynamic binary tree that effectively increases multi-class classification speed without affecting classification accuracy, thereby widening the range of application of SVM-based multi-class methods.
The invention adopts the following technical solution to solve the above problem.
An SVM multi-class classification method based on a dynamic binary tree first uses a plurality of trained binary SVMs to construct an SVM multi-class classifier with a binary tree structure, and then uses the constructed classifier to classify a test sample set. Classifying the test sample set with the constructed classifier comprises the following steps:
Step 1: Input the first test sample of the test sample set to the root node of the SVM multi-class classifier, and initialize the adjustment factor of every binary SVM in the classifier to 0. The adjustment factor of a binary SVM is defined as the ratio of its number of successful classifications to its total number of classifications, where the number of successful classifications is the number of test samples that passed through this binary SVM with output +1, and the total number of classifications is the total number of test samples that passed through it;
Step 2: If the current node is an empty node, the classification of this sample ends; go to step 4. Otherwise, go to step 3;
Step 3: Classify the sample with the current binary SVM. If the output is -1, update the adjustment factor of the current binary SVM according to the output, pass the test sample to the binary SVM of the current node's child node, and go to step 2. If the output is +1, update the adjustment factor of the current binary SVM according to the output; the classification of this sample ends; go to step 4;
Step 4: Determine whether the ratio between the maximum and the minimum adjustment factor of the binary SVMs in the classifier exceeds a preset adjustment threshold. If so, rebuild the binary tree structure as follows: move SVMs with larger adjustment factors toward the root, i.e. the SVM with the largest adjustment factor becomes the root node, the SVM with the second largest becomes the child node of the root node, and so on, to build a new binary tree structure. If not, keep the binary tree structure unchanged;
Step 5: Input the next test sample of the test sample set to the root node of the SVM multi-class classifier and repeat steps 2 to 4 until all test samples in the set have been classified.
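Steps 1 to 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the tree is modeled as an ordered list in which each entry's child is the next entry, the names `BinarySvmNode`, `classify_with_dynamic_chain` and `make_node` are hypothetical, and the toy classifiers stand in for trained binary SVMs. The source does not say how to evaluate the ratio when the minimum adjustment factor is 0; the sketch treats that case as exceeding the threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class BinarySvmNode:
    label: str
    classify: Callable[[Sequence[float]], int]  # stand-in for a trained SVM
    successes: int = 0  # samples that passed through with output +1
    total: int = 0      # all samples that passed through

    @property
    def adjustment_factor(self) -> float:
        return self.successes / self.total if self.total else 0.0


def classify_with_dynamic_chain(nodes: List[BinarySvmNode],
                                samples: Sequence[Sequence[float]],
                                threshold: float) -> List[Optional[str]]:
    """Classify samples in order, reordering `nodes` in place (steps 1 to 5)."""
    results: List[Optional[str]] = []
    for sample in samples:
        label = None
        for node in nodes:  # nodes[i + 1] plays the role of nodes[i]'s child
            output = node.classify(sample)
            node.total += 1
            if output == 1:
                node.successes += 1
                label = node.label
                break  # step 3: +1 ends the classification of this sample
        results.append(label)  # None means the empty node was reached
        # Step 4: compare the max/min ratio of the factors with the threshold.
        factors = [node.adjustment_factor for node in nodes]
        low, high = min(factors), max(factors)
        # Assumption: a zero minimum with a positive maximum counts as an
        # unbounded ratio; the source does not define this case.
        exceeded = high > 0 and (low == 0 or high / low > threshold)
        if exceeded:
            nodes.sort(key=lambda n: n.adjustment_factor, reverse=True)
    return results


def make_node(label: str, value: float) -> BinarySvmNode:
    # Toy classifier (assumption): +1 only when the first feature equals value.
    return BinarySvmNode(label, lambda x: 1 if x[0] == value else -1)


chain = [make_node("A", 0.0), make_node("B", 1.0), make_node("C", 2.0)]
labels = classify_with_dynamic_chain(chain, [[2.0], [2.0], [2.0], [0.0]], 2.0)
print(labels, [node.label for node in chain])
```

In this toy run most samples belong to class C, so the C node moves to the front of the list after the first sample and later C samples need only one evaluation.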
The multi-class classification method of the invention can be widely applied to data mining in many fields, for example:
A network alarm prediction method classifies the time series of alarms, the classification result being the prediction result. Classifying the time series of alarms comprises the following steps:
Step A: Extract vectors from the historical data of one class of network alarm and preprocess them to obtain the training samples for that class of network alarm;
Step B: Train a binary SVM with the obtained training samples to obtain the binary SVM for that class of network alarm;
Step C: Repeat steps A and B for each of several classes of historical network alarm data to obtain a plurality of trained binary SVMs;
Step D: Using the trained binary SVMs, classify the time series of alarms with the above SVM multi-class classification method based on a dynamic binary tree; the classification result is the prediction result.
A P2P traffic classification method identifies the type to which P2P traffic belongs and comprises the following steps:
Step A: Extract features from the data of one class of P2P traffic to obtain the training samples for that class of P2P traffic;
Step B: Train a binary SVM with the obtained training samples to obtain the binary SVM for that class of P2P traffic;
Step C: Repeat steps A and B for each of several classes of P2P traffic data to obtain a plurality of trained binary SVMs;
Step D: Using the trained binary SVMs, classify the P2P traffic data with the above SVM multi-class classification method based on a dynamic binary tree.
An image semantic classification method comprises the following steps:
Step A: Extract semantic features from one class of images to obtain the training samples for that class of images;
Step B: Train a binary SVM with the obtained training samples to obtain the binary SVM for that class of images;
Step C: Repeat steps A and B for each of several classes of images to obtain a plurality of trained binary SVMs;
Step D: Using the trained binary SVMs, classify images semantically with the above SVM multi-class classification method based on a dynamic binary tree.
A network attack detection method determines whether a network attack has occurred by classifying network packets. Classifying the network packets comprises the following steps:
Step A: Extract features from the data of one class of network attack to obtain the training samples for that class of network attack;
Step B: Train a binary SVM with the obtained training samples to obtain the binary SVM for that class of network attack;
Step C: Repeat steps A and B for each of several classes of known network attack data to obtain a plurality of trained binary SVMs;
Step D: Using the trained binary SVMs, classify the network packets with the above SVM multi-class classification method based on a dynamic binary tree.
A webpage classification method comprises the following steps:
Step A: Extract features from the data of one class of webpages to obtain the training samples for that class of webpages;
Step B: Train a binary SVM with the obtained training samples to obtain the binary SVM for that class of webpages;
Step C: Repeat steps A and B for each of several classes of webpage data to obtain a plurality of trained binary SVMs;
Step D: Using the trained binary SVMs, classify webpages with the above SVM multi-class classification method based on a dynamic binary tree.
The method of the invention uses a plurality of binary SVMs to form an SVM multi-class classifier with a binary tree structure and dynamically adjusts the binary tree structure during classification according to the classification results of the binary SVMs, moving binary SVMs with a higher classification success rate toward the root of the tree. This raises the probability of successful classification at an early stage and reduces the number of binary SVMs that a single sample passes through, effectively increasing classification speed while maintaining classification accuracy.
Brief description of the drawings
Fig. 1 shows the structure of the SVM multi-class classifier with a binary tree structure;
Fig. 2 shows the structure of the SVM defined in the invention;
Fig. 3 is a flowchart of the SVM multi-class classification method based on a dynamic binary tree according to the invention.
Detailed description of the embodiments
The technical solution of the invention is described in detail below with reference to the accompanying drawings:
The purpose of the invention is to solve the low classification speed of existing SVM-based multi-class classification methods. In a concrete application each sample can belong to only one class, so when a sample passes through the binary tree queue formed by K SVMs and an output is +1, the sample need not pass through the remaining SVMs, which saves a great deal of time. Following this idea, SVMs that often output +1 can be moved to the front of the queue, which saves further time and increases classification speed.
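The saving can be made concrete with a small calculation. The sketch below is an illustration under simplifying assumptions that are not in the source: every sample belongs to exactly one of the K classes, every binary SVM classifies perfectly, and `expected_evaluations` is a hypothetical helper name. A sample whose class sits at position i of the queue is then evaluated by i + 1 SVMs.

```python
from typing import Sequence


def expected_evaluations(class_probabilities: Sequence[float]) -> float:
    """Expected number of binary SVMs a sample passes through in a chain.

    class_probabilities[i] is the probability that a sample belongs to the
    class handled by the SVM at position i (position 0 is the root).
    """
    # A sample of the class at position i is evaluated by i + 1 SVMs.
    return sum((position + 1) * probability
               for position, probability in enumerate(class_probabilities))


arbitrary_order = [0.1, 0.2, 0.7]
frequent_first = sorted(arbitrary_order, reverse=True)
print(expected_evaluations(arbitrary_order))  # about 2.6
print(expected_evaluations(frequent_first))   # about 1.4
```

With the most frequent class at the root, the expected number of evaluations per sample in this toy case drops from about 2.6 to about 1.4, compared with a fixed 3 for OVR.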
Based on this idea, at the classification stage the invention first combines a plurality of trained binary SVMs into a binary tree, whose structure is shown in Fig. 1, and adds an adjustment factor attribute to each SVM. The adjustment factor is defined as the ratio of the binary SVM's number of successful classifications to its total number of classifications, where the number of successful classifications is the number of test samples that passed through this binary SVM with output +1, and the total number of classifications is the total number of test samples that passed through it. During classification, the adjustment factor of each binary SVM is continually updated according to its classification results. When the ratio of the maximum to the minimum adjustment factor of the binary SVMs exceeds the preset threshold, the binary tree is restructured: the binary SVM with the largest adjustment factor is placed at the root node, the one with the second largest becomes its child node, and so on, forming a new binary tree. Classification of the subsequent samples continues after the adjustment.
In the SVM multi-class classification method based on a dynamic binary tree of the invention, an SVM multi-class classifier with a binary tree structure is first constructed from a plurality of trained binary SVMs. Each binary SVM adds three attributes to the structure of a conventional SVM: the number of successful classifications, the total number of classifications, and the adjustment factor. The number of successful classifications is the number of vectors that passed through this SVM with output +1, and the total number of classifications is the total number of vectors that passed through this SVM. The attribute structure of the binary SVM is shown in Fig. 2.
The constructed SVM multi-class classifier is then used to classify the test sample set, specifically through the following steps:
Step 1: Input the first test sample of the test sample set to the root node of the SVM multi-class classifier, and initialize the adjustment factor of every binary SVM in the classifier to 0;
Step 2: If the current node is an empty node, the classification of this sample ends; go to step 4. Otherwise, go to step 3;
Step 3: Classify the sample with the current binary SVM. If the output is -1, update the adjustment factor of the current binary SVM according to the output, pass the test sample to the binary SVM of the current node's child node, and go to step 2. If the output is +1, update the adjustment factor of the current binary SVM according to the output; the classification of this sample ends; go to step 4;
Step 4: Determine whether the ratio between the maximum and the minimum adjustment factor of the binary SVMs in the classifier exceeds a preset adjustment threshold. If so, rebuild the binary tree structure as follows: move SVMs with larger adjustment factors toward the root, i.e. the SVM with the largest adjustment factor becomes the root node, the SVM with the second largest becomes the child node of the root node, and so on, to build a new binary tree structure. If not, keep the binary tree structure unchanged;
Step 5: Input the next test sample of the test sample set to the root node of the SVM multi-class classifier and repeat steps 2 to 4 until all test samples in the set have been classified.
The flow of the SVM multi-class classification method based on a dynamic binary tree of the invention is shown in Fig. 3.
To help the public further understand the technical solution of the invention, application examples from several different fields are described below.
Application example 1, network alarm prediction:
SVM is applied to network alarm prediction by using its classification function to classify the time series of alarms; the classification result is the prediction result. Because a single SVM can only classify, i.e. predict, whether one type of alarm is present, while multiple types of alarm need to be predicted in a network, several SVMs must be trained and organized into a binary tree structure to solve the multi-class problem. Suppose a bidirectional cable television network system mainly manages HFC equipment, optical network units (ONU), and MoCA (Multimedia over Coax Alliance) head ends and MoCA terminals, in which five types of alarm may occur: optical line terminal (OLT) signal loss, excessive equipment packet loss rate, link failure, head end offline, and terminal offline. This example predicts these five types of alarm, so five SVMs need to be built.
The detailed process of network alarm prediction is as follows:
1. SVM learning stage:
1) Extract vectors from the alarm records to obtain the vectors whose target alarm is OLT signal loss, preprocess them to form training samples, and train on these samples to produce an SVM that can recognize OLT signal loss;
2) Repeat the above process to obtain four more SVMs, each of which determines whether a given type of alarm is present;
2. SVM multi-class classification stage based on the dynamic binary tree:
1) Initialize the adjustment factors of all five SVMs to 0;
2) Construct the initial binary tree in the order in which the SVMs were trained: the root node is the OLT signal loss SVM, its child node is the excessive equipment packet loss rate SVM, followed in turn by the link failure SVM, the head end offline SVM and the terminal offline SVM;
3) Pass the next sample of the test samples through the binary tree; the vector first passes through the root node, the OLT signal loss SVM;
4) Determine whether the current node is an empty node. If so, classification of the current vector ends; go to step 7). Otherwise go to step 5);
5) Perform binary classification of the vector with the SVM of the current node and update the adjustment factor of that SVM;
6) Determine whether the output of the current SVM is +1. If it is +1, the current sample has been classified successfully; go to step 7). If it is -1, the sample continues to the child node of the current node; go to step 4);
7) Calculate the ratio of the maximum to the minimum adjustment factor among the binary tree nodes and compare it with the threshold. If the ratio is greater than the threshold, rearrange the positions of the five SVMs in the binary tree: the SVM with the largest adjustment factor becomes the root node, the one with the second largest becomes the child node of the root node, and so on, forming a new binary tree. Otherwise, keep the binary tree structure unchanged;
8) Determine whether all vectors in the sample set have been classified. If not, go to step 3); if all samples have been classified, the whole classification process ends.
 
Application example 2, P2P traffic classification:
A peer-to-peer (P2P) network shares and transfers resources through direct connections between peer nodes. It offers high resource utilization and low server load and eliminates the server bottleneck, so it is widely used in streaming media, instant messaging, file sharing, online games, search engines and collaborative work. At the same time, P2P services consume network resources excessively and can even cause network congestion. To keep the network running in a normal and orderly way, the various types of P2P traffic must be identified effectively and handled with corresponding management policies. The method of the invention can be deployed at an access gateway (IAD) or core router in the network, or on its bypass, where several SVMs organized into a dynamic binary tree classify the network traffic collected from the access gateway or router. Suppose that the five-tuple of each network data flow (source and destination IP addresses, source and destination port numbers, protocol) and its main traffic statistics (the mean square deviation of packet sizes, the ratio of active to passive connections, and the ratio of upstream to downstream traffic) are collected on the core router of a local area network to form the sample data of the flow, and that the method of the invention is used to classify the sample flows collected from this router. To identify non-P2P traffic and six common kinds of P2P traffic (BitTorrent, PPLive, UUsee, Thunder, MSN and Skype), seven SVMs are needed in total.
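The three traffic statistics named above could be assembled into a feature vector roughly as follows. This is a hedged sketch: the function and parameter names are hypothetical, the "mean square deviation" is interpreted here as the population variance of packet sizes, and returning 0.0 for a zero denominator is an assumption, since the source does not describe these details. The five-tuple fields are left out because the source does not say how they are encoded.

```python
from statistics import pvariance
from typing import List, Sequence


def safe_ratio(numerator: float, denominator: float) -> float:
    # Assumption: a zero denominator yields 0.0; the source does not say
    # how such flows are handled.
    return numerator / denominator if denominator else 0.0


def flow_features(packet_sizes: Sequence[int],
                  active_connections: int,
                  passive_connections: int,
                  upstream_bytes: int,
                  downstream_bytes: int) -> List[float]:
    """Build the three statistical features named in the text for one flow."""
    return [
        float(pvariance(packet_sizes)),  # spread of packet sizes
        safe_ratio(active_connections, passive_connections),
        safe_ratio(upstream_bytes, downstream_bytes),
    ]


print(flow_features([100, 300], 4, 2, 500, 1000))  # [10000.0, 2.0, 0.5]
```

In practice such features usually need scaling before SVM training, because the packet-size statistic is orders of magnitude larger than the two ratios.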
The detailed process of P2P traffic classification is as follows:
1. SVM learning stage:
1) Preprocess the P2P data flows, extract the five-tuple and the main attribute features to form training samples, and train an SVM with these samples to produce an SVM that can recognize P2P traffic;
2) Repeat the above process for BitTorrent, PPLive, UUsee, Thunder, MSN and Skype traffic to obtain, for each, an SVM that can determine whether a data flow is of that traffic type.
2. SVM multi-class classification stage based on the dynamic binary tree:
1) Initialize the adjustment factors of all seven SVMs to 0;
2) Construct the initial binary tree in the order in which the SVMs were trained: the root node is the P2P traffic classification SVM, its child node is the BitTorrent traffic classification SVM, followed in turn by the PPLive, UUsee, Thunder, MSN and Skype traffic classification SVMs;
3) Pass the next sample of the test samples through the binary tree; the vector first passes through the P2P traffic classification SVM;
4) Determine whether the current node is an empty node. If so, classification of the current sample ends; go to step 7). Otherwise go to step 5);
5) Perform binary classification of the sample with the SVM of the current node and update the adjustment factor of that SVM;
6) Determine whether the output of the current SVM is +1. If it is +1, the current sample has been classified successfully; go to step 7). If it is -1, the current sample continues to the child node of the current node; go to step 4);
7) Calculate the ratio of the maximum to the minimum adjustment factor among the binary tree nodes and compare it with the threshold. If the ratio is greater than the threshold, rearrange the positions of the seven SVMs in the binary tree: the SVM with the largest adjustment factor becomes the root node, the one with the second largest becomes the child node of the root node, and so on, forming a new binary tree. Otherwise, keep the binary tree structure unchanged;
8) Determine whether all samples have been classified. If not, go to step 3); if all samples have been classified, the whole classification process ends.
 
Application example 3, image semantic classification:
Images are one of the main forms of multimedia, and dividing an image database into meaningful semantic categories is an important part of content-based image retrieval. Because the SVM algorithm needs few training samples and classifies well, it is widely used in image semantic classification. This application example performs semantic classification of the pictures in the Corel image dataset (http://corel.digitalriver.com/), which comprises 10 categories: aborigines, sea, buildings, buses, dinosaurs, elephants, flowers, horses on grassland, snow mountains and food. The method of the invention first trains one SVM on the labeled sample pictures of each category and then organizes the trained SVMs into a dynamic binary tree to classify the unlabeled pictures quickly. The image features used for classification (i.e. the input features of the classifier) comprise 64 color features and 18 texture features.
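The 82-dimensional classifier input (64 color plus 18 texture features) can be illustrated as below. The source does not say which color or texture descriptors are used; the 4 x 4 x 4 RGB histogram is one common way to obtain 64 color features and is an assumption here, as are the names `color_histogram_64` and `image_feature_vector`. The texture part is a zero placeholder.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0..255


def color_histogram_64(pixels: Sequence[Pixel]) -> List[float]:
    """Normalized 4 x 4 x 4 RGB histogram, giving 64 color features."""
    bins = [0] * 64
    for red, green, blue in pixels:
        # Quantize each channel to 4 levels (0..3) and combine into one index.
        index = (red // 64) * 16 + (green // 64) * 4 + (blue // 64)
        bins[index] += 1
    count = len(pixels)
    return [value / count for value in bins] if count else [0.0] * 64


def image_feature_vector(color: Sequence[float],
                         texture: Sequence[float]) -> List[float]:
    """Concatenate 64 color and 18 texture features into one 82-dim input."""
    if len(color) != 64 or len(texture) != 18:
        raise ValueError("expected 64 color and 18 texture features")
    return list(color) + list(texture)


histogram = color_histogram_64([(0, 0, 0), (255, 255, 255)])
features = image_feature_vector(histogram, [0.0] * 18)  # placeholder texture
print(len(features))  # 82
```

Every binary SVM in the tree would receive the same 82-dimensional vector, so the features need to be computed only once per picture.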
The detailed process of image semantic classification is as follows:
1. SVM learning stage:
1) Preprocess the sample picture set of the aborigines category, extract the 82-dimensional image features of each picture to form training samples, and train an SVM with these samples to produce an SVM that can recognize aborigines;
2) Repeat the above process, training one SVM with the sample pictures of each category, to obtain for each of the nine categories sea, buildings, buses, dinosaurs, elephants, flowers, horses on grassland, snow mountains and food an SVM that determines whether an image belongs to that category, nine SVMs in total.
2. SVM multi-class classification stage based on the dynamic binary tree:
1) Initialize the adjustment factors of all ten SVMs to 0;
2) Construct the initial binary tree in the order in which the SVMs were trained: the root node is the aborigines classification SVM, its child node is the sea classification SVM, followed in turn by the classification SVMs for buildings, buses, dinosaurs, elephants, flowers, horses on grassland, snow mountains and food;
3) Pass the next sample of the test sample set, which consists of the image features of the unlabeled pictures, through the binary tree; the vector first passes through the aborigines classification SVM;
4) Determine whether the current node is an empty node. If so, classification of the current sample ends; go to step 7). Otherwise go to step 5);
5) Perform binary classification of the sample with the SVM of the current node and update the adjustment factor of that SVM;
6) Determine whether the output of the current SVM is +1. If it is +1, the current sample has been classified successfully; go to step 7). If it is -1, the current sample continues to the child node of the current node; go to step 4);
7) Calculate the ratio of the maximum to the minimum adjustment factor among the binary tree nodes and compare it with the threshold. If the ratio is greater than the threshold, rearrange the positions of the ten SVMs in the binary tree: the SVM with the largest adjustment factor becomes the root node, the one with the second largest becomes the child node of the root node, and so on, forming a new binary tree. Otherwise, keep the binary tree structure unchanged;
8) Determine whether all samples have been classified. If not, go to step 3); if all samples have been classified, the whole classification process ends.
 
Application example 4, network attack detection:
At a key ingress of a network's data stream, such as inside a switch, ingress router or firewall, all packets in the network segment are monitored, and the method of the invention is used to classify the captured packets to determine whether an abnormal intrusion has occurred. Suppose the collected network dataset (e.g. the KDDCUP99 network intrusion detection dataset collected by MIT Lincoln Laboratory) contains 41 attributes covering four groups of features (basic features, content features, traffic features and host-based traffic features), of which 34 are continuous attributes and 7 are categorical attributes, and that four common kinds of network attack need to be detected in it: denial-of-service attacks (Denial of Service, DoS), probing attacks (Probing), attacks in which a user gains superuser privileges (User to Root, U2R) and attacks by remote network users (Remote to Local, R2L). The method of the invention first trains one SVM on the data of each attack and then organizes the trained SVMs into a dynamic binary tree structure to classify the sample packets quickly.
The detailed process of network invasion monitoring is following:
1, the learning phase of SVM:
1) the network data collection is carried out the data pre-service, 34 connection attributes extracting the DoS attack data centralization form training sample, set up a SVM and also learn with formed training sample, and generation can sort out the SVM of DoS attack;
2) repeat said process, attack SVM of data set training with each respectively, Probing, U2R and R2L type are attacked obtained one respectively whether can judge be the SVM of certain attack.
2. SVM multi-class classification phase based on the dynamic binary tree:
1) Initialize the adjustment factors of all four SVMs to 0;
2) Construct the initial binary tree in the order in which the SVMs were trained: the root node is the DoS SVM, its child node is the Probing SVM, followed in the same way by the U2R SVM and the R2L SVM;
3) Pass the next sample of the test samples through the binary tree; the vector first goes through the DoS SVM;
4) Judge whether the current node is an empty node; if it is, classification of the current sample is finished, go to step 7); otherwise go to step 5);
5) Perform two-class classification of the sample with the SVM of the current node, and update the adjustment factor of that SVM;
6) Judge whether the output of the current SVM is +1; if it is +1, the current sample has been classified successfully, go to step 7); if it is -1, the current sample continues to the child node of the current node, go to step 4);
7) Calculate the ratio of the maximum adjustment factor to the minimum adjustment factor among the binary tree nodes and compare it with the threshold. If the ratio is greater than the threshold, readjust the positions of the SVMs in the binary tree: the SVM with the largest adjustment factor becomes the root node, the SVM with the second largest becomes the child node of the root node, and so on, forming a new binary tree; if the ratio is less than the threshold, keep the structure of the binary tree unchanged;
8) Judge whether all samples have been classified; if not, return to step 3); if all samples have been classified, the whole classification process ends.
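Step 7) can be illustrated with a small worked example for the four attack SVMs. The adjustment factors and the threshold value below are invented for illustration, and the handling of a zero minimum factor is an assumption, since the text does not define the ratio in that case.

```python
def reorder_if_unbalanced(order, factors, threshold):
    """Return the root-to-leaf SVM order after step 7).

    order     -- current root-to-leaf order of class names
    factors   -- adjustment factor of each class's SVM
    threshold -- preset adjustment threshold (illustrative value only)
    """
    hi = max(factors[c] for c in order)
    lo = min(factors[c] for c in order)
    # Assumption: a zero minimum with a positive maximum is treated as
    # exceeding the threshold; the source leaves this case open.
    exceeded = hi > 0 and (lo == 0 or hi / lo > threshold)
    if not exceeded:
        return list(order)
    # Largest factor becomes the root, second largest its child, and so on.
    return sorted(order, key=lambda c: factors[c], reverse=True)


order = ["DoS", "Probing", "U2R", "R2L"]
factors = {"DoS": 0.10, "Probing": 0.60, "U2R": 0.05, "R2L": 0.20}
new_order = reorder_if_unbalanced(order, factors, threshold=5.0)
```

With these hypothetical numbers the ratio is about 12, which exceeds the threshold of 5, so the Probing SVM moves to the root and later samples reach it first.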
 
Application example 5, web page classification:
The rapid spread of the Internet has made the network the main source from which people obtain information. Automatic web page classification is an important technique for handling massive amounts of Web information effectively, and helps users find the information they need quickly and accurately among a huge number of web pages. It means that a computer, following some automatic classification algorithm, assigns the web pages to be classified to predefined categories according to their content. The method of the invention can classify web pages quickly and automatically. In this example, a number of web pages (for example 2000) are downloaded from the Internet and manually divided into 5 categories: finance, sports, military, science and technology, and culture. Part of the pages (for example 1500) are chosen as the training set, and the remaining pages (500) serve as the sample set. Before classifying with the method of the invention, features must first be extracted from the web pages: the hypertext is first filtered to obtain the page body text, the hypertext markup, and the hyperlink information. The body text is then segmented into words and, using the TF-IDF (term frequency-inverse document frequency) feature representation, expressed as a vector composed of terms. If the total number of features over all web page texts is n, they span an n-dimensional vector space in which each web page text is represented as an n-dimensional vector (w_1, w_2, ..., w_n), where the component on each dimension is the weight of the corresponding feature in that page. The method of the invention then learns from the training set of each category to build one SVM per category, and arranges the trained SVMs into a dynamic binary tree structure to classify the web page sample set quickly.
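The TF-IDF representation described above can be sketched in a few lines. The text names TF-IDF without fixing a weighting variant, so the formula used here (relative term frequency times the natural log of N/df) is an assumption, as are the function name and the toy documents.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map tokenised documents to n-dimensional TF-IDF vectors.

    docs -- list of non-empty token lists (word segmentation already done)
    Returns (vocabulary, vectors); vectors[i][j] is the weight w_j of
    vocabulary[j] in document i.
    """
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append(
            [(counts[t] / len(d)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Toy usage with three already-segmented "pages".
pages = [["stock", "market", "stock"], ["match", "goal"], ["market", "goal"]]
vocab, vectors = tfidf_vectors(pages)
```

Each row of `vectors` plays the role of the vector (w_1, w_2, ..., w_n) for one page; terms absent from a page get weight 0.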
The detailed process of web page classification is as follows:
1. Learning phase of the SVMs:
1) Preprocess the web page training set: extract the n-dimensional feature vector of each page among the finance pages to form training samples, build an SVM and train it on these samples, producing an SVM that can identify finance pages;
2) Repeat the above process, training one SVM with the training pages of each category, so that the sports, military, science and technology, and culture categories each obtain an SVM that can judge whether a page belongs to that category.
2. SVM multi-class classification phase based on the dynamic binary tree:
1) Initialize the adjustment factors of all five SVMs to 0;
2) Construct the initial binary tree in the order in which the SVMs were trained: the root node is the finance SVM, its child node is the sports SVM, followed in the same way by the military, science and technology, and culture SVMs;
3) Pass the next sample of the test samples through the binary tree; the vector first goes through the SVM that judges whether a page is a finance page;
4) Judge whether the current node is an empty node; if it is, classification of the current sample is finished, go to step 7); otherwise go to step 5);
5) Perform two-class classification of the sample with the SVM of the current node, and update the adjustment factor of that SVM;
6) Judge whether the output of the current SVM is +1; if it is +1, the current sample has been classified successfully, go to step 7); if it is -1, the current sample continues to the child node of the current node, go to step 4);
7) Calculate the ratio of the maximum adjustment factor to the minimum adjustment factor among the binary tree nodes and compare it with the threshold. If the ratio is greater than the threshold, readjust the positions of the SVMs in the binary tree: the SVM with the largest adjustment factor becomes the root node, the SVM with the second largest becomes the child node of the root node, and so on, forming a new binary tree; if the ratio is less than the threshold, keep the structure of the binary tree unchanged;
8) Judge whether all samples have been classified; if not, return to step 3); if all samples have been classified, the whole classification process ends.
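The learning phase above trains one two-class SVM per category, but the text does not say which training algorithm is used. As a stand-in only, the sketch below trains a linear SVM by subgradient descent on the regularised hinge loss; this is a substituted technique chosen for brevity, and the function names, hyperparameters, and toy data are all assumptions.

```python
def train_linear_svm(data, epochs=50, lr=0.1, lam=0.01):
    """Train a linear two-class SVM (stand-in for the unspecified trainer).

    data -- list of (features, label) pairs with label in {+1, -1}
    Returns (w, b) for the decision function sign(w . x + b).
    """
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Hinge loss is active: step on both data term and regulariser.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regulariser contributes to the subgradient.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(model, x):
    """Return +1 if the sample is accepted by this SVM, otherwise -1."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy usage: one feature, positives on the right, negatives on the left.
toy = [([2.0], 1), ([3.0], 1), ([-2.0], -1), ([-3.0], -1)]
model = train_linear_svm(toy)
```

A `predict` function of this shape, one per category, is what each node of the binary tree would call; a kernel SVM from an established library could replace it without changing the tree logic.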
The above application examples are provided only to help the public understand the technical scheme of the invention and do not limit the invention. Those skilled in the art should understand that various changes can be made, or the method applied to different fields, without departing from the spirit and scope of the invention; all equivalent technical schemes and applications in different fields therefore fall within the scope of protection of the invention.

Claims (6)

1. A multi-class SVM classification method based on a dynamic binary tree, which first uses a plurality of trained two-class SVMs to construct an SVM multi-classifier with a binary tree structure, and then uses the constructed SVM multi-classifier to classify a test sample set; characterized in that classifying the test sample set with the constructed SVM multi-classifier specifically comprises the following steps:
Step 1: input the first test sample of the test sample set to the root node of the SVM multi-classifier, and initialize the adjustment factor of each two-class SVM in the SVM multi-classifier to 0; the adjustment factor is defined as the ratio of the number of successful classifications of the two-class SVM to its total number of classifications, where the number of successful classifications is the number of test samples that passed through the two-class SVM with an output of +1, and the total number of classifications is the total number of test samples that passed through the two-class SVM;
Step 2: if the current node is an empty node, the classification process ends, go to step 4; otherwise go to step 3;
Step 3: classify the sample to be classified with the current two-class SVM; if the output is -1, dynamically adjust the adjustment factor of the current two-class SVM according to the output, input the test sample to the two-class SVM corresponding to the child node of the current two-class SVM, and go to step 2; if the output is +1, dynamically adjust the adjustment factor of the current two-class SVM according to the output, the classification process ends, go to step 4;
Step 4: judge whether the ratio between the maximum and the minimum of the adjustment factors of the two-class SVMs in the SVM multi-classifier is greater than a preset adjustment threshold; if so, readjust the binary tree structure of the SVM multi-classifier as follows: move SVMs with larger adjustment factors toward the root of the binary tree, that is, the SVM with the largest adjustment factor becomes the root node, the SVM with the second largest becomes the child node of the root node, and so on, building a new binary tree structure; if not, keep the structure of the binary tree unchanged;
Step 5: input the next test sample of the test sample set to the root node of the SVM multi-classifier and repeat steps 2 to 4 until all test samples in the test sample set have been classified.
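Steps 1 to 5 can be sketched as a small class. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `Node` and `DynamicChain` are hypothetical, each two-class SVM is represented by any callable returning +1 or -1, and a zero minimum adjustment factor with a positive maximum is treated as exceeding the threshold because the claim does not define the ratio in that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Node:
    label: str
    svm: Callable[[Sequence[float]], int]  # trained two-class SVM: +1 or -1
    hits: int = 0    # test samples this SVM accepted (output +1)
    total: int = 0   # test samples that passed through this SVM

    @property
    def factor(self) -> float:
        # Adjustment factor: successes / total, 0 before any sample is seen.
        return self.hits / self.total if self.total else 0.0


class DynamicChain:
    """Binary tree in which each node has one child, stored root first."""

    def __init__(self, nodes: List[Node], threshold: float):
        self.nodes = list(nodes)
        self.threshold = threshold

    def classify(self, x: Sequence[float]) -> Optional[str]:
        result = None
        for node in self.nodes:        # steps 2-3: root, then child, ...
            node.total += 1
            if node.svm(x) == +1:
                node.hits += 1
                result = node.label
                break                  # classified; stop descending
        self._maybe_reorder()          # step 4
        return result                  # None: ran off the end (empty node)

    def _maybe_reorder(self) -> None:
        factors = [n.factor for n in self.nodes]
        lo, hi = min(factors), max(factors)
        # Assumption: zero minimum with positive maximum counts as exceeded.
        if hi > 0 and (lo == 0 or hi / lo > self.threshold):
            self.nodes.sort(key=lambda n: n.factor, reverse=True)


# Toy usage: stand-in "SVMs" that accept a single value each.
chain = DynamicChain(
    [Node(c, lambda x, v=v: 1 if x[0] == v else -1) for v, c in enumerate("ABC")],
    threshold=2.0,
)
labels = [chain.classify([2]) for _ in range(2)]  # step 5: one sample at a time
```

Because class C keeps accepting samples, its node moves to the root after the first sample, so later samples of that class need one SVM evaluation instead of three.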
2. A network alarm prediction method, which classifies the time series of alarms, the classification result being the prediction result, characterized in that classifying the time series of alarms comprises the following steps:
Step A: extract vectors from the historical data of one class of network alarm and preprocess them to obtain training samples of that class of network alarm;
Step B: train a two-class SVM with the training samples obtained, to obtain the two-class SVM of that class of network alarm;
Step C: select historical data of multiple classes of network alarm and repeat steps A and B for each, to obtain a plurality of trained two-class SVMs;
Step D: using the trained two-class SVMs obtained, classify the time series of alarms with the multi-class SVM classification method based on a dynamic binary tree according to claim 1; the classification result is the prediction result.
3. A P2P traffic classification method for identifying the type to which P2P traffic belongs, characterized by comprising the following steps:
Step A: extract features from one class of P2P traffic data to obtain training samples of that class of P2P traffic;
Step B: train a two-class SVM with the training samples obtained, to obtain the two-class SVM of that class of P2P traffic;
Step C: select multiple classes of P2P traffic data and repeat steps A and B for each, to obtain a plurality of trained two-class SVMs;
Step D: using the trained two-class SVMs obtained, classify the P2P traffic data with the multi-class SVM classification method based on a dynamic binary tree according to claim 1.
4. An image semantic classification method, characterized by comprising the following steps:
Step A: extract semantic features from one class of images to obtain training samples of that class of images;
Step B: train a two-class SVM with the training samples obtained, to obtain the two-class SVM of that class of images;
Step C: select multiple classes of images and repeat steps A and B for each, to obtain a plurality of trained two-class SVMs;
Step D: using the trained two-class SVMs obtained, classify the images semantically with the multi-class SVM classification method based on a dynamic binary tree according to claim 1.
5. A network attack detection method, which classifies network packets to judge whether a network attack has occurred, characterized in that classifying the network packets comprises the following steps:
Step A: extract features from one class of network attack data to obtain training samples of that class of network attack;
Step B: train a two-class SVM with the training samples obtained, to obtain the two-class SVM of that class of network attack;
Step C: select multiple classes of known network attack data and repeat steps A and B for each, to obtain a plurality of trained two-class SVMs;
Step D: using the trained two-class SVMs obtained, classify the network packets with the multi-class SVM classification method based on a dynamic binary tree according to claim 1.
6. A web page classification method, characterized by comprising the following steps:
Step A: extract features from one class of web page data to obtain training samples of that class of web pages;
Step B: train a two-class SVM with the training samples obtained, to obtain the two-class SVM of that class of web pages;
Step C: select multiple classes of web page data and repeat steps A and B for each, to obtain a plurality of trained two-class SVMs;
Step D: using the trained two-class SVMs obtained, classify the web pages with the multi-class SVM classification method based on a dynamic binary tree according to claim 1.
CN201210181550.9A 2012-06-05 2012-06-05 Multi-class support vector machine classification method based on dynamic binary tree Active CN102722726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210181550.9A CN102722726B (en) 2012-06-05 2012-06-05 Multi-class support vector machine classification method based on dynamic binary tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210181550.9A CN102722726B (en) 2012-06-05 2012-06-05 Multi-class support vector machine classification method based on dynamic binary tree

Publications (2)

Publication Number Publication Date
CN102722726A true CN102722726A (en) 2012-10-10
CN102722726B CN102722726B (en) 2014-01-15

Family

ID=46948476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210181550.9A Active CN102722726B (en) 2012-06-05 2012-06-05 Multi-class support vector machine classification method based on dynamic binary tree

Country Status (1)

Country Link
CN (1) CN102722726B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930007A (en) * 2012-10-30 2013-02-13 广东电网公司 User power supply recovery emergency degree classification method in large area power failure emergency processing
CN103136372A (en) * 2013-03-21 2013-06-05 陕西通信信息技术有限公司 Method of quick location, classification and filtration of universal resource locator (URL) in network credibility behavior management
CN104820839A (en) * 2015-04-24 2015-08-05 深圳信息职业技术学院 Respective positive and negative example correct rate setting-based controllable confidence machine algorithm
CN105447504A (en) * 2015-11-06 2016-03-30 中国科学院计算技术研究所 Traffic mode behavior recognition method and corresponding recognition model construction method
CN105631474A (en) * 2015-12-26 2016-06-01 哈尔滨工业大学 Hyperspectral data multi-class method based on Jeffries-Matusita distance and class pair decision tree
CN105930872A (en) * 2016-04-28 2016-09-07 上海应用技术学院 Bus driving state classification method based on class-similar binary tree support vector machine
CN108090503A (en) * 2017-11-28 2018-05-29 东软集团股份有限公司 On-line tuning method, apparatus, storage medium and the electronic equipment of multi-categorizer
CN108351968A (en) * 2017-12-28 2018-07-31 深圳市锐明技术股份有限公司 It is a kind of for the alarm method of criminal activity, device, storage medium and server
CN108830302A (en) * 2018-05-28 2018-11-16 苏州大学 A kind of image classification method, training method, classification prediction technique and relevant apparatus
CN109582774A (en) * 2018-11-30 2019-04-05 北京羽扇智信息科技有限公司 Natural language classification method, device, equipment and storage medium
CN109767545A (en) * 2017-01-10 2019-05-17 中国人民银行印制科学技术研究所 The defect classification method and defect categorizing system of valuable bills
CN109981583A (en) * 2019-02-26 2019-07-05 重庆邮电大学 A kind of industry control network method for situation assessment
CN110715799A (en) * 2019-10-22 2020-01-21 中研新科智能电气有限公司 Method and device for detecting mechanical state of circuit breaker and terminal equipment
CN110930969A (en) * 2019-10-14 2020-03-27 科大讯飞股份有限公司 Background music determination method and related equipment
CN113360657A (en) * 2021-06-30 2021-09-07 安徽商信政通信息技术股份有限公司 Intelligent document distribution and handling method and device and computer equipment
CN114268365A (en) * 2021-12-02 2022-04-01 国网甘肃省电力公司酒泉供电公司 Intelligent communication optical cable early warning method and system based on visualization technology

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101329734A (en) * 2008-07-31 2008-12-24 重庆大学 License plate character recognition method based on K-L transform and LS-SVM
CN101853400A (en) * 2010-05-20 2010-10-06 武汉大学 Multiclass image classification method based on active learning and semi-supervised learning
CN101980251A (en) * 2010-11-23 2011-02-23 中国矿业大学 Remote sensing classification method for binary tree multi-category support vector machines
CN102013946A (en) * 2010-11-01 2011-04-13 大连理工大学 Method for correcting errors of support vector machine (SVM) classification for solving multi-classification problems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
谷胜伟: "基于赫夫曼树的SVM多分类器构造方法", 《滁州学院学报》 *

Also Published As

Publication number Publication date
CN102722726B (en) 2014-01-15

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant