CN102722726B - Multi-class support vector machine classification method based on dynamic binary tree - Google Patents


Info

Publication number
CN102722726B
Authority
CN
China
Legal status
Active
Application number
CN201210181550.9A
Other languages
Chinese (zh)
Other versions
CN102722726A
Inventor
韦磊
朱红
程春玲
王亚石
隋宗见
Current Assignee
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Nanjing Power Supply Co of Jiangsu Electric Power Co
Original Assignee
Nanjing Post and Telecommunication University
Nanjing Power Supply Co of Jiangsu Electric Power Co
Application filed by Nanjing Post and Telecommunication University and Nanjing Power Supply Co of Jiangsu Electric Power Co
Priority to CN201210181550.9A
Publication of CN102722726A
Application granted
Publication of CN102722726B
Legal status: Active
Anticipated expiration


Abstract

The invention discloses a multi-class support vector machine (SVM) classification method based on a dynamic binary tree, belonging to the technical field of data mining. A plurality of two-class SVMs form a multi-class SVM classifier with a binary tree structure. During classification, the binary tree structure is dynamically adjusted according to the classification results of the individual two-class SVMs: two-class SVMs with a higher classification success rate are moved toward the root of the tree. This raises the probability that a sample is classified successfully at an early stage, reduces the number of two-class SVMs a single sample must pass through, and thus effectively increases classification speed while preserving classification accuracy. The invention further discloses a network alarm prediction method, a peer-to-peer (P2P) traffic classification method, an image semantic classification method, a network attack detection method and a webpage classification method that use this multi-class classification method.

Description

Multi-class SVM classification method based on a dynamic binary tree
Technical field
The present invention relates to a multi-class classification method for SVMs (Support Vector Machines), in particular to a multi-class SVM classification method based on a dynamic binary tree, and belongs to the technical field of data mining.
Background art
Support vector machines were originally designed for two-class classification problems, whereas multi-class problems are more common in practical applications. How to extend the excellent performance of support vector machines to multi-class classification has become a hot topic in current SVM research.
Existing multi-class SVM methods fall into two categories. Methods of the first category solve a single large quadratic programming problem over all training samples, separating all classes simultaneously. They are simple in theory, but solving such a large multi-class quadratic program greatly increases the computational complexity, so training takes a long time. Methods of the second category construct and combine multiple two-class classifiers to perform multi-class classification. They include the one-versus-rest method, the one-versus-one method, the decision binary tree method, the decision directed acyclic graph method, and others.
The one-versus-rest method (One Versus Rest, OVR) is currently one of the most widely used methods. Its steps are: construct K two-class classifiers (assuming K classes in total), where the i-th support vector machine is trained with the samples of class i as positive samples and all remaining samples as negative samples.
During discrimination, an input sample passes through all K SVMs, producing K output values. If a +1 output appears, the corresponding class is the class of the input sample; if no SVM outputs +1, the input vector does not belong to any of the K classes but to some other type, and this classification fails.
The advantages of the OVR method are that only K two-class classifiers need to be trained, the number of resulting classification functions (K) is small, and training is fast when the sample size is small.
The disadvantages of the OVR method are: 1. each two-class SVM is trained on all samples, which requires solving K quadratic programming problems that each involve all variables; because SVM training slows down sharply as the number of training samples grows, OVR training takes a long time; 2. classification also slows down sharply as the sample size grows, because in the discrimination procedure every sample must pass through every SVM, which greatly reduces classification speed.
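For comparison with the method of this invention, the OVR discrimination rule described above can be sketched in Python. This is a minimal illustration under assumptions: each trained two-class SVM is modeled as a hypothetical callable returning +1 or -1, and when several SVMs output +1 the first one is taken, a tie-breaking rule the text above does not specify.

```python
from typing import Callable, Optional, Sequence

BinarySVM = Callable[[Sequence[float]], int]  # hypothetical stand-in: returns +1 or -1


def ovr_classify(x: Sequence[float], svms: Sequence[BinarySVM]) -> Optional[int]:
    """Return the index of the class whose SVM outputs +1, or None if the classification fails."""
    outputs = [svm(x) for svm in svms]   # every sample is evaluated by all K SVMs
    for i, out in enumerate(outputs):
        if out == +1:
            return i                     # assumption: the first +1 wins if several occur
    return None
```

The list comprehension makes the cost explicit: K evaluations per sample regardless of which class the sample belongs to.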
Summary of the invention
The technical problem to be solved by this invention is to overcome the shortcoming of traditional SVM-based classification methods that classification speed drops sharply as the sample size increases. The invention provides a multi-class SVM classification method based on a dynamic binary tree that effectively increases multi-class classification speed without affecting classification accuracy, thereby broadening the range of applications of SVM-based multi-class classification methods.
The present invention solves the above technical problem with the following technical solution.
In the multi-class SVM classification method based on a dynamic binary tree, a plurality of trained two-class SVMs are first used to construct a multi-class SVM classifier with a binary tree structure, and the constructed classifier is then used to classify a test sample set. Classifying the test sample set with the constructed multi-class SVM classifier specifically comprises the following steps:
Step 1: input the first test sample of the test sample set to the root node of the multi-class SVM classifier, and initialize the adjustment factor of every two-class SVM in the classifier to 0. The adjustment factor is defined as the ratio of the number of successful classifications of this two-class SVM to its total number of classifications; the number of successful classifications is the number of test samples that passed through this two-class SVM with output +1, and the total number of classifications is the total number of test samples that passed through this two-class SVM;
Step 2: if the current node is an empty node, the classification process for this sample ends; go to Step 4. Otherwise, go to Step 3;
Step 3: classify the sample with the current two-class SVM. If the output is -1, update the adjustment factor of the current two-class SVM according to the output, pass the test sample to the two-class SVM at the child node of the current SVM, and go to Step 2. If the output is +1, update the adjustment factor of the current two-class SVM according to the output; the classification of this sample ends; go to Step 4;
Step 4: determine whether the ratio between the maximum and minimum adjustment factors of the two-class SVMs in the multi-class classifier is greater than a preset adjustment threshold. If so, restructure the binary tree of the multi-class SVM classifier as follows: move SVMs with larger adjustment factors toward the root of the binary tree, with the SVM having the largest adjustment factor as the root node, the SVM with the second-largest factor as the child node of the root, and so on, building a new binary tree structure. If not, keep the binary tree structure unchanged;
Step 5: input the next test sample of the test sample set to the root node of the multi-class SVM classifier and repeat Steps 2 to 4 until all test samples in the test sample set have been classified.
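The five steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation, and rests on several assumptions: the binary tree is treated as a chain in which a +1 output ends the walk at a leaf and a -1 output moves to the next SVM (the arrangement the steps describe); each trained two-class SVM is stood in for by a hypothetical `predict` callable returning +1 or -1; the names `NodeSVM` and `DynamicBinaryTreeSVM` are illustrative; and, because every factor starts at 0, a minimum factor of 0 is treated as making the max/min ratio exceed any threshold, a case the text does not address.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class NodeSVM:
    """A trained two-class SVM plus the three attributes added by the method."""
    label: str                                 # class recognized by this SVM
    predict: Callable[[Sequence[float]], int]  # stand-in for a trained SVM: returns +1 or -1
    successes: int = 0                         # samples that reached this SVM and got +1
    total: int = 0                             # samples that reached this SVM

    @property
    def factor(self) -> float:
        """Adjustment factor = successes / total (0 until a sample arrives)."""
        return self.successes / self.total if self.total else 0.0


class DynamicBinaryTreeSVM:
    """Chain-shaped binary tree: +1 ends at a leaf, -1 goes on to the next SVM."""

    def __init__(self, svms: List[NodeSVM], threshold: float) -> None:
        self.chain = list(svms)    # chain[0] is the root node
        self.threshold = threshold

    def classify(self, x: Sequence[float]) -> Optional[str]:
        result: Optional[str] = None
        for node in self.chain:             # Steps 2-3: walk from the root
            node.total += 1
            if node.predict(x) == +1:
                node.successes += 1
                result = node.label
                break
        # Falling off the end of the chain = reaching an empty node; result stays None.
        self._maybe_restructure()           # Step 4
        return result

    def _maybe_restructure(self) -> None:
        factors = [node.factor for node in self.chain]
        hi, lo = max(factors), min(factors)
        if hi == 0.0:
            return                          # no SVM has output +1 yet
        # Assumption: a zero minimum makes the ratio unbounded, so it exceeds any threshold.
        ratio = float("inf") if lo == 0.0 else hi / lo
        if ratio > self.threshold:
            # Largest factor becomes the root; Python's sort is stable, so ties keep their order.
            self.chain.sort(key=lambda node: node.factor, reverse=True)

    def classify_all(self, samples: Sequence[Sequence[float]]) -> List[Optional[str]]:
        """Step 5: feed each test sample to the (possibly restructured) root in turn."""
        return [self.classify(x) for x in samples]
```

The threshold value is not given in the text; a larger threshold restructures the tree less often, trading reordering overhead against how closely the chain follows the observed success rates.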
The multi-class classification method of the present invention can be widely used for data mining in many fields, for example:
A network alarm prediction method classifies time series of alarms, the classification result being the prediction result. Classifying the alarm time series comprises the following steps:
Step A: perform vector extraction and preprocessing on the historical data of one class of network alarm to obtain training samples for that class of network alarm;
Step B: train a two-class SVM with the obtained training samples to obtain the two-class SVM for that class of network alarm;
Step C: select historical data of multiple classes of network alarms and repeat Steps A and B for each class to obtain a plurality of trained two-class SVMs;
Step D: using the obtained trained two-class SVMs, classify the alarm time series with the above multi-class SVM classification method based on a dynamic binary tree; the classification result is the prediction result.
A P2P traffic classification method identifies the type of P2P traffic and comprises the following steps:
Step A: perform feature extraction on the flow data of one class of P2P traffic to obtain training samples for that class of P2P traffic;
Step B: train a two-class SVM with the obtained training samples to obtain the two-class SVM for that class of P2P traffic;
Step C: select flow data of multiple classes of P2P traffic and repeat Steps A and B for each class to obtain a plurality of trained two-class SVMs;
Step D: using the obtained trained two-class SVMs, classify P2P flow data with the above multi-class SVM classification method based on a dynamic binary tree.
An image semantic classification method comprises the following steps:
Step A: perform semantic feature extraction on one class of images to obtain training samples for that class of images;
Step B: train a two-class SVM with the obtained training samples to obtain the two-class SVM for that class of images;
Step C: select multiple classes of images and repeat Steps A and B for each class to obtain a plurality of trained two-class SVMs;
Step D: using the obtained trained two-class SVMs, perform semantic classification of images with the above multi-class SVM classification method based on a dynamic binary tree.
A network attack detection method determines whether a network attack has occurred by classifying network packets. Classifying the network packets comprises the following steps:
Step A: perform feature extraction on the data of one class of network attack to obtain training samples for that class of network attack;
Step B: train a two-class SVM with the obtained training samples to obtain the two-class SVM for that class of network attack;
Step C: select data of multiple classes of known network attacks and repeat Steps A and B for each class to obtain a plurality of trained two-class SVMs;
Step D: using the obtained trained two-class SVMs, classify network packets with the above multi-class SVM classification method based on a dynamic binary tree.
A webpage classification method comprises the following steps:
Step A: perform feature extraction on the data of one class of webpages to obtain training samples for that class of webpages;
Step B: train a two-class SVM with the obtained training samples to obtain the two-class SVM for that class of webpages;
Step C: select data of multiple classes of webpages and repeat Steps A and B for each class to obtain a plurality of trained two-class SVMs;
Step D: using the obtained trained two-class SVMs, classify webpages with the above multi-class SVM classification method based on a dynamic binary tree.
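Steps A to C are the same in all five applications above: one two-class SVM is trained per class. A minimal Python sketch of that loop follows, assuming that the negative samples for each class are the samples of all other classes (a one-versus-rest labeling the text does not state explicitly) and that the actual SVM trainer is passed in as a function; `build_binary_svms` and the `Trainer` signature are hypothetical. Because Python dictionaries preserve insertion order, the returned models come back in training order, which can serve as the initial chain order used in the application examples below.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
BinaryModel = Callable[[Vector], int]                       # returns +1 or -1
Trainer = Callable[[List[Vector], List[int]], BinaryModel]  # e.g. a wrapper around an SVM library


def build_binary_svms(history: List[Tuple[Vector, str]],
                      class_order: List[str],
                      train: Trainer) -> Dict[str, BinaryModel]:
    """Steps A-C: one binary model per class, trained on +1 (this class) / -1 (all others)."""
    features = [x for x, _ in history]           # Step A output: already-extracted vectors
    models: Dict[str, BinaryModel] = {}
    for cls in class_order:                      # Step C: repeat A-B for every class
        labels = [+1 if y == cls else -1 for _, y in history]
        models[cls] = train(features, labels)    # Step B
    return models
```

Any two-class SVM implementation can be wrapped as `train`, as long as it returns a function mapping a feature vector to +1 or -1.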
The multi-class classification method of the present invention uses a plurality of two-class SVMs to form a multi-class SVM classifier with a binary tree structure and dynamically adjusts the binary tree structure during classification according to the classification results of the individual two-class SVMs. Two-class SVMs with a higher classification success rate are moved to the root of the binary tree, which raises the probability of successful classification at an early stage and reduces the number of two-class SVMs a single sample passes through, effectively increasing classification speed while preserving classification accuracy.
Description of the drawings
Fig. 1 shows the structure of the multi-class SVM classifier with a binary tree structure;
Fig. 2 shows the structure of the support vector machine as defined in the present invention;
Fig. 3 is a flowchart of the multi-class SVM classification method based on a dynamic binary tree according to the present invention.
Embodiments
The technical solution of the present invention is described in detail below with reference to the accompanying drawings.
The purpose of the invention is to solve the problem of low classification speed in existing SVM-based multi-class classification methods. In practical applications each sample belongs to only one class, so when a sample passes through the binary-tree queue formed by the K SVMs and an SVM outputs +1, the remaining SVMs need not be evaluated, which saves a great deal of time. Building on this idea, SVMs that frequently output +1 can be moved to the front of the queue, saving further time and increasing classification speed.
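The saving can be quantified under idealized assumptions: every sample belongs to exactly one class, and each SVM outputs +1 only for samples of its own class, so a sample whose class sits at position i of the queue passes through exactly i SVMs. The expected number of SVMs evaluated per sample is then the sum over positions of i times the frequency of the class at that position, which by the rearrangement inequality is smallest when classes are ordered by decreasing frequency. The short Python sketch below (the function name and the class frequencies are illustrative) computes this for a hypothetical three-class distribution.

```python
from typing import Dict, Sequence


def expected_evaluations(class_probs: Dict[str, float], order: Sequence[str]) -> float:
    """Expected number of binary SVMs a sample passes through in a chain.

    Idealized assumptions: every sample belongs to exactly one class in `order`,
    and each SVM outputs +1 only for its own class, so a sample of the class at
    position i (1-based) stops after i evaluations.
    """
    return sum(pos * class_probs[label] for pos, label in enumerate(order, start=1))


probs = {"A": 0.1, "B": 0.2, "C": 0.7}               # hypothetical class frequencies
print(expected_evaluations(probs, ["A", "B", "C"]))  # most frequent class last
print(expected_evaluations(probs, ["C", "B", "A"]))  # most frequent class first
```

With these hypothetical frequencies the expected count falls from about 2.6 to about 1.4 SVMs per sample when the most frequent class is placed first, whereas OVR always evaluates all 3.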
Based on this idea, in the classification phase the invention first combines multiple trained two-class SVMs into a binary tree, whose structure is shown in Fig. 1, and adds an adjustment-factor attribute to each SVM. The adjustment factor is defined as the ratio of the number of successful classifications of this two-class SVM to its total number of classifications, where the number of successful classifications is the number of test samples that passed through this SVM with output +1, and the total number of classifications is the total number of test samples that passed through it. During classification, the adjustment factor of each two-class SVM is continually updated according to its classification results. When the ratio of the maximum to the minimum adjustment factor across the two-class SVMs exceeds a preset threshold, the binary tree is restructured: the two-class SVM with the largest adjustment factor is placed at the root node, the one with the second-largest factor becomes its child node, and so on, forming a new binary tree. After restructuring, classification of subsequent samples continues.
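The adjustment factor and the restructuring test can be written compactly. The notation is introduced here for illustration only: for the j-th two-class SVM, s_j is its number of successful classifications, n_j its total number of classifications, f_j(x) its output for sample x, and T the preset threshold (the original shows the threshold symbol only as an image). The update applies only to the SVMs a sample actually passes through.

```latex
a_j = \begin{cases} s_j / n_j, & n_j > 0 \\ 0, & n_j = 0 \end{cases}
\qquad
(s_j, n_j) \leftarrow \begin{cases} (s_j + 1,\; n_j + 1), & f_j(x) = +1 \\ (s_j,\; n_j + 1), & f_j(x) = -1 \end{cases}

\text{restructure if } \frac{\max_j a_j}{\min_j a_j} > T,
\text{ then order the SVMs from the root by decreasing } a_j .
```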
The multi-class SVM classification method based on a dynamic binary tree of the present invention first uses multiple trained two-class SVMs to construct a multi-class SVM classifier with a binary tree structure. Each two-class SVM adds three attributes to the structure of a traditional support vector machine: the number of successful classifications, the total number of classifications, and the adjustment factor. The number of successful classifications is the number of vectors that passed through this SVM with output +1, and the total number of classifications is the total number of vectors that passed through this SVM. The attribute structure of this two-class SVM is shown in Fig. 2.
The constructed multi-class SVM classifier is then used to classify the test sample set, specifically comprising the following steps:
Step 1: input the first test sample of the test sample set to the root node of the multi-class SVM classifier, and initialize the adjustment factor of every two-class SVM in the classifier to 0;
Step 2: if the current node is an empty node, the classification process for this sample ends; go to Step 4. Otherwise, go to Step 3;
Step 3: classify the sample with the current two-class SVM. If the output is -1, update the adjustment factor of the current two-class SVM according to the output, pass the test sample to the two-class SVM at the child node of the current SVM, and go to Step 2. If the output is +1, update the adjustment factor of the current two-class SVM according to the output; the classification of this sample ends; go to Step 4;
Step 4: determine whether the ratio between the maximum and minimum adjustment factors of the two-class SVMs in the multi-class SVM classifier is greater than a preset adjustment threshold. If so, restructure the binary tree of the multi-class SVM classifier as follows: move SVMs with larger adjustment factors toward the root of the binary tree, with the SVM having the largest adjustment factor as the root node, the SVM with the second-largest factor as the child node of the root, and so on, building a new binary tree structure. If not, keep the binary tree structure unchanged;
Step 5: input the next test sample of the test sample set to the root node of the multi-class SVM classifier and repeat Steps 2 to 4 until all test samples in the test sample set have been classified.
The flow of the multi-class SVM classification method based on a dynamic binary tree of the present invention is shown in Fig. 3.
To help the public further understand the technical solution of the present invention, several application examples from different fields are described below.
Application example 1, network alarm prediction:
One application of SVMs is network alarm prediction: the classification capability of an SVM is used to classify alarm time series, and the classification result is the prediction result. Because one SVM can only predict whether alarms of one type will occur, while a network requires prediction of multiple alarm types, multiple SVMs must be trained and organized into a binary tree structure to solve the multi-class problem. Suppose a two-way cable TV network system mainly manages HFC equipment, optical network units (Optical Network Unit, ONU), and MoCA (Multimedia over Coax Alliance) head ends and terminals, and that five types of alarms may occur in it: OLT (Optical Line Terminal) loss of signal, excessive device packet loss, link failure, head end offline, and terminal offline. This example predicts these five alarm types and therefore requires five SVMs.
The detailed process of network alarm prediction is as follows:
1. SVM learning phase:
1) Perform vector extraction on the alarm records to obtain vectors whose target alarm is OLT loss of signal, preprocess them to form training samples, and train on the resulting samples to generate an SVM that can recognize OLT loss of signal;
2) Repeat the above process to obtain four more SVMs, each of which determines whether an alarm is of one of the remaining types;
2. Multi-class classification phase using the dynamic binary tree of SVMs:
1) Initialize the adjustment factors of all five SVMs to 0;
2) Construct the initial binary tree in the order in which the SVMs were trained: the root node is the OLT loss-of-signal SVM, its child node is the excessive-packet-loss SVM, followed in turn by the link-failure SVM, the head-end-offline SVM and the terminal-offline SVM;
3) Pass the next test sample through the binary tree; the vector first passes through the root node, the OLT loss-of-signal SVM;
4) Determine whether the current node is an empty node. If it is, classification of the current vector ends; go to step 7). Otherwise, go to step 5);
5) Perform two-class classification of the vector with the SVM at the current node, and update the adjustment factor of the current SVM;
6) Determine whether the output of the current SVM is +1. If it is +1, the current sample has been classified successfully; go to step 7). If it is -1, the sample continues to the child node of the current node; go to step 4);
7) Compute the ratio of the maximum to the minimum adjustment factor among the binary tree nodes and compare it with the threshold. If the ratio is greater than the threshold, rearrange the positions of the five SVMs in the binary tree, with the SVM having the largest adjustment factor as the root node, the second largest as the child node of the root, and so on, forming a new binary tree. If the ratio is less than the threshold, keep the binary tree structure unchanged;
8) Determine whether all vectors in the sample set have been classified. If not, go to step 3); if so, the whole classification process ends.
Application example 2, P2P traffic classification:
Peer-to-peer (Peer to Peer, P2P) networks share and transfer resources through direct connections between peer nodes. They offer high resource utilization, low server load and the elimination of server bottlenecks, and are therefore widely used in streaming media, instant messaging, file sharing, online games, search engines and collaborative work. At the same time, P2P services consume excessive network resources and can even cause network congestion. To keep the network operating normally and in order, the various types of P2P traffic must be identified effectively and corresponding management policies applied. The method of the invention can be deployed at an access gateway, a core router, or a bypass of either; multiple SVMs organized into a dynamic binary tree classify the network traffic collected from the access gateway or router. Suppose that on the core router of a local area network, the five-tuple of each network data flow (source IP address, destination IP address, source port, destination port, protocol) and its main traffic statistics (the standard deviation of packet size, the ratio of active to passive connections, and the ratio of upstream to downstream traffic) have been collected to form flow sample data, and the method of the invention is applied to classify these flow samples, identifying non-P2P traffic and six common kinds of P2P traffic: BitTorrent, PPLive, UUsee, Thunder, MSN and Skype. Seven SVMs are required in total.
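As an illustration of the traffic statistics listed above, the following Python sketch computes the three statistical features of one flow. The `FlowRecord` layout, the use of per-packet sizes, the population standard deviation, and the guards against division by zero are all assumptions; the text does not say how (or whether) the five-tuple is encoded numerically, so here it only identifies the flow and is not part of the returned vector.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FlowRecord:
    """Hypothetical record of one captured flow; the five-tuple identifies the flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    packets: List[Tuple[int, bool]]  # (packet size in bytes, True if upstream)
    active_conns: int                # connections initiated by the local host
    passive_conns: int               # connections accepted by the local host


def flow_features(flow: FlowRecord) -> List[float]:
    """[std. dev. of packet size, active/passive connection ratio, upstream/downstream byte ratio]."""
    if not flow.packets:
        raise ValueError("flow has no packets")
    sizes = [size for size, _ in flow.packets]
    mean = sum(sizes) / len(sizes)
    std = math.sqrt(sum((s - mean) ** 2 for s in sizes) / len(sizes))  # population std. dev.
    up = sum(size for size, is_up in flow.packets if is_up)
    down = sum(size for size, is_up in flow.packets if not is_up)
    # max(..., 1) avoids division by zero (an assumption, not from the text).
    return [std, flow.active_conns / max(flow.passive_conns, 1), up / max(down, 1)]
```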
The detailed process of P2P traffic classification is as follows:
1. SVM learning phase:
1) Preprocess the P2P flow data, extract the five-tuple and its main attribute features to form training samples, and train an SVM with the resulting samples to generate an SVM that can recognize P2P traffic;
2) Repeat the above process for BitTorrent, PPLive, UUsee, Thunder, MSN and Skype traffic, obtaining for each type an SVM that determines whether a data flow is of that traffic type.
2. Multi-class classification phase using the dynamic binary tree of SVMs:
1) Initialize the adjustment factors of all seven SVMs to 0;
2) Construct the initial binary tree in the order in which the SVMs were trained: the root node is the P2P traffic SVM, its child node is the BitTorrent traffic SVM, followed in turn by the PPLive, UUsee, Thunder, MSN and Skype traffic SVMs;
3) Pass the next test sample through the binary tree; the vector first passes through the P2P traffic SVM;
4) Determine whether the current node is an empty node. If it is, classification of the current sample ends; go to step 7). Otherwise, go to step 5);
5) Perform two-class classification of the sample with the SVM at the current node, and update the adjustment factor of the current SVM;
6) Determine whether the output of the current SVM is +1. If it is +1, the current sample has been classified successfully; go to step 7). If it is -1, the current sample continues to the child node of the current node; go to step 4);
7) Compute the ratio of the maximum to the minimum adjustment factor among the binary tree nodes and compare it with the threshold. If the ratio is greater than the threshold, rearrange the positions of the seven SVMs in the binary tree, with the SVM having the largest adjustment factor as the root node, the second largest as the child node of the root, and so on, forming a new binary tree. If the ratio is less than the threshold, keep the binary tree structure unchanged;
8) Determine whether all samples have been classified. If not, go to step 3); if so, the whole classification process ends.
Application example 3, image semantic classification:
Images are one of the main forms of multimedia, and dividing an image database into meaningful semantic categories is an important topic in content-based image retrieval (CBIR). Because the SVM algorithm requires few training samples and classifies well, it is widely used in image semantic classification. This application example performs semantic classification on pictures from the Corel image dataset (http://corel.digitalriver.com/), which contains 10 categories: aborigines, sea, buildings, buses, dinosaurs, elephants, flowers, horses on grassland, snow mountains and food. Applying the method of the invention, an SVM is first trained on the labeled sample pictures of each category, and the trained SVMs are then organized into a dynamic binary tree to quickly classify unlabeled sample pictures. The image features used for classification (i.e., the input features of the classifier) comprise 64-dimensional color features and 18-dimensional texture features.
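The 82-dimensional input (64 color + 18 texture dimensions) could be assembled as in the Python sketch below. The text does not specify either descriptor, so this is only an assumption-laden illustration: the 64 color dimensions are taken to be a normalized RGB histogram with each channel quantized to 4 levels (4 × 4 × 4 = 64 bins), and the 18-dimensional texture descriptor is assumed to be computed elsewhere and passed in. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def color_histogram_64(pixels: Sequence[Pixel]) -> List[float]:
    """Normalized 64-bin RGB histogram: each channel is quantized to 4 levels (0-63, 64-127, ...)."""
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [0.0] * 64
    for r, g, b in pixels:
        hist[(r // 64) * 16 + (g // 64) * 4 + (b // 64)] += 1.0
    return [count / len(pixels) for count in hist]


def image_feature_vector(pixels: Sequence[Pixel], texture: Sequence[float]) -> List[float]:
    """Concatenate the 64-d color histogram with an externally computed 18-d texture descriptor."""
    if len(texture) != 18:
        raise ValueError("expected an 18-dimensional texture descriptor")
    return color_histogram_64(pixels) + list(texture)
```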
The detailed process of image semantic classification is as follows:
1. SVM learning phase:
1) Preprocess the sample picture set of aborigines, extract the 82-dimensional image features of each picture to form training samples, and train an SVM with them to generate an SVM that can recognize aborigines;
2) Repeat the above process, training one SVM on the sample pictures of each category, to obtain for each of the nine categories sea, buildings, buses, dinosaurs, elephants, flowers, horses on grassland, snow mountains and food an SVM that determines whether an image belongs to that category, nine SVMs in total.
2. Multi-class classification phase using the dynamic binary tree of SVMs:
1) Initialize the adjustment factors of all ten SVMs to 0;
2) Construct the initial binary tree in the order in which the SVMs were trained: the root node is the aborigines SVM, its child node is the sea SVM, followed in turn by the buildings, buses, dinosaurs, elephants, flowers, horses-on-grassland, snow-mountains and food SVMs;
3) Pass the next sample of the test sample set, formed from the image features of the unlabeled pictures, through the binary tree; the vector first passes through the aborigines SVM;
4) Determine whether the current node is an empty node. If it is, classification of the current sample ends; go to step 7). Otherwise, go to step 5);
5) Perform two-class classification of the sample with the SVM at the current node, and update the adjustment factor of the current SVM;
6) Determine whether the output of the current SVM is +1. If it is +1, the current sample has been classified successfully; go to step 7). If it is -1, the current sample continues to the child node of the current node; go to step 4);
7) Compute the ratio of the maximum to the minimum adjustment factor among the binary tree nodes and compare it with the threshold. If the ratio is greater than the threshold, rearrange the positions of the ten SVMs in the binary tree, with the SVM having the largest adjustment factor as the root node, the second largest as the child node of the root, and so on, forming a new binary tree. If the ratio is less than the threshold, keep the binary tree structure unchanged;
8) Determine whether all samples have been classified. If not, go to step 3); if so, the whole classification process ends.
Application example 4, network attack detection:
At key data-flow entry points of a network, such as switches, ingress routers or firewalls, all packets in the network segment are monitored, and the captured packets are classified with the method of the invention to determine whether an intrusion has occurred. Suppose the collected network dataset (for example, the KDDCUP99 network intrusion detection dataset collected by MIT Lincoln Laboratory) contains 41 attributes covering four categories of features: basic features, content features, traffic features and host traffic features, of which 34 are continuous attributes and 7 are discrete attributes. Four common network attacks must be detected: denial of service (Denial of Service, DoS), probing (Probing), user-to-root privilege escalation (User to Root, U2R) and remote-to-local attacks (Remote to Local, R2L). Applying the method of the invention, an SVM is first trained on the data of each attack type, and the trained SVMs are then organized into a dynamic binary tree structure to quickly classify sample packets.
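The attack categories and the 34 continuous attributes above can be extracted from raw KDD Cup 99 records roughly as sketched below in Python. The column positions follow the commonly used KDD Cup 99 attribute order, in which the 7 discrete attributes are protocol_type, service, flag, land, logged_in, is_host_login and is_guest_login; the raw-label-to-category table lists the attack names of the standard training data. The helper names are hypothetical, and both tables should be checked against the copy of the dataset actually used.

```python
from typing import Dict, List, Optional, Sequence

# Zero-based positions of the 7 discrete attributes in a 41-attribute KDD Cup 99 record:
# protocol_type, service, flag, land, logged_in, is_host_login, is_guest_login.
DISCRETE_IDX = frozenset({1, 2, 3, 6, 11, 20, 21})

# Raw attack names in the KDD Cup 99 training data, grouped into the four categories.
ATTACK_CATEGORY: Dict[str, str] = {
    **dict.fromkeys(["back", "land", "neptune", "pod", "smurf", "teardrop"], "DoS"),
    **dict.fromkeys(["ipsweep", "nmap", "portsweep", "satan"], "Probing"),
    **dict.fromkeys(["buffer_overflow", "loadmodule", "perl", "rootkit"], "U2R"),
    **dict.fromkeys(["ftp_write", "guess_passwd", "imap", "multihop",
                     "phf", "spy", "warezclient", "warezmaster"], "R2L"),
}


def continuous_features(record: Sequence[str]) -> List[float]:
    """Keep the 34 continuous attributes of one 41-attribute record (label already removed)."""
    if len(record) != 41:
        raise ValueError(f"expected 41 attributes, got {len(record)}")
    return [float(v) for i, v in enumerate(record) if i not in DISCRETE_IDX]


def attack_category(raw_label: str) -> Optional[str]:
    """Map a raw label such as 'smurf.' to DoS/Probing/U2R/R2L; None for 'normal.' or unknown names."""
    return ATTACK_CATEGORY.get(raw_label.rstrip("."))
```

Per-category training sets for the four SVMs can then be formed by labeling a record +1 when `attack_category` returns that SVM's category.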
The detailed process of network intrusion detection is as follows:
1. SVM learning phase:
1) Preprocess the network dataset, extract the 34 continuous attributes of the DoS attack data to form training samples, and train an SVM with them to generate an SVM that can recognize DoS attacks;
2) Repeat the above process, training one SVM on each attack dataset, to obtain for each of the Probing, U2R and R2L attack types an SVM that determines whether a sample is that type of attack.
2. Multi-class classification phase using the dynamic binary tree of SVMs:
1) Initialize the adjustment factors of all four SVMs to 0;
2) Construct the initial binary tree in the order in which the SVMs were trained: the root node is the DoS SVM, its child node is the Probing SVM, followed in turn by the U2R SVM and the R2L SVM;
3) Pass the next test sample through the binary tree; the vector first passes through the DoS SVM;
4) Determine whether the current node is an empty node. If it is, classification of the current sample ends; go to step 7). Otherwise, go to step 5);
5) Perform two-class classification of the sample with the SVM at the current node, and update the adjustment factor of the current SVM;
6) Determine whether the output of the current SVM is +1. If it is +1, the current sample has been classified successfully; go to step 7). If it is -1, the current sample continues to the child node of the current node; go to step 4);
7) Compute the ratio of the largest to the smallest adjustment factor among the binary-tree nodes, and compare it with the preset adjustment threshold. If the ratio is greater than the threshold, rearrange the four SVMs in the binary tree to form a new tree. The SVM with the largest adjustment factor becomes the root node, the SVM with the second-largest factor becomes the child of the root, and so on. Otherwise, keep the structure of the binary tree unchanged;
8) Determine whether all samples have been classified. If not, return to step 3); if so, the entire classification process ends.
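The classification phase above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation:
- The binary tree is modelled as a chain: each node has a single child, which receives the samples rejected with -1.
- Each trained SVM is any callable that returns +1 or -1.
- The class name `DynamicTreeSVM` is hypothetical.
- Because all factors start at 0, the sketch treats a zero minimum with a non-zero maximum as an infinite ratio. The source does not specify this case.

```python
from typing import Callable, List, Optional, Sequence

BinaryClassifier = Callable[[Sequence[float]], int]  # returns +1 or -1


class DynamicTreeSVM:
    """Chain-shaped binary tree of binary SVMs that reorders itself by success rate."""

    def __init__(self, labels: List[str], classifiers: List[BinaryClassifier],
                 threshold: float):
        self.labels = labels
        self.classifiers = classifiers
        self.threshold = threshold  # preset adjustment threshold, assumed > 0
        # Initial tree: training order, with the first SVM at the root.
        self.order = list(range(len(labels)))
        self.successes = [0] * len(labels)  # samples for which this SVM output +1
        self.totals = [0] * len(labels)     # samples that reached this SVM

    def factor(self, i: int) -> float:
        """Adjustment factor: successful classifications / total classifications."""
        return self.successes[i] / self.totals[i] if self.totals[i] else 0.0

    def classify(self, x: Sequence[float]) -> Optional[str]:
        result = None
        for i in self.order:  # steps 3)-6): walk from the root down the chain
            self.totals[i] += 1
            if self.classifiers[i](x) == 1:
                self.successes[i] += 1
                result = self.labels[i]
                break
        # Falling off the end of the chain means an empty node was reached,
        # so the sample stays unclassified (None).
        self._maybe_reorder()  # step 7)
        return result

    def _maybe_reorder(self) -> None:
        factors = [self.factor(i) for i in range(len(self.labels))]
        high, low = max(factors), min(factors)
        # Equivalent to high / low > threshold when low > 0. A zero minimum with
        # a non-zero maximum counts as an infinite ratio (assumption).
        if high > self.threshold * low:
            # The largest factor becomes the root. The sort is stable, so tied
            # SVMs keep their current relative order.
            self.order.sort(key=self.factor, reverse=True)
```

Sorting the whole chain by factor implements the rule "largest factor at the root, second largest as its child, and so on". The same class with five classifiers would cover a five-category task.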
Application example 5: web page classification
The rapid spread of the Internet has made the network a main source of information. Automatic web page classification is an important technique for processing massive amounts of Web information efficiently. It helps users find the information they need quickly and accurately among huge numbers of pages. In automatic classification, a computer uses a classification algorithm to assign each web page, according to its content, to one of several predefined categories. The method of the invention can classify web pages quickly and automatically.

In this application example, a number of web pages (e.g. 2000) are downloaded from the Internet. They are manually divided into 5 categories: finance and economics, sports, military, science and technology, and culture. Some of the pages (e.g. 1500) are used as the training set and the remaining pages (500) as the test set.

Before classification with the method of the invention, features must be extracted from the pages:
1) The hypertext is filtered to obtain the page text, the hypertext markup and the hyperlink information.
2) The page text is segmented into words and represented, using TF-IDF (term frequency-inverse document frequency), as a feature vector of terms. If the total number of features over all page texts is n, they form an n-dimensional vector space. Each page text is represented as an n-dimensional vector $(w_1, w_2, \ldots, w_n)$, whose component in each dimension is the weight of the corresponding feature in that page.

The method of the invention then trains one SVM on the training pages of each category. It organizes the trained SVMs into a dynamic binary tree structure to classify the test pages quickly.
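The TF-IDF representation described above can be illustrated with a short Python sketch. It assumes that the page text has already been segmented into words. It uses one common weighting variant: the term count divided by the document length, multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. The source does not state which TF-IDF variant it uses, and `tfidf_vectors` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_vectors(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Map each segmented document to an n-dimensional TF-IDF vector (w_1, ..., w_n)."""
    vocab = sorted({term for doc in docs for term in doc})  # the n features
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append([
            (counts[term] / length) * math.log(n_docs / df[term]) if length else 0.0
            for term in vocab
        ])
    return vocab, vectors
```

Under this variant, a term that occurs in every page gets weight 0 and contributes nothing to the SVM inputs.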
The detailed web page classification process is as follows:
1. Learning phase of the SVMs:
1) Preprocess the training pages and extract the n-dimensional feature vector of each finance and economics page to form the training samples. Train one SVM on these samples to obtain an SVM that can identify finance and economics pages;
2) Repeat the above process, training one SVM on the pages of each remaining category. For sports, military, science and technology, and culture, this yields an SVM that judges whether a page belongs to that category.
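The learning phase leaves open how each binary SVM is trained and which pages serve as its negative examples. The Python sketch below makes two assumptions that are not in the source:
- One-versus-rest labelling: the pages of the target category are labelled +1 and all other training pages -1.
- Each SVM is a linear SVM trained with the Pegasos stochastic sub-gradient method, rather than a kernel SVM.

The names `train_linear_svm` and `train_category_svms` are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def train_linear_svm(samples: List[Sequence[float]], labels: List[int],
                     lam: float = 0.01, iterations: int = 2000,
                     seed: int = 0) -> Callable[[Sequence[float]], int]:
    """Pegasos-style training of a linear SVM; returns a +1/-1 decision function."""
    rng = random.Random(seed)
    # Append a constant 1.0 so the last weight acts as the bias
    # (for simplicity, the bias is regularized too).
    data = [(list(x) + [1.0], y) for x, y in zip(samples, labels)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the regularizer
        if margin < 1.0:  # hinge loss is active: move the weights towards y * x
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]

    def decide(v: Sequence[float]) -> int:
        score = sum(wi * xi for wi, xi in zip(w, list(v) + [1.0]))
        return 1 if score >= 0.0 else -1

    return decide


def train_category_svms(samples: List[Sequence[float]], categories: List[str],
                        order: List[str]) -> List[Callable[[Sequence[float]], int]]:
    """Train one binary SVM per category, in training order (root of the initial tree first)."""
    return [
        train_linear_svm(samples, [1 if c == target else -1 for c in categories])
        for target in order
    ]
```

Each returned function outputs +1 or -1, which is the binary decision that the classification phase consumes.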
2. Multi-class classification phase using the SVMs organized as a dynamic binary tree:
1) Initialize the adjustment factors of all five SVMs to 0;
2) Construct the initial binary tree in the order in which the SVMs were trained. The root node is the finance and economics SVM and its child node is the sports SVM, followed in turn by the military, science and technology, and culture SVMs;
3) Pass the next test sample through the binary tree, so that its vector is first classified by the SVM that decides whether the page is a finance and economics page;
4) Determine whether the current node is an empty node. If it is, classification of the current sample ends; go to step 7). Otherwise, go to step 5);
5) Perform binary classification of the sample with the SVM at the current node, and update that SVM's adjustment factor;
6) Check whether the output of the current SVM is +1. If it is, the current sample has been classified successfully; go to step 7). If it is -1, pass the sample to the child node of the current node and go to step 4);
7) Compute the ratio of the largest to the smallest adjustment factor among the binary-tree nodes, and compare it with the preset adjustment threshold. If the ratio is greater than the threshold, rearrange the five SVMs in the binary tree to form a new tree. The SVM with the largest adjustment factor becomes the root node, the SVM with the second-largest factor becomes the child of the root, and so on. Otherwise, keep the structure of the binary tree unchanged;
8) Determine whether all samples have been classified. If not, return to step 3); if so, the entire classification process ends.
The above application examples are provided only to help the public understand the technical solution of the invention and do not limit the invention. Those skilled in the art will appreciate that various changes can be made, and that the method can be applied in different fields, without departing from the spirit and scope of the invention. All equivalent technical solutions, and all applications in different fields, fall within the scope of protection of the invention.

Claims (6)

1. A multi-class SVM classification method based on a dynamic binary tree, in which a plurality of trained binary SVMs are first used to construct an SVM multi-classifier with a binary tree structure, and the constructed SVM multi-classifier is then used to classify a test sample set; characterized in that classifying the test sample set with the constructed SVM multi-classifier specifically comprises the following steps:
Step 1: input the first test sample of the test sample set to the root node of the SVM multi-classifier, and initialize the adjustment factor of each binary SVM in the SVM multi-classifier to 0. The adjustment factor of a binary SVM is defined as its number of successful classifications divided by its total number of classifications. The number of successful classifications is the number of test samples that pass through the binary SVM with an output of +1. The total number of classifications is the total number of test samples that pass through the binary SVM;
Step 2: if the current node is an empty node, the classification process ends; go to step 4. Otherwise, go to step 3;
Step 3: classify the sample with the current binary SVM. If the output is -1, dynamically adjust the adjustment factor of the current binary SVM according to the output, input the test sample to the binary SVM corresponding to the child node of the current binary SVM, and go to step 2. If the output is +1, dynamically adjust the adjustment factor of the current binary SVM according to the output; the classification process ends; go to step 4;
Step 4: determine whether the ratio of the maximum to the minimum adjustment factor over all binary SVMs in the SVM multi-classifier is greater than a preset adjustment threshold. If so, readjust the binary tree structure of the SVM multi-classifier by moving SVMs with larger adjustment factors toward the root of the binary tree. The SVM with the largest adjustment factor becomes the root node, the SVM with the second-largest factor becomes the child node of the root, and so on, forming a new binary tree structure. If not, keep the binary tree structure unchanged;
Step 5: input the next test sample of the test sample set to the root node of the SVM multi-classifier, and repeat steps 2 to 4 until all test samples in the test sample set have been classified.
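The adjustment factor of step 1 and the reordering test of step 4 can be written as formulas. The symbols are introduced here for illustration only:
- $s_i$ is the number of test samples for which the $i$-th binary SVM output +1;
- $n_i$ is the number of test samples that have passed through the $i$-th binary SVM;
- $T$ is the preset adjustment threshold.

$$a_i = \frac{s_i}{n_i}, \qquad \text{rebuild the tree if } \frac{\max_i a_i}{\min_i a_i} > T .$$

For example, suppose four SVMs have factors 0.6, 0.3, 0.1 and 0.5, and $T = 4$. The ratio is 0.6 / 0.1 = 6 > 4, so the tree is rebuilt with the SVMs in the order 0.6, 0.5, 0.3, 0.1 from the root downwards. The claim does not say how $a_i$ or the ratio is evaluated while some $n_i$ or some factor is still 0, as is the case right after initialization.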
2. A network alarm prediction method, which classifies time series of alarms, the classification result being the prediction result, characterized in that classifying the time series of alarms comprises the following steps:
Step A: extract vectors from the historical data of one class of network alarms and preprocess them to obtain the training samples for that class of network alarm;
Step B: train a binary SVM with the obtained training samples to obtain the binary SVM for that class of network alarm;
Step C: select historical data of multiple classes of network alarms and repeat steps A and B for each class to obtain a plurality of trained binary SVMs;
Step D: using the obtained plurality of trained binary SVMs, classify the time series of alarms with the multi-class SVM classification method based on a dynamic binary tree according to claim 1, the classification result being the prediction result.
3. A P2P traffic classification method for identifying the type of P2P traffic, characterized in that it comprises the following steps:
Step A: extract features from one class of P2P traffic data to obtain the training samples for that class of P2P traffic;
Step B: train a binary SVM with the obtained training samples to obtain the binary SVM for that class of P2P traffic;
Step C: select multiple classes of P2P traffic data and repeat steps A and B for each class to obtain a plurality of trained binary SVMs;
Step D: using the obtained plurality of trained binary SVMs, classify P2P traffic data with the multi-class SVM classification method based on a dynamic binary tree according to claim 1.
4. An image semantic classification method, characterized in that it comprises the following steps:
Step A: extract semantic features from one class of images to obtain the training samples for that class of images;
Step B: train a binary SVM with the obtained training samples to obtain the binary SVM for that class of images;
Step C: select multiple classes of images and repeat steps A and B for each class to obtain a plurality of trained binary SVMs;
Step D: using the obtained plurality of trained binary SVMs, perform semantic classification of images with the multi-class SVM classification method based on a dynamic binary tree according to claim 1.
5. A network attack detection method, which determines whether a network attack has occurred by classifying network packets, characterized in that classifying the network packets comprises the following steps:
Step A: extract features from one class of network attack data to obtain the training samples for that class of network attack;
Step B: train a binary SVM with the obtained training samples to obtain the binary SVM for that class of network attack;
Step C: select multiple classes of known network attack data and repeat steps A and B for each class to obtain a plurality of trained binary SVMs;
Step D: using the obtained plurality of trained binary SVMs, classify network packets with the multi-class SVM classification method based on a dynamic binary tree according to claim 1.
6. A web page classification method, characterized in that it comprises the following steps:
Step A: extract features from one class of web page data to obtain the training samples for that class of web pages;
Step B: train a binary SVM with the obtained training samples to obtain the binary SVM for that class of web pages;
Step C: select multiple classes of web page data and repeat steps A and B for each class to obtain a plurality of trained binary SVMs;
Step D: using the obtained plurality of trained binary SVMs, classify web pages with the multi-class SVM classification method based on a dynamic binary tree according to claim 1.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210181550.9A CN102722726B (en) 2012-06-05 2012-06-05 Multi-class support vector machine classification method based on dynamic binary tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210181550.9A CN102722726B (en) 2012-06-05 2012-06-05 Multi-class support vector machine classification method based on dynamic binary tree

Publications (2)

Publication Number Publication Date
CN102722726A (en) 2012-10-10
CN102722726B (en) 2014-01-15

Family

ID=46948476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210181550.9A Active CN102722726B (en) 2012-06-05 2012-06-05 Multi-class support vector machine classification method based on dynamic binary tree

Country Status (1)

Country Link
CN (1) CN102722726B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930007B (en) * 2012-10-30 2016-01-06 广东电网公司 Method for classifying the urgency of restoring power to users in large-area power outage emergency handling
CN103136372B (en) * 2013-03-21 2016-03-02 陕西通信信息技术有限公司 Method for fast locating, classifying and filtering URLs in network trustworthiness behavior management
CN104820839A (en) * 2015-04-24 2015-08-05 深圳信息职业技术学院 Controllable confidence machine algorithm with separately set correct rates for positive and negative examples
CN105447504B (en) * 2015-11-06 2019-04-02 中国科学院计算技术研究所 Travel mode activity recognition method and corresponding recognition model construction method
CN105631474B (en) * 2015-12-26 2019-01-11 哈尔滨工业大学 Multi-class classification method for hyperspectral data based on Jeffries-Matusita distance and a class-pair decision tree
CN105930872A (en) * 2016-04-28 2016-09-07 上海应用技术学院 Bus driving state classification method based on class-similar binary tree support vector machine
CN109754395B (en) * 2017-01-10 2021-03-02 中国人民银行印制科学技术研究所 Method and device for extracting defects of value documents
CN108090503B (en) * 2017-11-28 2021-05-07 东软集团股份有限公司 Online adjustment method and device for multiple classifiers, storage medium and electronic equipment
CN108351968B (en) * 2017-12-28 2022-04-22 深圳市锐明技术股份有限公司 Alarming method, device, storage medium and server for criminal activities
CN108830302B (en) * 2018-05-28 2022-06-07 苏州大学 Image classification method, training method, classification prediction method and related device
CN109582774A (en) * 2018-11-30 2019-04-05 北京羽扇智信息科技有限公司 Natural language classification method, device, equipment and storage medium
CN109981583B (en) * 2019-02-26 2021-09-24 重庆邮电大学 Industrial control network situation assessment method
CN110930969B (en) * 2019-10-14 2024-02-13 科大讯飞股份有限公司 Background music determining method and related equipment
CN110715799B (en) * 2019-10-22 2021-05-11 中研新科智能电气有限公司 Method and device for detecting mechanical state of circuit breaker and terminal equipment
CN113360657B (en) * 2021-06-30 2023-10-24 安徽商信政通信息技术股份有限公司 Intelligent document distribution handling method and device and computer equipment
CN114268365B (en) * 2021-12-02 2023-07-11 国网甘肃省电力公司酒泉供电公司 Communication optical cable intelligent early warning method and system based on visualization technology

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101329734A (en) * 2008-07-31 2008-12-24 重庆大学 License plate character recognition method based on K-L transform and LS-SVM
CN101853400A (en) * 2010-05-20 2010-10-06 武汉大学 Multiclass image classification method based on active learning and semi-supervised learning
CN101980251A (en) * 2010-11-23 2011-02-23 中国矿业大学 Remote sensing classification method for binary tree multi-category support vector machines
CN102013946A (en) * 2010-11-01 2011-04-13 大连理工大学 Method for correcting errors of support vector machine (SVM) classification for solving multi-classification problems

Non-Patent Citations (1)

Title
Gu Shengwei, "Construction method of an SVM multi-classifier based on the Huffman tree" (基于赫夫曼树的SVM多分类器构造方法), Journal of Chuzhou University (《滁州学院学报》), Vol. 11, No. 3, pp. 41-43, 2009-06-30; relevant passage: p. 42, right column, paragraph 3 to p. 43, paragraph 1, and Fig. 1 *

Also Published As

Publication number Publication date
CN102722726A (en) 2012-10-10

Similar Documents

Publication Publication Date Title
CN102722726B (en) Multi-class support vector machine classification method based on dynamic binary tree
Shapira et al. FlowPic: A generic representation for encrypted traffic classification and applications identification
Salman et al. A review on machine learning–based approaches for Internet traffic classification
Shapira et al. Flowpic: Encrypted internet traffic classification is as easy as image recognition
WO2018054342A1 (en) Method and system for classifying network data stream
CN104052639B (en) Real-time multi-application network flow identification method based on support vector machine
CN110808971B (en) Deep embedding-based unknown malicious traffic active detection system and method
Liu et al. Mobile app traffic flow feature extraction and selection for improving classification robustness
Can et al. Detection of distributed denial of service attacks using automatic feature selection with enhancement for imbalance dataset
Zhu et al. Retracted article: traffic identification and traffic analysis based on support vector machine
CN115600128A (en) Semi-supervised encrypted traffic classification method and device and storage medium
CN111107077B (en) SVM-based attack flow classification method
Liu et al. Extending labeled mobile network traffic data by three levels traffic identification fusion
Min et al. Online Internet traffic identification algorithm based on multistage classifier
Dixit et al. Internet traffic detection using naïve bayes and K-Nearest neighbors (KNN) algorithm
Gahelot et al. Flow based botnet traffic detection using machine learning
Chung et al. An effective similarity metric for application traffic classification
Wang et al. Internet traffic classification using machine learning: a token-based approach
Jahan et al. Intrusion Detection Systems based on Artificial Intelligence.
Akbaş et al. Usage of machine learning algorithms for flow based anomaly detection system in software defined networks
Bahaa et al. nndpi: A novel deep packet inspection technique using word embedding, convolutional and recurrent neural networks
Hurley et al. Classifying network protocols: a ‘two-way’flow approach
Dong et al. Traffic classification model based on integration of multiple classifiers
Yue et al. A detection method for I-CIFA attack in NDN network
Fu et al. NSA-Net: A NetFlow sequence attention network for virtual private network traffic detection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant