CN106572486A - Handheld terminal traffic identification method and system based on machine learning - Google Patents

Handheld terminal traffic identification method and system based on machine learning

Info

Publication number
CN106572486A
Authority
CN
China
Prior art keywords
flow
handheld device
identified
decision
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610903226.1A
Other languages
Chinese (zh)
Other versions
CN106572486B (en)
Inventor
朱国胜
石志凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University
Original Assignee
Hubei University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University
Priority to CN201610903226.1A
Publication of CN106572486A
Application granted
Publication of CN106572486B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a machine-learning-based method and system for identifying handheld terminal traffic. The method comprises the following steps: 1, UA keyword matching is performed on the traffic to be identified; if a keyword matches, the traffic is directly identified as handheld device traffic or non-handheld device traffic, and if no keyword matches, step 2 is performed; 2, based on the C4.5 decision tree algorithm and the traffic attributes, the information gain ratio of each traffic attribute is calculated and a decision tree model is built, and the unmatched traffic is identified as handheld device traffic or non-handheld device traffic by the decision tree model. Because classifying the traffic that the UA method cannot identify with the C4.5 decision tree only requires comparing traffic attribute values, processing is relatively simple, processing time is noticeably shortened, and the accuracy of distinguishing handheld from non-handheld terminals is greatly improved.

Description

Handheld terminal traffic identification method and system based on machine learning
Technical field
The present invention relates to the technical field of communication networks, and in particular to a handheld terminal traffic identification method and system based on machine learning.
Background art
Mobile data traffic currently accounts for 47% of global IP traffic, and WiFi traffic accounts for more than 90% of all mobile data traffic. Identifying mobile terminal traffic in WiFi environments is therefore significant for Internet traffic management.
There are three main methods for identifying mobile terminal and handheld device traffic: IMEI (International Mobile Equipment Identity) identification, MAC identification, and UA (user agent) identification. In a mobile communication network, for terminals that go online through SIM authentication, the mobile operator can obtain the IMEI and use it to identify the device. This approach is accurate and relatively mature, but it applies only to mobile communication networks and cannot be used in WiFi networks. Identification based on the device's layer-2 MAC address achieves a certain recognition rate, but layer-2 MAC addresses propagate only within a limited scope and cannot cross layer-3 networks, so obtaining the MAC addresses of all access devices across a wide area network is difficult. Other approaches distinguish handheld from non-handheld terminal traffic by building a dedicated network environment, for example by applying some form of verification at the access stage so that handheld and non-handheld terminals connect to different switching devices. This approach is cumbersome: it requires additional verification and changes to the existing network structure, and it is impractical for real network management.
The UA identification method reads the user agent string in an HTTP request and matches it against a library of known UA strings to identify the device type and browser type. Handheld devices include mobile phones, tablets, smart watches, handheld GPS units, and so on, and their UA keywords can be obtained from published UA lists. Keywords for handheld devices include: Android, iPad, iPhone, ARCHOS, BlackBerry, CUPCAKE, FacebookTouch, iPod, Kindle, LG, Links, Linux armv6l, Linux armv7l, Maemo, Minimo, Mobile Safari, Nokia, OperaMini, OperaMobi, PalmSource, PlayStation, SAMSUNG, Symbian, SymbOS, webOS, Windows CE, WindowsMobile, Zaurus. Keywords for non-handheld devices include: Windows NT, Windows 7, Windows Vista, WindowsXP, Windows Server, Intel Mac OS X, PPC Mac OS X, MacBook, iMac, Fedora, Ubuntu, Gentoo, SUSE, Linux x86_64, Linux i686, WiiConnect. UA-based identification is easy to implement and is not limited by the network access method. However, directly reading the user agent string and comparing it against a dictionary of known UAs and their terminals has only moderate accuracy. UA identification is also strongly affected by new device models, knock-off phones, and PCs, so in real networks its accuracy is low, and a large number of UAs cannot be identified and are labeled unknown. Analysis of real data shows that in a typical campus network, unknown accounts for 35% of all connections.
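The keyword matching described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the keyword lists are an abridged subset of the ones quoted in this section, and the function name `match_user_agent`, the case-insensitive substring test, and the choice to check handheld keywords first are all assumptions.

```python
from typing import Optional

# Abridged subsets of the keyword lists quoted in the background section.
HANDHELD_KEYWORDS = ["Android", "iPad", "iPhone", "BlackBerry", "Kindle",
                     "Nokia", "Symbian", "Windows CE"]
NON_HANDHELD_KEYWORDS = ["Windows NT", "Intel Mac OS X", "Ubuntu", "Fedora",
                         "Linux i686"]


def match_user_agent(ua: str) -> Optional[str]:
    """Return 'handheld', 'non-handheld', or None when no keyword matches."""
    text = ua.lower()
    # Assumption: handheld keywords are tested first, because an Android UA
    # string usually also contains the word "Linux".
    for keyword in HANDHELD_KEYWORDS:
        if keyword.lower() in text:
            return "handheld"
    for keyword in NON_HANDHELD_KEYWORDS:
        if keyword.lower() in text:
            return "non-handheld"
    # No keyword matched: this is the "unknown" case that the UA method
    # alone cannot resolve.
    return None
```

A `None` result corresponds to the unknown connections mentioned above, which a dictionary lookup alone cannot label.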
Summary of the invention
In view of this, it is necessary to provide a machine-learning-based handheld terminal traffic identification method and system that can improve traffic identification accuracy.
A handheld terminal traffic identification method based on machine learning comprises the following steps:
Step 1: Perform UA keyword matching on the traffic to be identified. If a keyword matches, directly identify the traffic as handheld device traffic or non-handheld device traffic; if no keyword matches, go to step 2;
Step 2: Based on the C4.5 decision tree algorithm and the traffic attributes, calculate the information gain ratio of each traffic attribute and build a decision tree model; the unmatched traffic to be identified is identified as handheld device traffic or non-handheld device traffic by the decision tree model.
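The control flow of the two steps can be sketched as a simple fallback. The function `identify` and its two callable parameters are hypothetical names introduced for illustration; the sketch only shows that the decision tree is consulted when UA matching yields nothing.

```python
from typing import Callable, Optional, Sequence, Tuple


def identify(
    user_agent: str,
    features: Sequence[float],
    ua_match: Callable[[str], Optional[str]],
    tree_predict: Callable[[Sequence[float]], str],
) -> Tuple[str, str]:
    """Return (label, stage) where stage records which step decided."""
    # Step 1: UA keyword matching decides directly when a keyword matches.
    label = ua_match(user_agent)
    if label is not None:
        return label, "ua"
    # Step 2: unmatched traffic is classified from its traffic attributes.
    return tree_predict(features), "tree"
```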
And a handheld terminal traffic identification system based on machine learning, comprising:
A UA matching unit, which performs UA keyword matching on the traffic to be identified and identifies the matched traffic as handheld device traffic or non-handheld device traffic;
A training set building unit, which adds the handheld device traffic or non-handheld device traffic identified by the UA matching unit to the training set used for machine learning, where each sample in the training set is represented by an attribute vector containing several traffic attributes;
A to-be-classified set building unit, which adds the traffic left unmatched by the UA matching unit to the to-be-classified set;
A decision tree model building unit, which calculates the information gain ratio of each traffic attribute in the training set based on the C4.5 decision tree algorithm and builds the decision tree model;
A traffic identification unit, which passes the to-be-classified set through the decision tree model to identify handheld device traffic and non-handheld device traffic.
With the machine-learning-based handheld terminal traffic identification method and system of the present invention, classifying the unmatched traffic with the C4.5 decision tree algorithm only requires comparing traffic attribute values, so processing is relatively simple and processing time is noticeably shortened. At the same time, devices that the UA method cannot identify are identified effectively, so the overall accuracy of distinguishing handheld from non-handheld terminals is greatly improved.
Description of the drawings
Fig. 1 is a flow chart of the machine-learning-based handheld terminal traffic identification method of the present invention;
Fig. 2 is a block diagram of the machine-learning-based handheld terminal traffic identification system of the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The flow of the machine-learning-based handheld terminal traffic identification method provided by the present invention is shown in Fig. 1. The specific process is as follows:
Step 1: Perform UA keyword matching on the traffic to be identified. If a keyword matches, directly identify the traffic as handheld device traffic or non-handheld device traffic; if no keyword matches, go to step 2.
The handheld device traffic or non-handheld device traffic identified in step 1 is added to the training set S used for machine learning, and the unmatched traffic to be identified is added to the to-be-classified set T.
Step 2: Based on the C4.5 decision tree algorithm and the traffic attributes, calculate the information gain ratio of each traffic attribute and build a decision tree model; the unmatched traffic to be identified is identified as handheld device traffic or non-handheld device traffic by the decision tree model.
The specific process is as follows:
Step 2.1: Each sample in the training set and in the to-be-classified set is represented by an attribute vector containing several traffic attributes. Specifically, the training set is S = {D1, D2, ..., Dn} and the to-be-classified set is T = {D1, D2, ..., Dn}. Each sample in either set can be represented by an attribute vector containing several traffic attributes. For example, each sample of the training set is represented by five traffic attributes {A1, A2, A3, A4, A5}, which are connection duration, source payload, source packet size, destination payload, and destination packet size.
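One way to picture the attribute vectors and the two sets S and T is the sketch below. The class `FlowSample`, its field names, the example values, and the helper `toy_matcher` are hypothetical; the source names the five attributes but does not specify units or a data layout.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class FlowSample:
    """One connection described by the five traffic attributes A1..A5."""
    duration: float          # A1: connection duration
    src_payload: float       # A2: source payload
    src_packet_size: float   # A3: source packet size
    dst_payload: float       # A4: destination payload
    dst_packet_size: float   # A5: destination packet size
    user_agent: str = ""     # raw UA string, used only by step 1
    label: Optional[str] = None  # filled in when UA matching succeeds

    def vector(self) -> List[float]:
        return [self.duration, self.src_payload, self.src_packet_size,
                self.dst_payload, self.dst_packet_size]


def toy_matcher(ua: str) -> Optional[str]:
    # Hypothetical stand-in for the UA keyword matching of step 1.
    if "Android" in ua:
        return "handheld"
    if "Windows NT" in ua:
        return "non-handheld"
    return None


def split_flows(
    flows: List[FlowSample],
    ua_matcher: Callable[[str], Optional[str]],
) -> Tuple[List[FlowSample], List[FlowSample]]:
    """Return (S, T): UA-labelled training samples and unmatched samples."""
    training_set, to_classify = [], []
    for flow in flows:
        flow.label = ua_matcher(flow.user_agent)
        if flow.label is not None:
            training_set.append(flow)   # goes to training set S
        else:
            to_classify.append(flow)    # goes to to-be-classified set T
    return training_set, to_classify
```

Note that in this scheme the training labels come from the UA matcher itself, so no manual labelling is needed before the tree is trained.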
Step 2.2: Based on the C4.5 decision tree algorithm, calculate the information gain ratio of each traffic attribute in the training set and build the decision tree model.
Specifically, because the traffic attributes of the training set S are all continuous-valued, each attribute value must first be discretized. After discretization, suppose attribute $A_m$ has $k$ intervals; according to the intervals of $A_m$, S can be divided into $k$ subsets (classes) $C_1, C_2, \ldots, C_k$. The average information needed to classify the sample set is therefore:
$$H(S) = -\sum_{p=1}^{k} P(C_p)\,\log_2 P(C_p) \tag{1}$$
where $P(C_p) = |C_p|/|S|$, $1 \le p \le k$. For any attribute $A_i$, suppose it takes $t$ distinct values $a_q$ ($1 \le q \le t$). According to the values of $A_i$, S can be divided into $t$ subsets $S_1, S_2, \ldots, S_t$, and $C_1, C_2, \ldots, C_k$ can likewise be divided into $k \times t$ subsets, where each subset $C_{pq}$ is the set of samples that belong to class $p$ under the condition $A_i = a_q$, with $1 \le p \le k$ and $1 \le q \le t$. After dividing by $A_i$, the average information needed to classify the sample set is:
$$H(S/A_i) = -\sum_{q=1}^{t}\sum_{p=1}^{k} P(C_{pq})\,\log_2 \frac{|C_{pq}|}{|S_q|} \tag{2}$$
where $P(C_{pq}) = |C_{pq}|/|S|$. The information gain $f_G(S, A_i)$ obtained by dividing S with $A_i$ is:
$$f_G(S, A_i) = H(S) - H(S/A_i) \tag{3}$$
By formula (3), the information gain $f_G(S, A_i)$ represents the reduction in uncertainty brought about by the division. The information gain ratio $f_{GR}(S, A_i)$ of dividing S with attribute $A_i$ is the ratio of the information gain to the split information, i.e.:
$$f_{GR}(S, A_i) = \frac{f_G(S, A_i)}{\mathrm{SplitInfo}(S, A_i)} \tag{4}$$
where the split information is $\mathrm{SplitInfo}(S, A_i) = -\sum_{q=1}^{t} \frac{|S_q|}{|S|}\,\log_2 \frac{|S_q|}{|S|}$. The C4.5 decision tree algorithm selects the attribute with the largest information gain ratio to build the decision tree from the top down, and then uses the remaining samples in the training set to prune the initial decision tree, removing superfluous branches and yielding the final decision tree model M.
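The entropy, information gain, and information gain ratio described above can be computed for one already-discretized attribute as in the sketch below. The function names are illustrative, and the convention of returning a ratio of 0 when the split information is zero (an attribute with a single value) is an assumption, not something the source states.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(items: Sequence[Hashable]) -> float:
    """Average information of a collection, in bits."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def conditional_entropy(values: Sequence[Hashable],
                        labels: Sequence[Hashable]) -> float:
    """Average information left after dividing the samples by attribute value."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted sum of the entropy inside each subset S_q.
    return sum(len(g) / n * entropy(g) for g in groups.values())


def gain_ratio(values: Sequence[Hashable],
               labels: Sequence[Hashable]) -> float:
    """Information gain divided by the split information of the attribute."""
    gain = entropy(labels) - conditional_entropy(values, labels)
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    # Assumed convention: a single-valued attribute carries no usable split.
    return gain / split_info if split_info > 0 else 0.0
```

For instance, an attribute whose two values separate the two classes perfectly on a balanced sample has gain 1 bit and split information 1 bit, hence a gain ratio of 1, while an attribute whose values are independent of the class has a gain ratio of 0.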
Step 2.3: Classify the to-be-classified set with the decision tree model to identify handheld device traffic and non-handheld device traffic.
Specifically, the to-be-classified set T is passed through the decision tree model M, which identifies the handheld device traffic and the non-handheld device traffic.
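Classifying a sample with a trained tree amounts to a short chain of attribute comparisons, as the sketch below shows. The `Node` layout, the binary threshold tests, and the hand-written example tree with its thresholds are all hypothetical; a real model M would be learned from the training set S.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    label: Optional[str] = None     # set on leaves only
    attr: int = 0                   # index of the attribute tested here
    threshold: float = 0.0          # go left when value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def classify(node: Node, x: Sequence[float]) -> str:
    """Walk from the root to a leaf, comparing one attribute per level."""
    while node.label is None:
        node = node.left if x[node.attr] <= node.threshold else node.right
    return node.label


# Hand-written illustrative tree over (A1..A5); thresholds are made up.
EXAMPLE_TREE = Node(
    attr=0, threshold=10.0,
    left=Node(label="handheld"),
    right=Node(
        attr=2, threshold=500.0,
        left=Node(label="handheld"),
        right=Node(label="non-handheld"),
    ),
)
```

The cost per sample is bounded by the depth of the tree, which is why only attribute value comparisons are needed at classification time.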
The method of the present invention uses the C4.5 decision tree algorithm for traffic classification. It does not depend on the prior probability of network flow samples, so it effectively avoids the negative effects of changes in the distribution of network flow samples. In addition, when the C4.5 decision tree model processes the network flow samples under test, it only needs to compare attribute values, so processing is relatively simple and processing time is noticeably shortened, which gives a clear performance advantage on large-scale traffic classification problems. Moreover, a concrete example shows that, compared with the 65% accuracy of the UA method, the accuracy of this method reaches 95%. The method can effectively identify devices that the UA method cannot, greatly improving the overall accuracy of distinguishing handheld from non-handheld terminals.
The present invention also provides a handheld terminal traffic identification system based on machine learning. The system block diagram is shown in Fig. 2, and the system includes:
A UA matching unit, which performs UA keyword matching on the traffic to be identified and identifies the matched traffic as handheld device traffic or non-handheld device traffic.
A training set building unit, which adds the handheld device traffic or non-handheld device traffic identified by the UA matching unit to the training set used for machine learning. Specifically, the training set is S = {D1, D2, ..., Dn}, and each sample in the training set can be represented by an attribute vector containing several traffic attributes. For example, each sample of the training set is represented by five traffic attributes {A1, A2, A3, A4, A5}: connection duration, source payload, source packet size, destination payload, and destination packet size.
A to-be-classified set building unit, which adds the traffic left unmatched by the UA matching unit to the to-be-classified set. Specifically, the to-be-classified set is T = {D1, D2, ..., Dn}, and each sample in the to-be-classified set can be represented by an attribute vector containing several traffic attributes. For example, each sample is represented by five traffic attributes {A1, A2, A3, A4, A5}: connection duration, source payload, source packet size, destination payload, and destination packet size.
A decision tree model building unit, which calculates the information gain ratio of each traffic attribute in the training set based on the C4.5 decision tree algorithm and builds the decision tree model.
Specifically, the decision tree model building unit calculates the information gain ratio of each traffic attribute in the training set by formulas (1) to (4), then uses the C4.5 decision tree algorithm to select the attribute with the largest information gain ratio and build the decision tree from the top down, and then uses the remaining samples in the training set to prune the initial decision tree, removing superfluous branches and yielding the final decision tree model M.
A traffic identification unit, which passes the to-be-classified set through the decision tree model to identify handheld device traffic and non-handheld device traffic. Specifically, the traffic identification unit passes the to-be-classified set T through the decision tree model M and identifies the handheld device traffic and the non-handheld device traffic.
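How the decision tree model building unit and the traffic identification unit fit together can be sketched with a small top-down builder that picks, at each node, the split with the largest gain ratio. This is a simplified illustration under stated assumptions: continuous attributes are handled with binary threshold splits at midpoints rather than the interval discretization described above, the pruning step is omitted, and all names (`best_split`, `build_tree`, `predict`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_split(X, y):
    """Return (gain_ratio, attr_index, threshold) of the best split, or None."""
    base, n, best = entropy(y), len(y), None
    for a in range(len(X[0])):
        values = sorted(set(row[a] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [lab for row, lab in zip(X, y) if row[a] <= thr]
            right = [lab for row, lab in zip(X, y) if row[a] > thr]
            split_info = entropy(["L"] * len(left) + ["R"] * len(right))
            if split_info <= 0:
                continue  # degenerate split, nothing separated
            cond = len(left) / n * entropy(left) + len(right) / n * entropy(right)
            ratio = (base - cond) / split_info
            if best is None or ratio > best[0]:
                best = (ratio, a, thr)
    return best


def build_tree(X, y):
    """Leaves are label strings; inner nodes are (attr, thr, left, right)."""
    if len(set(y)) == 1:
        return y[0]  # pure leaf
    split = best_split(X, y)
    if split is None or split[0] <= 0:
        return Counter(y).most_common(1)[0][0]  # majority leaf
    _, a, thr = split
    left = [(r, l) for r, l in zip(X, y) if r[a] <= thr]
    right = [(r, l) for r, l in zip(X, y) if r[a] > thr]
    return (a, thr,
            build_tree([r for r, _ in left], [l for _, l in left]),
            build_tree([r for r, _ in right], [l for _, l in right]))


def predict(tree, x):
    while isinstance(tree, tuple):
        a, thr, left, right = tree
        tree = left if x[a] <= thr else right
    return tree
```

In this arrangement `build_tree` would be fed the UA-labelled training set S, and `predict` would then be applied to each attribute vector in the to-be-classified set T.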
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (5)

1. A handheld terminal traffic identification method based on machine learning, characterized by comprising the following steps:
Step 1: Perform UA keyword matching on the traffic to be identified; if a keyword matches, directly identify the traffic as handheld device traffic or non-handheld device traffic; if no keyword matches, go to step 2;
Step 2: Based on the C4.5 decision tree algorithm and the traffic attributes, calculate the information gain ratio of each traffic attribute and build a decision tree model; the unmatched traffic to be identified is identified as handheld device traffic or non-handheld device traffic by the decision tree model.
2. The handheld terminal traffic identification method based on machine learning according to claim 1, characterized in that the handheld device traffic or non-handheld device traffic identified in step 1 is added to a training set for machine learning, and the unmatched traffic to be identified is added to a to-be-classified set.
3. The handheld terminal traffic identification method based on machine learning according to claim 2, characterized in that the specific process of step 2 is:
Step 2.1: Each sample in the training set is represented by an attribute vector containing several traffic attributes;
Step 2.2: Based on the C4.5 decision tree algorithm, calculate the information gain ratio of each traffic attribute in the training set and build the decision tree model;
Step 2.3: Classify the to-be-classified set with the decision tree model to identify handheld device traffic and non-handheld device traffic.
4. The handheld terminal traffic identification method based on machine learning according to claim 3, characterized in that the traffic attributes include connection duration, source payload, source packet size, destination payload, and destination packet size.
5. A handheld terminal traffic identification system based on machine learning, characterized by comprising:
A UA matching unit, which performs UA keyword matching on the traffic to be identified and identifies the matched traffic as handheld device traffic or non-handheld device traffic;
A training set building unit, which adds the handheld device traffic or non-handheld device traffic identified by the UA matching unit to the training set used for machine learning, where each sample in the training set is represented by an attribute vector containing several traffic attributes;
A to-be-classified set building unit, which adds the traffic left unmatched by the UA matching unit to the to-be-classified set;
A decision tree model building unit, which calculates the information gain ratio of each traffic attribute in the training set based on the C4.5 decision tree algorithm and builds the decision tree model;
A traffic identification unit, which passes the to-be-classified set through the decision tree model to identify handheld device traffic and non-handheld device traffic.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610903226.1A CN106572486B (en) 2016-10-17 2016-10-17 Handheld terminal flow identification method and system based on machine learning

Publications (2)

Publication Number Publication Date
CN106572486A (en) 2017-04-19
CN106572486B CN106572486B (en) 2020-11-27

Family

ID=58532047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610903226.1A Active CN106572486B (en) 2016-10-17 2016-10-17 Handheld terminal flow identification method and system based on machine learning

Country Status (1)

Country Link
CN (1) CN106572486B (en)

Also Published As

Publication number Publication date
CN106572486B (en) 2020-11-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant