CN106572486A - Handheld terminal traffic identification method and system based on machine learning - Google Patents
- Publication number
- CN106572486A (application CN201610903226.1A)
- Authority
- CN
- China
- Prior art keywords
- flow
- handheld device
- identified
- decision
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/02—Arrangements for optimising operational condition
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a machine-learning-based method and system for identifying handheld terminal traffic. The method comprises two steps. Step 1: UA (User-Agent) keyword matching is performed on the traffic to be identified; traffic that matches is directly identified as handheld device traffic or non-handheld device traffic, and traffic that does not match proceeds to step 2. Step 2: based on the C4.5 decision tree algorithm and a set of traffic attributes, the information gain ratio of each traffic attribute is calculated and a decision tree model is built; the unmatched traffic is then identified as handheld or non-handheld device traffic by the decision tree model. Because classifying the traffic that UA matching cannot identify only requires comparing traffic attribute values, processing is relatively simple and processing time is markedly shorter, while the accuracy of distinguishing handheld from non-handheld terminals is greatly improved.
Description
Technical field
The present invention relates to the technical field of communication networks, and in particular to a machine-learning-based handheld terminal traffic identification method and system.
Background
Mobile data traffic currently accounts for 47% of global IP traffic, and WiFi traffic accounts for more than 90% of all mobile data traffic. Identifying mobile terminal traffic in WiFi environments is therefore important for Internet traffic management.
There are three main methods for identifying mobile terminal and handheld device traffic: IMEI (International Mobile Equipment Identity) identification, MAC identification, and UA (user agent) identification. In a mobile communication network, terminals access the network through SIM authentication, so the mobile operator can obtain the IMEI and use it to identify the device. This approach is accurate and mature, but it applies only to mobile communication networks and cannot be used in WiFi networks. Identification based on a device's layer-2 MAC address achieves a certain recognition rate, but layer-2 MAC addresses propagate only within a limited scope and cannot cross layer-3 networks, so obtaining the MAC addresses of all access devices across a wide area network is impractical. Other approaches build a dedicated network environment to separate handheld from non-handheld terminal traffic, for example by applying some form of verification at the device access stage so that handheld and non-handheld terminals attach to different switching devices. Such approaches are relatively complex: they require an additional verification mechanism and changes to the existing network structure, which makes them unsuitable for real-world network management.
The UA identification method reads the User-Agent string in an HTTP request and matches it against a library of known UA strings to identify the device type and browser type. For handheld devices, including mobile phones, tablets, smart watches, and handheld GPS units, UA keywords can be obtained from published UA lists. Handheld device keywords include: Android, iPad, iPhone, ARCHOS, BlackBerry, CUPCAKE, FacebookTouch, iPod, Kindle, LG, Links, Linux armv6l, Linux armv7l, Maemo, Minimo, Mobile Safari, Nokia, OperaMini, OperaMobi, PalmSource, PlayStation, SAMSUNG, Symbian, SymbOS, webOS, Windows CE, WindowsMobile, Zaurus. Non-handheld device keywords include: Windows NT, Windows 7, Windows Vista, WindowsXP, Windows Server, Intel Mac OS X, PPC Mac OS X, MacBook, iMac, Fedora, Ubuntu, Gentoo, SUSE, Linux x86_64, Linux i686, WiiConnect.
UA-based identification is easy to implement and is not constrained by the network access method. However, directly reading the User-Agent string and comparing it against a dictionary of known UA-to-terminal mappings yields only mediocre accuracy. UA identification is also strongly affected by new device models, copycat (shanzhai) phones, and PCs, so in real networks its accuracy is relatively low, and a large number of UA strings cannot be identified and are labeled unknown. Analysis of real data shows that in a typical campus network, unknown accounts for 35% of all connections.
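The keyword matching described above can be sketched as a plain substring search. This is a minimal illustration, not the patent's implementation: the function name `match_user_agent`, the abridged keyword tuples, and the choice to check handheld keywords first are all assumptions made for the example.

```python
from typing import Optional

# Keyword lists abridged from the published UA lists quoted in the text.
HANDHELD_KEYWORDS = (
    "Android", "iPad", "iPhone", "iPod", "BlackBerry",
    "Kindle", "Symbian", "Windows CE", "Mobile Safari",
)
NON_HANDHELD_KEYWORDS = (
    "Windows NT", "Intel Mac OS X", "PPC Mac OS X",
    "Ubuntu", "Fedora", "Linux x86_64", "Linux i686",
)


def match_user_agent(ua: str) -> Optional[str]:
    """Return 'handheld', 'non-handheld', or None when no keyword matches.

    Hypothetical helper: case-sensitive substring matching, with handheld
    keywords checked first (an assumption, not stated in the source).
    """
    for keyword in HANDHELD_KEYWORDS:
        if keyword in ua:
            return "handheld"
    for keyword in NON_HANDHELD_KEYWORDS:
        if keyword in ua:
            return "non-handheld"
    return None  # corresponds to the "unknown" label in the text
```

A `None` result corresponds to the unknown connections that the background section reports at 35% in a campus network.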
Summary of the invention
In view of this, it is necessary to provide a machine-learning-based handheld terminal traffic identification method and system that improve traffic identification accuracy.
A machine-learning-based handheld terminal traffic identification method comprises the following steps:
Step 1: Perform UA keyword matching on the traffic to be identified. If it matches, identify it directly as handheld device traffic or non-handheld device traffic; if it does not match, go to step 2.
Step 2: Based on the C4.5 decision tree algorithm and the traffic attributes, calculate the information gain ratio of each traffic attribute and build a decision tree model; the unmatched traffic is identified as handheld device traffic or non-handheld device traffic by the decision tree model.
A machine-learning-based handheld terminal traffic identification system includes:
A UA matching unit, which performs UA keyword matching on the traffic to be identified and identifies matching traffic as handheld device traffic or non-handheld device traffic;
A training set construction unit, which adds the handheld and non-handheld device traffic identified by the UA matching unit to the training set used for machine learning, where each sample in the training set is represented by an attribute vector containing several traffic attributes;
A to-be-classified set construction unit, which adds the traffic left unmatched by the UA matching unit to the to-be-classified set;
A decision tree model construction unit, which, based on the C4.5 decision tree algorithm, calculates the information gain ratio of each traffic attribute in the training set and builds the decision tree model;
A traffic identification unit, which passes the to-be-classified set through the decision tree model to identify handheld device traffic and non-handheld device traffic.
When the method and system of the invention use the C4.5 decision tree algorithm to classify unmatched traffic, only traffic attribute values need to be compared, so processing is relatively simple and processing time is markedly shorter. At the same time, devices that the UA method cannot identify are recognized effectively, which greatly improves the overall accuracy of distinguishing handheld from non-handheld terminals.
Description of the drawings
Fig. 1 is a flow chart of the machine-learning-based handheld terminal traffic identification method of the present invention;
Fig. 2 is a block diagram of the machine-learning-based handheld terminal traffic identification system of the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.
The flow of the machine-learning-based handheld terminal traffic identification method provided by the invention is shown in Fig. 1. The process is as follows:
Step 1: Perform UA keyword matching on the traffic to be identified. If it matches, identify it directly as handheld device traffic or non-handheld device traffic; if it does not match, go to step 2.
The handheld and non-handheld device traffic identified in step 1 is added to the training set S used for machine learning, and the unmatched traffic is added to the to-be-classified set T.
Step 2: Based on the C4.5 decision tree algorithm and the traffic attributes, calculate the information gain ratio of each traffic attribute and build a decision tree model; the unmatched traffic is identified as handheld device traffic or non-handheld device traffic by the decision tree model.
The specific process is as follows:
Step 2.1: Each sample in the training set and in the to-be-classified set is represented by an attribute vector containing several traffic attributes. Specifically, the training set is S = {D1, D2, ..., Dn} and the to-be-classified set is T = {D1, D2, ..., Dn}. For example, each sample of the training set is represented by five traffic attributes {A1, A2, A3, A4, A5}: connection duration, source payload, source packet size, destination payload, and destination packet size.
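One way to hold the five-attribute vector is a small immutable record. The sketch below is illustrative only: the class name `FlowAttributes`, the field names, and the units (seconds and bytes) are assumptions, since the source names the attributes but not their units or encoding.

```python
from dataclasses import astuple, dataclass
from typing import Tuple


@dataclass(frozen=True)
class FlowAttributes:
    """Hypothetical container for the attribute vector {A1, ..., A5}."""

    duration: float        # A1: connection duration (seconds assumed)
    src_payload: int       # A2: source payload (bytes assumed)
    src_packet_size: int   # A3: source packet size (bytes assumed)
    dst_payload: int       # A4: destination payload (bytes assumed)
    dst_packet_size: int   # A5: destination packet size (bytes assumed)

    def as_vector(self) -> Tuple[float, int, int, int, int]:
        # Field order follows A1..A5 as listed in the text.
        return astuple(self)
```

Samples in both S and T would share this layout, so a model trained on S can be applied to T without any remapping.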
Step 2.2: Based on the C4.5 decision tree algorithm, calculate the information gain ratio of each traffic attribute in the training set and build the decision tree model.
Specifically, because the traffic attributes of training set S are all continuous-valued, each attribute value must first be discretized. After discretization, suppose attribute $A_m$ has $k$ discretized intervals. According to the intervals of $A_m$, S can be divided into $k$ subsets $C_1, C_2, \ldots, C_k$, so the average information needed to classify the sample set is:

$$H(S) = -\sum_{p=1}^{k} P(C_p)\,\log_2 P(C_p) \tag{1}$$

where $P(C_p) = |C_p|/|S|$, $1 \le p \le k$. For any attribute $A_i$, suppose it takes $t$ distinct values $a_q$ ($1 \le q \le t$). According to the values of $A_i$, S can be divided into $t$ subsets $S_1, S_2, \ldots, S_t$, and $C_1, C_2, \ldots, C_k$ can be divided into $k \times t$ subsets, where each subset $C_{pq}$ is the set of samples that belong to class $p$ under the condition $A_i = a_q$, with $1 \le p \le k$ and $1 \le q \le t$. After dividing by $A_i$, the average information needed to classify the sample set is:

$$H(S/A_i) = -\sum_{q=1}^{t}\sum_{p=1}^{k} P(C_{pq})\,\log_2 \frac{P(C_{pq})}{P(S_q)} \tag{2}$$

where $P(S_q) = |S_q|/|S|$ and $P(C_{pq}) = |C_{pq}|/|S|$. The information gain $f_G(S, A_i)$ obtained by dividing S with $A_i$ is:

$$f_G(S, A_i) = H(S) - H(S/A_i) \tag{3}$$

By formula (3), the information gain $f_G(S, A_i)$ represents the reduction in uncertainty after the division. The information gain ratio $f_{GR}(S, A_i)$ of dividing S with attribute $A_i$ is the ratio of the information gain to the split information, i.e.:

$$f_{GR}(S, A_i) = \frac{f_G(S, A_i)}{\mathrm{SplitInfo}(S, A_i)} \tag{4}$$

where the split information is $\mathrm{SplitInfo}(S, A_i) = -\sum_{q=1}^{t} \frac{|S_q|}{|S|}\,\log_2 \frac{|S_q|}{|S|}$. The C4.5 decision tree algorithm selects the attribute with the largest information gain ratio to build the decision tree from the top down, then uses the remaining samples in the training set to prune the initial tree, removing spurious branches and yielding the final decision tree model M.
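The gain ratio computation can be checked numerically with a few lines of standard-library Python. This is a sketch under stated assumptions: the helper names (`discretize`, `entropy`, `gain_ratio`) are hypothetical, and equal-width binning is used only as a stand-in because the source does not say how the attributes are discretized.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def discretize(xs: Sequence[float], k: int) -> List[int]:
    """Map continuous values to k equal-width interval indices (assumed scheme)."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / k or 1.0  # avoid division by zero when all values are equal
    return [min(int((x - lo) / width), k - 1) for x in xs]


def entropy(items: Sequence[Hashable]) -> float:
    """Average information of a label sequence, in bits."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting on `values`, divided by the split information."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Conditional entropy: entropy of each subset weighted by its share of S.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)  # entropy of the attribute-value distribution
    return gain / split_info if split_info > 0 else 0.0
```

For instance, durations `[1.0, 2.0, 9.0, 10.0]` discretized into two intervals separate the labels `["handheld", "handheld", "non-handheld", "non-handheld"]` perfectly, so both the gain and the split information equal 1 bit and the ratio is 1. An attribute whose intervals each contain both classes in equal proportion has zero gain.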
Step 2.3: Classify the to-be-classified set with the decision tree model to identify handheld device traffic and non-handheld device traffic.
Specifically, the to-be-classified set T is passed through decision tree model M, which labels each sample as handheld device traffic or non-handheld device traffic.
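Classification with a built tree amounts to walking from the root to a leaf, comparing one attribute value at each node. The sketch below assumes binary threshold splits, which is how C4.5 is commonly applied to continuous attributes; the `Leaf`, `Node`, and `classify` names are hypothetical, and the thresholds in `EXAMPLE_TREE` are arbitrary illustrative values, not learned from any data in the source.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # "handheld" or "non-handheld"


@dataclass
class Node:
    attribute: str   # name of the traffic attribute tested at this node
    threshold: float
    below: "Tree"    # subtree taken when value <= threshold
    above: "Tree"    # subtree taken when value > threshold


Tree = Union[Leaf, Node]


def classify(tree: Tree, sample: Dict[str, float]) -> str:
    """Walk from the root to a leaf using one comparison per level."""
    while isinstance(tree, Node):
        tree = tree.below if sample[tree.attribute] <= tree.threshold else tree.above
    return tree.label


# Hand-built tree with arbitrary thresholds, for illustration only.
EXAMPLE_TREE: Tree = Node(
    "dst_packet_size", 800.0,
    Leaf("handheld"),
    Node("duration", 60.0, Leaf("non-handheld"), Leaf("handheld")),
)
```

The cost per sample is bounded by the depth of the tree, which is why the description emphasizes that only attribute value comparisons are needed at classification time.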
The method of the invention uses the C4.5 decision tree algorithm for traffic classification. It does not depend on the prior probabilities of network flow samples and therefore avoids the negative impact of changes in the flow sample distribution. In addition, when processing the network flow samples under test, the C4.5 decision tree model only needs to compare attribute values, so processing is relatively simple and processing time is markedly shorter, which is a clear performance advantage for large-scale traffic classification. A concrete example shows that, compared with the 65% accuracy of the UA method, this method reaches 95% accuracy; it effectively identifies devices that the UA method cannot, greatly improving the overall accuracy of distinguishing handheld from non-handheld terminals.
The invention also provides a machine-learning-based handheld terminal traffic identification system. As shown in the block diagram of Fig. 2, the system includes:
A UA matching unit, which performs UA keyword matching on the traffic to be identified and identifies matching traffic as handheld device traffic or non-handheld device traffic.
A training set construction unit, which adds the handheld and non-handheld device traffic identified by the UA matching unit to the training set used for machine learning. Specifically, the training set is S = {D1, D2, ..., Dn}, and each sample in it can be represented by an attribute vector containing several traffic attributes. For example, each sample of the training set is represented by five traffic attributes {A1, A2, A3, A4, A5}: connection duration, source payload, source packet size, destination payload, and destination packet size.
A to-be-classified set construction unit, which adds the traffic left unmatched by the UA matching unit to the to-be-classified set. Specifically, the to-be-classified set is T = {D1, D2, ..., Dn}, and each sample in it can be represented by an attribute vector containing several traffic attributes. For example, each sample is represented by five traffic attributes {A1, A2, A3, A4, A5}: connection duration, source payload, source packet size, destination payload, and destination packet size.
A decision tree model construction unit, which, based on the C4.5 decision tree algorithm, calculates the information gain ratio of each traffic attribute in the training set and builds the decision tree model.
Specifically, the decision tree model construction unit calculates the information gain ratio of each traffic attribute in the training set by formulas (1) to (4), then uses the C4.5 decision tree algorithm to select the attribute with the largest information gain ratio and build the decision tree from the top down. It then prunes the initial tree using the remaining samples in the training set, removing spurious branches and yielding the final decision tree model M.
A traffic identification unit, which passes the to-be-classified set through the decision tree model to identify handheld device traffic and non-handheld device traffic. Specifically, the traffic identification unit passes the to-be-classified set T through decision tree model M to identify handheld device traffic and non-handheld device traffic.
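How the five units fit together can be sketched as a single function with the matcher and the model builder passed in as parameters. Everything named here is hypothetical: `identify`, the `Flow` dictionary with a `"ua"` key, and `majority_trainer`, which is a trivial stand-in used only so the sketch runs and is not C4.5.

```python
from typing import Callable, Dict, List, Optional, Tuple

Flow = Dict[str, object]                      # hypothetical record; "ua" holds the User-Agent
Matcher = Callable[[str], Optional[str]]      # UA matching unit
Model = Callable[[Flow], str]                 # traffic identification unit
Trainer = Callable[[List[Tuple[Flow, str]]], Model]  # decision tree model construction unit


def identify(flows: List[Flow], matcher: Matcher, trainer: Trainer) -> List[str]:
    """Two-stage identification: UA matching first, learned model for the rest."""
    labels: List[Optional[str]] = [matcher(str(f.get("ua", ""))) for f in flows]
    # Training set S: flows the UA matching unit could label.
    training = [(f, lab) for f, lab in zip(flows, labels) if lab is not None]
    # To-be-classified set T: every flow whose label is still None.
    if training and any(lab is None for lab in labels):
        model = trainer(training)
        labels = [lab if lab is not None else model(f) for f, lab in zip(flows, labels)]
    return [lab if lab is not None else "unknown" for lab in labels]


def majority_trainer(training: List[Tuple[Flow, str]]) -> Model:
    """Stand-in learner (NOT C4.5): always predicts the most common training label."""
    seen = [lab for _, lab in training]
    top = max(set(seen), key=seen.count)
    return lambda flow: top
```

Keeping the trainer pluggable means a real C4.5 implementation can replace the stand-in without touching the matching stage.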
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (5)
1. A machine-learning-based handheld terminal traffic identification method, characterized by comprising the following steps:
Step 1: Perform UA keyword matching on the traffic to be identified. If it matches, identify it directly as handheld device traffic or non-handheld device traffic; if it does not match, go to step 2;
Step 2: Based on the C4.5 decision tree algorithm and the traffic attributes, calculate the information gain ratio of each traffic attribute and build a decision tree model; the unmatched traffic is identified as handheld device traffic or non-handheld device traffic by the decision tree model.
2. The machine-learning-based handheld terminal traffic identification method according to claim 1, characterized in that the handheld device traffic or non-handheld device traffic identified in step 1 is added to the training set used for machine learning, and the unmatched traffic is added to the to-be-classified set.
3. The machine-learning-based handheld terminal traffic identification method according to claim 2, characterized in that the detailed process of step 2 is:
Step 2.1: Each sample in the training set is represented by an attribute vector containing several traffic attributes;
Step 2.2: Based on the C4.5 decision tree algorithm, calculate the information gain ratio of each traffic attribute in the training set and build the decision tree model;
Step 2.3: Classify the to-be-classified set with the decision tree model to identify handheld device traffic and non-handheld device traffic.
4. The machine-learning-based handheld terminal traffic identification method according to claim 3, characterized in that the traffic attributes include connection duration, source payload, source packet size, destination payload, and destination packet size.
5. A machine-learning-based handheld terminal traffic identification system, characterized by including:
A UA matching unit, which performs UA keyword matching on the traffic to be identified and identifies matching traffic as handheld device traffic or non-handheld device traffic;
A training set construction unit, which adds the handheld device traffic or non-handheld device traffic identified by the UA matching unit to the training set used for machine learning, where each sample in the training set is represented by an attribute vector containing several traffic attributes;
A to-be-classified set construction unit, which adds the traffic left unmatched by the UA matching unit to the to-be-classified set;
A decision tree model construction unit, which, based on the C4.5 decision tree algorithm, calculates the information gain ratio of each traffic attribute in the training set and builds the decision tree model;
A traffic identification unit, which passes the to-be-classified set through the decision tree model to identify handheld device traffic and non-handheld device traffic.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610903226.1A CN106572486B (en) | 2016-10-17 | 2016-10-17 | Handheld terminal flow identification method and system based on machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106572486A | 2017-04-19 |
CN106572486B | 2020-11-27 |
Family
ID=58532047
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201610903226.1A | Active | CN106572486B (en) | 2016-10-17 | 2016-10-17 | Handheld terminal flow identification method and system based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106572486B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102523241A (en) * | 2012-01-09 | 2012-06-27 | 北京邮电大学 | Method and device for classifying network traffic on line based on decision tree high-speed parallel processing |
CN105119735A (en) * | 2015-07-15 | 2015-12-02 | 百度在线网络技术(北京)有限公司 | Method and device for determining flow types |
US20160092427A1 (en) * | 2014-09-30 | 2016-03-31 | Accenture Global Services Limited | Language Identification |
Non-Patent Citations (2)
Title |
---|
李银周: "移动互联网中手机终端与流量特征分析", 《中国优秀硕士学位论文全文数据库信息科学技辑》 * |
穆筝: "高速网络下 P2P 流量识别研究", 《信息网络安全》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108259637A (en) * | 2017-11-30 | 2018-07-06 | 湖北大学 | A kind of NAT device recognition methods and device based on decision tree |
CN109063745A (en) * | 2018-07-11 | 2018-12-21 | 南京邮电大学 | A kind of types of network equipment recognition methods and system based on decision tree |
CN109063745B (en) * | 2018-07-11 | 2023-06-09 | 南京邮电大学 | Network equipment type identification method and system based on decision tree |
CN109450733A (en) * | 2018-11-26 | 2019-03-08 | 武汉烽火信息集成技术有限公司 | A kind of network-termination device recognition methods and system based on machine learning |
CN109450733B (en) * | 2018-11-26 | 2020-10-23 | 武汉烽火信息集成技术有限公司 | Network terminal equipment identification method and system based on machine learning |
CN111711946A (en) * | 2020-06-28 | 2020-09-25 | 北京司马科技有限公司 | IoT (Internet of things) equipment identification method and identification system under encrypted wireless network |
Also Published As
Publication number | Publication date |
---|---|
CN106572486B (en) | 2020-11-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||