CN112949768A - Traffic classification method based on LSTM - Google Patents

Traffic classification method based on LSTM

Info

Publication number
CN112949768A
Authority
CN
China
Prior art keywords
domain name
lstm
classifier
connection
data packet
Legal status
Pending
Application number
CN202110371746.3A
Other languages
Chinese (zh)
Inventor
冯杰 (Feng Jie)
李嘉伟 (Li Jiawei)
周谊成 (Zhou Yicheng)
Current Assignee
Suzhou Ruilisi Technology Co., Ltd.
Original Assignee
Suzhou Ruilisi Technology Co., Ltd.
Priority date
2021-04-07
Filing date
2021-04-07
Publication date
2021-06-11
2021-04-07 Application filed by Suzhou Ruilisi Technology Co., Ltd.
2021-04-07 Priority to CN202110371746.3A
2021-06-11 Publication of CN112949768A


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/24 Classification techniques
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/044 Recurrent networks, e.g. Hopfield networks
              • G06N3/08 Learning methods
                • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

An LSTM-based traffic classification method comprises the following steps: (1) constructing an LSTM network; (2) collecting domain names globally, parsing them during collection, and adding them to a training data set; (3) manually labeling whether each domain name from step (2) serves a CDN function; (4) feeding the training data set into the LSTM network and updating the network weights by back-propagation, which completes construction of the classifier; (5) the classifier receiving and analyzing each new connection; and (6) selecting the corresponding acceleration method. By building the accelerator on an LSTM network and using the trained classifier to sort domain names into download and non-download purposes, the method improves the acceleration service team's ability to distinguish domain names, reduces the workload of manual statistics, improves game adaptation, and improves the user experience.

Description

Traffic classification method based on LSTM
Technical Field
The invention relates to the field of network acceleration applications, and in particular to an LSTM-based traffic classification method.
Background
In game acceleration, the connections that need to be accelerated fall into two types: those requiring bandwidth acceleration and those requiring delay acceleration. Bandwidth acceleration increases the download speed of games, web pages and similar content; delay acceleration reduces network latency during online play. A game acceleration provider must purchase different carrier lines for these two requirements, identify the type of each connection that an accelerated game generates, and route each connection onto the correct line. Conventionally, the line type required by each game and each of its connections is collected manually, which consumes substantial labor and confirms connection types inefficiently.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
To solve these technical problems, the invention provides an LSTM-based traffic classification method that reduces the workload of manual statistics, improves game adaptation, and improves the user experience.
To achieve this purpose, the technical solution of the invention is as follows:
an LSTM-based traffic classification method comprises the following steps:
(1) constructing an LSTM network;
(2) collecting domain names globally, parsing them during collection, and adding them to a training data set;
(3) manually labeling whether each domain name from step (2) serves a CDN function;
(4) feeding the training data set into the LSTM network and updating the network weights by back-propagation, which completes construction of the classifier;
(5) the classifier receiving and analyzing each new connection;
(6) selecting the corresponding acceleration method.
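Steps (1) and (4) do not specify the network's shape. As a rough illustration only, the sketch below assumes a single-layer LSTM that reads one encoded vector per domain-name word and feeds its final hidden state to a sigmoid output interpreted as the probability that the domain serves a CDN/download function. The class name `TinyLSTMClassifier`, the sizes, and the random initialisation are hypothetical; only the forward pass is shown, and the back-propagation training of step (4) would in practice be delegated to a framework (for example PyTorch's `torch.nn.LSTM`).

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM over word vectors with a sigmoid head
    giving P(domain serves a CDN / download function). Illustrative only."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = input_size + hidden_size  # every gate sees [x_t, h_{t-1}]
        # One weight matrix and bias vector per gate:
        # input (i), forget (f), candidate (g) and output (o).
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for k in "ifgo"}
        self.b = {k: [0.0] * hidden_size for k in "ifgo"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        self.b_out = 0.0

    def _affine(self, gate: str, z: List[float]) -> List[float]:
        # W_gate @ z + b_gate, written with plain lists.
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def forward(self, seq: List[List[float]]) -> float:
        h = [0.0] * self.hidden_size  # hidden state
        c = [0.0] * self.hidden_size  # cell state
        for x in seq:
            z = list(x) + h
            i = [sigmoid(a) for a in self._affine("i", z)]
            f = [sigmoid(a) for a in self._affine("f", z)]
            g = [math.tanh(a) for a in self._affine("g", z)]
            o = [sigmoid(a) for a in self._affine("o", z)]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        # Classify from the final hidden state.
        logit = sum(w * v for w, v in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


model = TinyLSTMClassifier(input_size=4, hidden_size=3)
seq = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.5]]  # two encoded words
p = model.forward(seq)
print(f"P(download/CDN) = {p:.3f}")
```

Using the last hidden state as the sequence summary is one common design choice for sequence classification; pooling over all hidden states would be an equally plausible alternative that the text does not rule out.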
Preferably, the analysis in step (5) is as follows:
(1-1) if the connection is to port 443, read the first data packet of the new connection and parse it as a TLS ClientHello packet; if the SNI field is present, take it as the connection's target domain name;
(1-2) if the connection is to port 80, read the first data packet of the new connection and parse it as an HTTP header; if the Host field is present, take its value as the connection's target domain name.
(2) Resolve the CNAME of the target domain name; if a CNAME exists, input the CNAME value into the classifier for classification; otherwise, input the original domain name directly into the classifier.
(3) Determine the acceleration scheme for the connection according to the classification output of step (2).
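For the port 443 branch in (1-1), the SNI host name can be read straight from the ClientHello bytes. The following is a minimal sketch, assuming the whole ClientHello fits in the first TCP payload (large ClientHellos split across segments are not reassembled here). The function names `extract_sni` and `build_client_hello` are hypothetical; the second one only builds a test packet and is not part of the described method.

```python
import struct
from typing import Optional


def extract_sni(packet: bytes) -> Optional[str]:
    """Return the SNI host name from a TLS ClientHello record, or None."""
    try:
        if packet[0] != 0x16:        # record type 22 = handshake
            return None
        pos = 5                      # skip record header: type, version, length
        if packet[pos] != 0x01:      # handshake type 1 = ClientHello
            return None
        pos += 4                     # handshake type (1) + length (3)
        pos += 2 + 32                # client_version + random
        pos += 1 + packet[pos]       # session_id
        (n,) = struct.unpack_from("!H", packet, pos)
        pos += 2 + n                 # cipher_suites
        pos += 1 + packet[pos]       # compression_methods
        (ext_total,) = struct.unpack_from("!H", packet, pos)
        pos += 2
        end = min(pos + ext_total, len(packet))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", packet, pos)
            pos += 4
            if ext_type == 0x0000:   # server_name extension
                # server_name_list length (2), name_type (1), name length (2)
                name_type = packet[pos + 2]
                (name_len,) = struct.unpack_from("!H", packet, pos + 3)
                if name_type != 0:   # 0 = host_name
                    return None
                return packet[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
        return None                  # no SNI extension present
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                  # truncated or malformed packet


def build_client_hello(host: str) -> bytes:
    """Hypothetical test helper: a minimal ClientHello carrying only SNI."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni = struct.pack("!H", len(entry)) + entry
    ext = struct.pack("!HH", 0x0000, len(sni)) + sni
    body = (b"\x03\x03" + bytes(32) + b"\x00"        # version, random, no session id
            + struct.pack("!H", 2) + b"\x13\x01"      # one cipher suite
            + b"\x01\x00"                             # one compression method (null)
            + struct.pack("!H", len(ext)) + ext)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(extract_sni(build_client_hello("dl.game.example")))  # dl.game.example
```

Returning `None` on any parse failure mirrors the text's "if the SNI field exists" condition: a connection without a readable SNI simply yields no target domain.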
The invention has the following advantages:
According to the invention, the accelerator is built on an LSTM network, and the trained classifier sorts domain names into download and non-download purposes. This improves the acceleration service team's ability to distinguish domain names, reduces the workload of manual statistics, improves game adaptation, and improves the user experience.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below.
The working principle of the LSTM-based traffic classification method provided by the invention is to build the accelerator on an LSTM network and use a trained classifier to sort domain names into download and non-download purposes, which improves the acceleration service team's ability to distinguish domain names, reduces the workload of manual statistics, improves game adaptation, and improves the user experience.
The present invention will be described in further detail with reference to examples and specific embodiments.
An LSTM-based traffic classification method comprises the following steps:
(1) constructing an LSTM network;
(2) collecting domain names globally, parsing them during collection, and adding them to a training data set;
(3) manually labeling whether each domain name from step (2) serves a CDN function;
(4) feeding the training data set into the LSTM network and updating the network weights by back-propagation, which completes construction of the classifier;
(5) the classifier receiving and analyzing each new connection:
(5-1-1) if the connection is to port 443, read the first data packet of the new connection and parse it as a TLS ClientHello packet; if the SNI field is present, take it as the connection's target domain name;
(5-1-2) if the connection is to port 80, read the first data packet of the new connection and parse it as an HTTP header; if the Host field is present, take its value as the connection's target domain name;
(5-2) resolve the CNAME of the target domain name; if a CNAME exists, input the CNAME value into the classifier for classification; otherwise, input the original domain name directly into the classifier;
(5-3) determine the acceleration scheme for the connection according to the classification output of step (5-2);
(6) accelerate the connection using the acceleration scheme determined in step (5-3).
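Steps (5-2) through (6) reduce to a small decision function once a CNAME lookup and a trained classifier are available. The sketch below is a minimal illustration under stated assumptions: `cname_of` and `classify` are hypothetical callables (a real deployment would back them with a DNS resolver and the trained LSTM), the 0.5 decision threshold is an assumed value, and "download" is mapped to bandwidth acceleration while everything else gets delay acceleration, as the background section describes.

```python
from typing import Callable, Dict, Optional

BANDWIDTH = "bandwidth acceleration"  # download-type traffic
DELAY = "delay acceleration"          # latency-sensitive game traffic


def choose_scheme(domain: str,
                  cname_of: Callable[[str], Optional[str]],
                  classify: Callable[[str], float],
                  threshold: float = 0.5) -> str:
    """Steps (5-2) to (6): classify the CNAME target if one exists, otherwise
    the original domain, and map the result to an acceleration scheme."""
    target = cname_of(domain) or domain
    is_download = classify(target) >= threshold
    return BANDWIDTH if is_download else DELAY


# Hypothetical stand-ins for a DNS lookup and the trained LSTM classifier.
cnames: Dict[str, str] = {"dl.game.example": "dl.game.example.cdn.example.net"}


def stub_classifier(name: str) -> float:
    return 0.9 if "cdn" in name.split(".") else 0.1


print(choose_scheme("dl.game.example", cnames.get, stub_classifier))     # bandwidth acceleration
print(choose_scheme("login.game.example", cnames.get, stub_classifier))  # delay acceleration
```

Passing the lookup and classifier in as parameters keeps the routing rule testable without network access or a trained model.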
The specific use steps of the invention are as follows:
In the current gaming field, most game servers are built on CDN networks, and CDN domain names are recognizable to a certain degree.
When a TCP connection is established, its target domain name is obtained by reading the SNI field of the TLS (Transport Layer Security) ClientHello packet or the Host field of the HTTP (Hypertext Transfer Protocol) header. Judging whether the target domain name is a download domain name then determines whether the current connection should receive bandwidth acceleration or delay acceleration.
Because a domain name usually consists of English words joined by dots, it can be split at the dots into a sequence of words, and each word is then one-hot encoded according to a word dictionary.
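The splitting and one-hot encoding can be sketched in a few lines. This is an illustration under assumptions not stated in the text: domain names are lower-cased, empty labels (for example from a trailing dot) are dropped, and words missing from the dictionary map to index 0, treated here as a hypothetical `<unk>` slot. The dictionary contents are made up for the example.

```python
from typing import Dict, List


def split_domain(domain: str) -> List[str]:
    """Split a domain name into words at the dots (lower-cased, empty parts dropped)."""
    return [w for w in domain.lower().strip(".").split(".") if w]


def one_hot(words: List[str], vocab: Dict[str, int]) -> List[List[float]]:
    """One row per word; words missing from the dictionary use index 0,
    assumed here to be an '<unk>' slot."""
    rows = []
    for w in words:
        row = [0.0] * len(vocab)
        row[vocab.get(w, 0)] = 1.0
        rows.append(row)
    return rows


vocab = {"<unk>": 0, "cdn": 1, "video": 2, "api": 3, "com": 4}  # hypothetical dictionary
words = split_domain("Video.CDN.example.com.")
M0 = one_hot(words, vocab)
print(words)  # ['video', 'cdn', 'example', 'com']
```

Each row of `M0` is one time step of the sequence fed to the LSTM, so domain names of different lengths simply produce sequences of different lengths.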
Denote the encoded vector as $M_0$. To add prior knowledge and improve the classification success rate, the values in $M_0$ are modified: words with a higher probability of appearing in CDN domain names (such as cdn, data, video) are assigned the value 1, and words with a lower probability (such as api, log) are assigned the value 0.5.
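The prior-knowledge weighting can be folded into the encoding step. A minimal sketch, assuming (as the text suggests) value 1 for high-probability words and 0.5 for low-probability words; the value for words in neither list is not specified, so the sketch assumes they keep the ordinary one-hot value 1. The function name `weighted_one_hot`, its parameters, and the dictionary are hypothetical.

```python
from typing import Dict, List, Set


def weighted_one_hot(words: List[str], vocab: Dict[str, int],
                     high_prob: Set[str], low_prob: Set[str],
                     high_value: float = 1.0, low_value: float = 0.5,
                     default_value: float = 1.0) -> List[List[float]]:
    """One-hot rows whose nonzero entry encodes prior knowledge about how
    likely each word is to appear in a CDN domain name."""
    rows = []
    for w in words:
        row = [0.0] * len(vocab)
        if w in high_prob:
            value = high_value      # e.g. cdn, data, video
        elif w in low_prob:
            value = low_value       # e.g. api, log
        else:
            value = default_value   # assumption: other words keep 1
        row[vocab.get(w, 0)] = value  # index 0 assumed to be '<unk>'
        rows.append(row)
    return rows


vocab = {"<unk>": 0, "cdn": 1, "data": 2, "api": 3, "log": 4}
M0 = weighted_one_hot(["api", "cdn", "example"], vocab,
                      high_prob={"cdn", "data", "video"}, low_prob={"api", "log"})
print(M0[0])  # [0.0, 0.0, 0.0, 0.5, 0.0]
```

Keeping the weights as parameters makes it easy to try other values if the reading of the prior-knowledge rule differs.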
If the domain name has a CNAME record, the encoding is scaled as $M = \gamma M_0$, where $\gamma$ is a weighting factor, set to 1.2 in this embodiment.
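As a worked illustration of this scaling, assume a hypothetical four-word dictionary (`<unk>`, cdn, video, api), treat $M_0$ as one one-hot row per word, and take a domain containing the words cdn and api that has a CNAME record. With the prior-knowledge values 1 and 0.5 and $\gamma = 1.2$:

```latex
M_0 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0.5 \end{pmatrix},
\qquad
M = \gamma M_0 = 1.2\,M_0 = \begin{pmatrix} 0 & 1.2 & 0 & 0 \\ 0 & 0 & 0 & 0.6 \end{pmatrix}
```

Because every entry is multiplied by the same $\gamma$, the ratio between the prior weights (1 to 0.5) is preserved; only the overall magnitude of the input changes.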
When a new connection is received:
if the connection is to port 443, read the first data packet of the new connection and parse it as a TLS ClientHello packet; if the SNI field is present, take it as the connection's target domain name;
if the connection is to port 80, read the first data packet of the new connection and parse it as an HTTP header; if the Host field is present, take its value as the connection's target domain name.
Resolve the CNAME of the target domain name; if a CNAME exists, input the CNAME value into the classifier for classification; otherwise, input the original domain name directly into the classifier.
Determine the acceleration scheme for the connection from the corresponding classification output.
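The port-based dispatch and the port 80 branch can be sketched as follows. This is a hedged illustration, not the patented implementation: it assumes the first packet holds the complete HTTP/1.x request header, strips an explicit `:port` suffix from the Host value (IPv6 literals are not handled), and takes the 443 branch's SNI extractor as a hypothetical `sni_parser` argument so the block stays self-contained.

```python
from typing import Callable, Optional


def extract_http_host(packet: bytes) -> Optional[str]:
    """Parse the first packet as an HTTP/1.x request header; return the Host value."""
    head = packet.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:   # skip the request line, e.g. 'GET / HTTP/1.1'
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            # Drop an explicit port such as ':8080' (IPv6 literals are not handled).
            return value.strip().split(":", 1)[0] or None
    return None


def target_domain(port: int, first_packet: bytes,
                  sni_parser: Callable[[bytes], Optional[str]]) -> Optional[str]:
    """Pick the parser by destination port: SNI for 443, Host header for 80."""
    if port == 443:
        return sni_parser(first_packet)
    if port == 80:
        return extract_http_host(first_packet)
    return None  # other ports are not covered by the described method


req = b"GET /patch.bin HTTP/1.1\r\nHost: dl.game.example:80\r\nAccept: */*\r\n\r\n"
print(target_domain(80, req, sni_parser=lambda _: None))  # dl.game.example
```

Header names are matched case-insensitively because HTTP field names are case-insensitive; decoding as ISO-8859-1 guarantees the decode step itself cannot fail on arbitrary bytes.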
In this way, the LSTM-based traffic classification method provided by the invention builds the accelerator on an LSTM network and uses the trained classifier to sort domain names into download and non-download purposes, which improves the acceleration service team's ability to distinguish domain names, reduces the workload of manual statistics, improves game adaptation, and improves the user experience.
The above description is only a preferred embodiment of the LSTM-based traffic classification method disclosed in the present invention. Those skilled in the art can make many variations and modifications without departing from the inventive concept, and these all fall within the protection scope of the present invention.

Claims (2)

1. An LSTM-based traffic classification method, characterized by comprising the following steps:
(1) constructing an LSTM network;
(2) collecting domain names globally, parsing them during collection, and adding them to a training data set;
(3) manually labeling whether each domain name from step (2) serves a CDN function;
(4) feeding the training data set into the LSTM network and updating the network weights by back-propagation, which completes construction of the classifier;
(5) the classifier receiving and analyzing each new connection;
(6) selecting the corresponding acceleration method.
2. The LSTM-based traffic classification method according to claim 1, wherein the analysis in step (5) is as follows:
(1-1) if the connection is to port 443, reading the first data packet of the new connection and parsing it as a TLS ClientHello packet; if the SNI field is present, taking it as the connection's target domain name;
(1-2) if the connection is to port 80, reading the first data packet of the new connection and parsing it as an HTTP header; if the Host field is present, taking its value as the connection's target domain name;
(2) resolving the CNAME of the target domain name; if a CNAME exists, inputting the CNAME value into the classifier for classification; otherwise, inputting the original domain name directly into the classifier;
(3) determining the acceleration scheme for the connection according to the classification output of step (2).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110371746.3A CN112949768A (en) 2021-04-07 2021-04-07 Traffic classification method based on LSTM


Publications (1)

Publication Number Publication Date
CN112949768A (en) 2021-06-11

Family

ID=76232352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110371746.3A Pending CN112949768A (en) 2021-04-07 2021-04-07 Traffic classification method based on LSTM

Country Status (1)

Country Link
CN (1) CN112949768A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103825887A (en) * 2014-02-14 2014-05-28 深信服网络科技(深圳)有限公司 Hypertext transfer protocol over secure socket layer (HTTPS) encryption-based web filtering method and system
CN106603734A (en) * 2015-10-16 2017-04-26 任子行网络技术股份有限公司 CDN service IP detection method and system
CN108156174A (en) * 2018-01-15 2018-06-12 深圳市联软科技股份有限公司 Botnet detection method, device, equipment and medium based on the analysis of C&C domain names
CN109361779A (en) * 2018-10-22 2019-02-19 江苏满运软件科技有限公司 Domain name management method and system in a distributed system, and node server
CN109361575A (en) * 2018-12-20 2019-02-19 哈尔滨工业大学(威海) Method and system for obtaining and analyzing DNS traffic data
CN109450945A (en) * 2018-12-26 2019-03-08 成都西维数码科技有限公司 SNI-based security monitoring method for web page access
CN109977118A (en) * 2019-03-21 2019-07-05 东南大学 Abnormal domain name detection method based on word embedding and LSTM
CN110049022A (en) * 2019-03-27 2019-07-23 深圳市腾讯计算机系统有限公司 Domain name access control method and device, and computer-readable storage medium
CN110191103A (en) * 2019-05-10 2019-08-30 长安通信科技有限责任公司 DGA domain name detection and classification method
CN111865990A (en) * 2020-07-23 2020-10-30 上海中通吉网络技术有限公司 Method, device, equipment and system for managing and controlling malicious reverse connection behavior of intranet
CN112019651A (en) * 2020-08-26 2020-12-01 重庆理工大学 DGA domain name detection method using a deep residual network and a character-level sliding window
CN112217679A (en) * 2020-10-16 2021-01-12 腾讯科技(深圳)有限公司 Application program acceleration method and device, computer equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination