WO2020119481A1 - Deep learning-based network traffic classification method and system, and electronic device - Google Patents

Deep learning-based network traffic classification method and system, and electronic device

Info

Publication number
WO2020119481A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
network traffic
network
classification
sample data
Application number
PCT/CN2019/122001
Other languages
English (en)
Chinese (zh)
Inventor
Zhao Shilin (赵世林)
Ye Kejiang (叶可江)
Xu Chengzhong (须成忠)
Original Assignee
Shenzhen Institutes of Advanced Technology (深圳先进技术研究院)
Application filed by Shenzhen Institutes of Advanced Technology
Publication of WO2020119481A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/044 Network management architectures or arrangements comprising hierarchical management structures
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • H04L47/31 Flow control; Congestion control by tagging of packets, e.g. using discard eligibility [DE] bits
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/161 Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Definitions

  • This application belongs to the technical field of network traffic classification, and in particular relates to a network traffic classification method, system, and electronic device based on deep learning.
  • Network traffic classification plays an important role in network management, resource allocation, on-demand services, and security systems.
  • By classifying network traffic accurately, network resources can be managed precisely, which greatly benefits the effective reuse of resources and the provision of personalized services, and also helps enterprises avoid unnecessary network expenses. How to classify network traffic accurately, so as to improve resource reuse and personalized services, is therefore a major challenge.
  • One existing approach, network traffic classification based on representation learning, preprocesses the captured network traffic data, uses a representation learning algorithm to extract features from the preprocessed data so as to generate network flow vectors, and classifies the traffic according to these flow vectors, which enables efficient classification.
  • Another approach, network traffic classification based on two-stage sequence feature learning, uses long short-term memory (LSTM) neural networks to learn the sequence features of network traffic at two levels, the data packet and the network flow. In the first stage, a sequence of packet vectors is generated from the flow's byte sequence; in the second stage, a network flow vector is generated from that sequence of packet vectors. Finally, a classifier classifies the traffic based on the network flow vector. This method takes into account the internal structure and organization of network traffic, exploits the time-series learning ability of LSTM networks, and classifies only after obtaining more comprehensive traffic features, which yields more accurate classification.
  • A third approach, network traffic classification based on hierarchical spatio-temporal feature learning, obtains the spatial features of the network traffic data through a first neural network and its temporal features through a second neural network, then classifies the traffic according to both.
  • This method obtains more comprehensive and accurate traffic feature information, which effectively improves classification ability, and its better traffic feature set effectively reduces the false alarm rate.
  • However, most existing network traffic classification methods are based on traditional machine learning, so their classification performance depends heavily on how the traffic features are designed; describing the traffic feature set accurately requires extensive manual design, which remains a difficulty in network traffic classification.
  • Moreover, most current methods focus on optimizing and improving the classification algorithm module in the training phase, while the local features contained in the raw network traffic data are rarely studied or mined, so classification performance is unstable.
  • The present application therefore provides a deep learning-based network traffic classification method, system, and electronic device, aiming to solve, at least to some extent, one of the above technical problems in the prior art.
  • A deep learning-based network traffic classification method includes the following steps:
  • Step a: Capture network traffic sample data;
  • Step b: Extract the global feature data set of the network traffic sample data with a deep learning classification algorithm;
  • Step c: Construct a random forest classification model from the global feature data set, and output the network traffic classification result through the random forest classification model.
  • In the embodiments of the present application, capturing network traffic sample data in step a specifically includes: selecting a network data center and collecting all of its network data packets; and, at the same time, acquiring the system network logs generated by communication between internal network traffic during the time period corresponding to those packets.
  • Step a further includes: detecting and preprocessing the network traffic sample data by filtering out incomplete network packets and deleting retransmitted network packets.
  • Step a further includes: performing sample labeling on the preprocessed network traffic sample data to obtain a network flow data set.
  • The sample labeling specifically includes: analyzing the network traffic sample data to find the natural attributes of each application and the IP addresses and transport protocols it uses to communicate with other applications; extracting from the system network log the IP endpoints and the number of transmitted packets associated with each application to determine the category of the network traffic sample data, and merging this with each application's IP addresses and transport protocols to complete the labeling; finally, using deep packet inspection to perform feature fingerprint matching on unknown traffic data to complete its labeling.
  • In step b, extracting the global feature data set of the network traffic sample data with the deep learning classification algorithm specifically includes:
  • Step b1: Input the network flow data set;
  • Step b2: Using the correlation between the traffic data contained in the four layers of the TCP/IP protocol, extract in turn the traffic data of the application layer, transport layer, network layer, and data link layer of each network packet;
  • Step b3: According to the importance of the data contained in each of the four TCP/IP layers, divide and extract traffic data of different sizes from each layer in proportion;
  • Step b4: Combine the extracted traffic data into a one-dimensional sequence of M bytes, and convert the M bytes into N pixels;
  • Step b5: Convert the N pixels into a grayscale image of standard size to form a new grayscale image data set;
  • Step b6: Feed the grayscale image data set into the input layer of the convolutional neural network model; after adaptively adjusting the size and number of the convolutional and pooling layers, perform the convolution operations to obtain a high-dimensional global feature data set.
  • A deep learning-based network traffic classification system includes:
  • Data acquisition module: used to capture network traffic sample data;
  • Feature extraction module: used to extract the global feature data set of the network traffic sample data with a deep learning classification algorithm;
  • Classification model building module: used to build a random forest classification model from the global feature data set;
  • Result output module: used to output network traffic classification results.
  • The data acquisition module captures network traffic sample data specifically by: selecting a network data center and collecting all of its network data packets; and, at the same time, acquiring the system network logs generated by traffic exchange during the time period corresponding to those packets.
  • The system further includes a data preprocessing module, which detects and preprocesses the network traffic sample data, filtering out incomplete network data packets and deleting retransmitted network data packets.
  • The system further includes a data labeling module, which performs sample labeling on the preprocessed network traffic sample data to obtain a network flow data set. The labeling specifically includes: analyzing the network traffic sample data to find the natural attributes of each application and the IP addresses and transport protocols it uses to communicate with other applications; extracting from the system network log the IP endpoints and the number of transmitted packets associated with each application to determine the category of the network traffic sample data, and merging this with each application's IP addresses and transport protocols to complete the labeling; finally, using deep packet inspection to perform feature fingerprint matching on unknown traffic data to complete its labeling.
  • The feature extraction module extracts the global feature data set of the network traffic sample data with a deep learning classification algorithm, specifically by: inputting the network flow data set; using the correlation between the traffic data contained in the four layers of the TCP/IP protocol to extract in turn the traffic data of the application layer, transport layer, network layer, and data link layer of each network packet; dividing and extracting traffic data of different sizes from each layer in proportion to the importance of the data contained in that layer; combining the extracted traffic data into a one-dimensional sequence of M bytes and converting the M bytes into N pixels; converting the N pixels into standard-size grayscale images to form a new grayscale image data set; and feeding the grayscale image data set into the input layer of the convolutional neural network model, adaptively adjusting the size and number of the convolutional and pooling layers, and performing the convolution operations to obtain a high-dimensional global feature data set.
  • An electronic device includes:
  • at least one processor; and
  • a memory communicatively connected to the at least one processor, wherein
  • the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the following operations of the deep learning-based network traffic classification method described above:
  • Step a: Capture network traffic sample data;
  • Step b: Extract the global feature data set of the network traffic sample data with a deep learning classification algorithm;
  • Step c: Construct a random forest classification model from the global feature data set, and output the network traffic classification result through the random forest classification model.
  • The beneficial effects of the embodiments of the present application are as follows: the deep learning-based network traffic classification method, system, and electronic device classify traffic using the latent features of the traffic data at each layer of the TCP/IP protocol, which improves classification accuracy; at the same time, the data in each layer are mined in depth in proportion to their importance, which ensures high cohesion of each layer's features.
  • The results show stable classification performance; the approach can handle high-dimensional traffic data and requires no feature selection.
  • The present application effectively ensures high accuracy and high performance in network traffic classification while improving classification efficiency, shortening training time, and reducing computational overhead.
  • FIG. 1 is a flowchart of a network traffic classification method based on deep learning according to an embodiment of the present application
  • FIG. 2 is a flowchart of feature extraction by a deep learning classification algorithm according to an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a network traffic classification system based on deep learning according to an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of a hardware device of a network traffic classification method based on deep learning provided by an embodiment of the present application.
  • The deep learning-based network traffic classification method of this embodiment uses deep learning-based hidden-feature extraction to accurately mine the large set of hidden traffic features in network traffic, so that during classification these features are fully and efficiently used to classify and identify the traffic accurately.
  • FIG. 1 is a flowchart of a deep learning-based network traffic classification method according to an embodiment of the present application.
  • the network traffic classification method based on deep learning in the embodiment of the present application includes the following steps:
  • Step 100: Capture network traffic sample data;
  • Capturing network traffic sample data specifically includes: selecting a large network data center and using the Wireshark software to collect all network data packets; at the same time, to support data labeling, deploying high-performance network monitoring software that captures continuously to obtain the system network logs generated by communication between network traffic during the time period corresponding to the network data packets.
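Wireshark can save captures in the classic libpcap format (its default since version 1.8 is pcapng, which this sketch does not parse). The minimal stdlib-only reader below, assuming classic pcap with microsecond timestamps, shows how captured frames could be loaded for the later steps; the function name `read_pcap` is hypothetical and not part of the application.

```python
import struct
from typing import Iterator, Tuple


def read_pcap(data: bytes) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp_seconds, frame_bytes) records from a classic libpcap file."""
    if len(data) < 24:
        raise ValueError("truncated pcap global header")
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng is not handled)")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, captured length, original length.
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        frame = data[offset:offset + incl_len]
        if len(frame) < incl_len:
            break  # last record was cut off
        offset += incl_len
        yield ts_sec + ts_usec / 1e6, frame
```

Typical use would be `with open("capture.pcap", "rb") as f: frames = list(read_pcap(f.read()))`, where the file name is only an example. For large captures a streaming reader or an existing library would be preferable to loading the whole file into memory.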
  • Step 200: Detect the network traffic sample data and preprocess it;
  • Preprocessing the network traffic sample data specifically includes: first, because an unstable TCP (Transmission Control Protocol) three-way handshake can cause disconnections that leave incomplete network data packets, the incomplete packets are filtered out; second, because lost acknowledgement packets during a TCP connection cause packets to be retransmitted, the retransmitted network packets are deleted.
  • TCP: Transmission Control Protocol
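The two preprocessing rules can be sketched as follows. This is a simplified illustration under stated assumptions, not the application's implementation: packets are assumed to be already decoded into a hypothetical `TcpPacket` record, a connection counts as complete once both a SYN and a SYN+ACK have been seen, and a data segment is treated as a retransmission when the same direction, sequence number, and payload length reappear.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass(frozen=True)
class TcpPacket:  # hypothetical decoded-packet record
    src: Endpoint
    dst: Endpoint
    seq: int
    payload_len: int
    flags: FrozenSet[str]  # e.g. frozenset({"SYN"}) or frozenset({"SYN", "ACK"})


def flow_key(p: TcpPacket) -> Tuple[Endpoint, ...]:
    # Direction-independent key, so both halves of a connection share one flow.
    return tuple(sorted((p.src, p.dst)))


def preprocess(packets: List[TcpPacket]) -> List[TcpPacket]:
    # Rule 1: find connections whose handshake was observed (SYN, then SYN+ACK).
    syn: Set[Tuple[Endpoint, ...]] = set()
    synack: Set[Tuple[Endpoint, ...]] = set()
    for p in packets:
        if p.flags == frozenset({"SYN"}):
            syn.add(flow_key(p))
        elif p.flags == frozenset({"SYN", "ACK"}):
            synack.add(flow_key(p))
    complete = syn & synack

    # Rule 2: keep packets of complete connections, dropping repeated data segments.
    seen: Set[Tuple[Endpoint, Endpoint, int, int]] = set()
    kept: List[TcpPacket] = []
    for p in packets:
        if flow_key(p) not in complete:
            continue  # incomplete handshake: filter out
        if p.payload_len > 0:
            signature = (p.src, p.dst, p.seq, p.payload_len)
            if signature in seen:
                continue  # retransmitted segment: delete
            seen.add(signature)
        kept.append(p)
    return kept
```

Real captures also need handling for sequence-number wraparound, partial overlaps, and flows already open when capture started, which this sketch deliberately ignores.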
  • Step 300: Perform sample labeling on the preprocessed network traffic sample data to obtain a network flow data set;
  • The sample labeling specifically includes: first, analyzing the network traffic sample data to find the natural attributes of each application and the key information it exchanges with other applications, such as IP addresses and transport protocols; second, extracting from the system network log the IP endpoints and the number of transmitted packets associated with each application to determine the category of the network traffic sample data, and associating and merging this with each application's IP addresses and transport protocols to complete the labeling; finally, using DPI (Deep Packet Inspection) to perform feature fingerprint matching on unknown traffic data to complete its labeling.
  • DPI: Deep Packet Inspection
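A minimal sketch of the labeling order described above: association with the system network log first, DPI fingerprint matching as a fallback. The log index format, the `label_flow` name, and the byte-substring fingerprints are all hypothetical; real DPI engines use much richer signatures, and this sketch ignores the per-application packet counts that the application also takes into account.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical index built from the system network log:
# (IP address, port, transport protocol) -> application label
LogIndex = Dict[Tuple[str, int, str], str]


def label_flow(endpoints: List[Tuple[str, int]], proto: str, payload: bytes,
               log_index: LogIndex, fingerprints: Dict[str, bytes]) -> Optional[str]:
    # Step 1: associate the flow's IP endpoints and protocol with log entries.
    for ip, port in endpoints:
        app = log_index.get((ip, port, proto))
        if app is not None:
            return app
    # Step 2: fall back to DPI-style fingerprint matching on the payload.
    for app, signature in fingerprints.items():
        if signature in payload:
            return app
    return None  # still unknown after both steps
```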
  • Step 400: Extract the global feature data set of the network flow data set with a deep learning classification algorithm;
  • In step 400, the embodiment re-extracts and redistributes the data set according to the degree of association between the protocol data at each layer of the packets in the network traffic.
  • FIG. 2 is a flowchart of extracting global feature data of the deep learning classification algorithm according to an embodiment of the present application, which specifically includes the following steps:
  • Step 401: Input the network flow data set;
  • Step 402: Using the correlation between the traffic data contained in the four layers of the TCP/IP protocol, extract in turn the traffic data of the application layer, transport layer, network layer, and data link layer of each network packet;
  • Step 403: According to the importance of the data contained in each of the four TCP/IP layers, divide and extract traffic data of different sizes from each layer according to a certain ratio;
  • In step 403, the data in each layer are mined in depth in proportion to their importance, which ensures high cohesion of each layer's features.
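Steps 402 and 403 can be illustrated for a single Ethernet II / IPv4 / TCP frame. The sketch splits the frame into the four TCP/IP layers by reading the IPv4 IHL and TCP data-offset fields, then takes a fixed share of M bytes from each layer in the order application, transport, network, link. The application does not disclose its ratios or its value of M, so `RATIOS` and `m=784` are hypothetical placeholders; VLAN tags and non-TCP traffic are not handled.

```python
from typing import Dict


def split_layers(frame: bytes) -> Dict[str, bytes]:
    """Split an Ethernet II / IPv4 / TCP frame into its four TCP/IP layers."""
    link = frame[:14]                        # Ethernet II header (no VLAN tag assumed)
    ihl = (frame[14] & 0x0F) * 4             # IPv4 header length in bytes
    network = frame[14:14 + ihl]
    t0 = 14 + ihl
    data_offset = (frame[t0 + 12] >> 4) * 4  # TCP header length in bytes
    transport = frame[t0:t0 + data_offset]
    application = frame[t0 + data_offset:]
    return {"application": application, "transport": transport,
            "network": network, "link": link}


# Hypothetical per-layer shares; not taken from the application.
RATIOS = {"application": 0.5, "transport": 0.2, "network": 0.2, "link": 0.1}
ORDER = ["application", "transport", "network", "link"]


def proportional_bytes(frame: bytes, m: int = 784) -> bytes:
    """Build an M-byte vector with a fixed quota of bytes from each layer."""
    layers = split_layers(frame)
    quotas = [int(m * RATIOS[name]) for name in ORDER]
    quotas[0] += m - sum(quotas)             # give the rounding remainder to the first layer
    out = bytearray()
    for name, quota in zip(ORDER, quotas):
        chunk = layers[name][:quota]
        out += chunk + bytes(quota - len(chunk))  # zero-pad layers shorter than their quota
    return bytes(out)
```

Putting the application layer first mirrors the extraction order stated in step 402; whether the shares should be fixed or tuned per data set is a design choice the text leaves open.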
  • Step 404: Combine the extracted traffic data into a one-dimensional sequence of M bytes, and convert the M bytes into N pixels;
  • Step 405: Convert the N pixels into a grayscale image of standard size (X, X, 1) to form a new grayscale image data set;
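Steps 404 and 405 amount to treating each byte as one 8-bit gray level. The sketch below assumes the M bytes are zero-padded or truncated to exactly X*X bytes, so that N = X*X; the helper names and the default X = 28 are hypothetical.

```python
import math
from typing import List


def side_for(m: int) -> int:
    """Smallest X such that an X*X image can hold M bytes."""
    x = math.isqrt(m)
    return x if x * x == m else x + 1


def bytes_to_gray_image(data: bytes, x: int = 28) -> List[List[List[int]]]:
    """Map bytes to N = x*x pixels and reshape them into an (x, x, 1) grayscale image."""
    n = x * x
    pixels = list(data[:n]) + [0] * (n - min(len(data), n))  # each byte is a 0-255 gray level
    return [[[pixels[r * x + c]] for c in range(x)] for r in range(x)]
```

Many CNN pipelines additionally scale the pixel values to [0, 1] by dividing by 255 before training; the text does not say whether that is done here.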
  • Step 406: Feed the grayscale image data set into the input layer of the convolutional neural network model; after adaptively adjusting the size and number of the convolutional and pooling layers, perform the convolution operations to obtain a high-dimensional global feature data set;
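The convolution in step 406 can be illustrated with a single-channel, pure-Python sketch. It computes cross-correlation, which is what CNN convolution layers actually evaluate, and its output side length follows Feature_map = (width + 2*padding - filter)/stride + 1. Square inputs and a single fixed kernel are assumptions, the function names are hypothetical, and the adaptive tuning of layer size and count is not shown.

```python
from typing import List

Matrix = List[List[float]]


def conv_output_size(w: int, filter_size: int, padding: int = 0, stride: int = 1) -> int:
    # Feature_map = (width + 2*padding - filter_size) / stride + 1
    return (w + 2 * padding - filter_size) // stride + 1


def conv2d(image: Matrix, kernel: Matrix, padding: int = 0, stride: int = 1) -> Matrix:
    """Single-channel 2-D convolution (cross-correlation) over a square image."""
    w, f = len(image), len(kernel)
    side = w + 2 * padding
    # Surround the image with `padding` rows/columns of zeros.
    padded = ([[0.0] * side for _ in range(padding)]
              + [[0.0] * padding + list(row) + [0.0] * padding for row in image]
              + [[0.0] * side for _ in range(padding)])
    out_size = conv_output_size(w, f, padding, stride)
    out: Matrix = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            r0, c0 = i * stride, j * stride
            # Weighted sum of the f x f window under the kernel.
            row.append(sum(padded[r0 + a][c0 + b] * kernel[a][b]
                           for a in range(f) for b in range(f)))
        out.append(row)
    return out
```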
  • Step 407: Downsample the images in the global feature data set to compress them, reducing the number of parameters without affecting image quality;
  • The downsampling specifically uses MaxPooling (maximum pooling) in the pooling layer, with a 2*2 window and a stride of 1, keeping the maximum value of each window; the feature map size then changes from Feature_map to (Feature_map - 2) + 1.
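The max pooling of step 407 can be sketched as follows; `size` and `stride` default to the 2*2 window and stride of 1 stated above, and the function name is hypothetical.

```python
from typing import List


def max_pool2d(fmap: List[List[float]], size: int = 2, stride: int = 1) -> List[List[float]]:
    """Max pooling: each output cell keeps the largest value in its size x size window."""
    n = len(fmap)
    out_n = (n - size) // stride + 1  # with size=2, stride=1 this is (Feature_map - 2) + 1
    return [[max(fmap[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_n)]
            for i in range(out_n)]
```

With stride 1 each side shrinks by only one cell, so the reduction in parameters is modest; a stride equal to the window size (for example 2) is the more common choice when stronger downsampling is wanted.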
  • Step 408: Repeat steps 406 and 407 until a large number of local features have been extracted, terminating the convolution once the set learning rate is satisfied;
  • Step 409: Feed the local feature extraction results into the Flatten layer, which outputs a one-dimensional global feature data set.
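Step 409 only reshapes data. A sketch, assuming the last convolution/pooling stage yields a list of 2-D feature maps per sample; the function name is hypothetical.

```python
from typing import List


def flatten(feature_maps: List[List[List[float]]]) -> List[float]:
    """Concatenate feature maps (map by map, row by row) into one 1-D vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]
```

Frameworks that use a channels-last layout (for example Keras `Flatten` on an (H, W, C) tensor) interleave the channels differently, but any fixed ordering works as long as every sample is flattened the same way before the vectors are stacked into the global feature data set.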
  • Step 500: Train a classifier on the extracted global feature data set by constructing a random forest classification model, and output the network traffic classification result through that model.
  • In step 500, a convolutional neural network first extracts the global feature data set, which is then used to train a random forest classification model. The training process can detect interactions between features, which helps ensure high accuracy and high performance in network traffic classification.
  • The model is built with the supervised-learning random forest algorithm. From the results of the individual decision trees in the forest, the model can determine the category of known traffic and can also decide, by voting, how unknown traffic is classified.
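The voting rule in the bullet above can be sketched with each trained decision tree modelled as a callable that maps a feature vector to a label. The `forest_predict` name and the toy trees are hypothetical; training the trees (bootstrap sampling, random feature subsets) is not shown.

```python
from collections import Counter
from typing import Callable, List, Sequence

Tree = Callable[[Sequence[float]], str]  # a trained decision tree: features -> label


def forest_predict(trees: List[Tree], features: Sequence[float]) -> str:
    """Each decision tree casts one vote; the most common label wins."""
    votes = Counter(tree(features) for tree in trees)
    label, _count = votes.most_common(1)[0]  # ties go to the label seen first
    return label
```

In practice a library implementation such as scikit-learn's `sklearn.ensemble.RandomForestClassifier` (trained with `fit(X, y)` and applied with `predict(X)`) would typically be trained on the flattened CNN features. Note that scikit-learn combines its trees by averaging their predicted class probabilities rather than by hard majority voting.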
  • Test results show that the random forest classification model of this embodiment achieves very high classification accuracy while improving classification efficiency, shortening training time, and reducing computational overhead.
  • FIG. 3 is a schematic structural diagram of a network traffic classification system based on deep learning according to an embodiment of the present application.
  • The deep learning-based network traffic classification system of this embodiment includes a data acquisition module, a data preprocessing module, a data labeling module, a feature extraction module, a classification model building module, and a result output module.
  • Data acquisition module: used to capture network traffic sample data. Capturing specifically includes: selecting a large network data center and using the Wireshark software to collect all network data packets; at the same time, to support data labeling, deploying high-performance network monitoring software that captures continuously to obtain the system network logs generated by communication between network traffic during the time period corresponding to the network data packets.
  • Data preprocessing module: used to detect and preprocess the network traffic sample data. Preprocessing specifically includes: first, because an unstable TCP (Transmission Control Protocol) three-way handshake can cause disconnections that leave incomplete network data packets, the incomplete packets are filtered out; second, because lost acknowledgement packets during a TCP connection cause retransmissions, the retransmitted network packets are deleted.
  • Data labeling module: used to perform sample labeling on the preprocessed network traffic sample data to obtain a network flow data set. The labeling specifically includes: first, analyzing the network traffic sample data to find the natural attributes of each application and the key information it exchanges with other applications, such as IP addresses and transport protocols; second, extracting from the system network log the IP endpoints and the number of transmitted packets associated with each application to determine the category of the network traffic sample data, and associating and merging this with each application's IP addresses and transport protocols to complete the labeling; finally, using DPI (Deep Packet Inspection) to perform feature fingerprint matching on unknown traffic data to complete its labeling.
  • Feature extraction module: used to extract the global feature data set of the network flow data set with a deep learning classification algorithm; the embodiment re-extracts and redistributes the data set according to the degree of association between the protocol data at each layer of the packets in the network traffic.
  • The global feature data set is extracted as follows:
  • traffic data of different sizes is divided and extracted from each layer in turn according to a certain ratio;
  • the extracted traffic data is combined into a one-dimensional sequence of M bytes, and the M bytes are converted into N pixels;
  • the feature map size is Feature_map = (width + 2*padding_size - filter_size)/stride + 1; the specific size can be set according to the actual application;
  • the images in the global feature data set are compressed by downsampling to reduce the number of parameters without affecting image quality;
  • the downsampling specifically uses MaxPooling (maximum pooling) in the pooling layer, with a 2*2 window and a stride of 1, keeping the maximum value of each window; the feature map size then changes from Feature_map to (Feature_map - 2) + 1;
  • the local feature extraction results are input to the Flatten (flattening) layer, which outputs a one-dimensional global feature data set.
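As a worked instance of the feature-map formula, with hypothetical values (a 28 x 28 input, a 3 x 3 kernel, no padding, stride 1) followed by the 2*2, stride-1 max pooling described above:

```latex
\begin{aligned}
\text{Feature\_map} &= \frac{\text{width} + 2\,\text{padding\_size} - \text{filter\_size}}{\text{stride}} + 1,\\
\text{Feature\_map}_{\text{conv}} &= \frac{28 + 2 \cdot 0 - 3}{1} + 1 = 26,\\
\text{Feature\_map}_{\text{pool}} &= \frac{26 - 2}{1} + 1 = 25.
\end{aligned}
```

The pooling line is the same formula with filter_size = 2, stride = 1, and no padding, which is where the expression (Feature_map - 2) + 1 comes from.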
  • Classification model building module: used to train on the extracted global feature data set and build a random forest classification model. The application first uses a convolutional neural network to extract the global feature data set and then uses that data set to train the random forest classification model; the training process can detect interactions between features, which effectively ensures high accuracy and high performance in network traffic classification.
  • Result output module: used to output network traffic classification results.
  • The device includes one or more processors and a memory (one processor is taken as an example) and may further include an input system and an output system.
  • The processor, memory, input system, and output system may be connected by a bus or in other ways; connection by a bus is used as the example.
  • the memory can be used to store non-transitory software programs, non-transitory computer executable programs, and modules.
  • the processor runs non-transitory software programs, instructions, and modules stored in the memory to execute various functional applications and data processing of the electronic device, that is, to implement the processing methods of the foregoing method embodiments.
  • the memory may include a storage program area and a storage data area, where the storage program area may store an operating system and application programs required by at least one function; the storage data area may store data, and the like.
  • the memory may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • the memory optionally includes memories remotely located with respect to the processor, and these remote memories may be connected to the processing system via a network. Examples of the above network include but are not limited to the Internet, intranet, local area network, mobile communication network, and combinations thereof.
  • The input system can receive input numeric or character information and generate signal inputs.
  • the output system may include display devices such as display screens.
  • the one or more modules are stored in the memory, and when executed by the one or more processors, perform the following operations of any of the foregoing method embodiments:
  • Step a: Capture network traffic sample data;
  • Step b: Extract the global feature data set of the network traffic sample data with a deep learning classification algorithm;
  • Step c: Construct a random forest classification model from the global feature data set, and output the network traffic classification result through the random forest classification model.
  • The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to that method.
  • For technical details not described in this embodiment, refer to the method provided in the embodiments of the present application.
  • An embodiment of the present application provides a non-transitory (non-volatile) computer storage medium storing computer-executable instructions that can perform the following operations:
  • Step a: Capture network traffic sample data.
  • Step b: Extract a global feature data set from the network traffic sample data using a deep learning classification algorithm.
  • Step c: Construct a random forest classification model from the global feature data set, and output the network traffic classification result through the random forest classification model.
  • An embodiment of the present application provides a computer program product.
  • The computer program product includes a computer program stored on a non-transitory computer-readable storage medium.
  • The computer program includes program instructions which, when executed by a computer, cause the computer to perform the following operations:
  • Step a: Capture network traffic sample data.
  • Step b: Extract a global feature data set from the network traffic sample data using a deep learning classification algorithm.
  • Step c: Construct a random forest classification model from the global feature data set, and output the network traffic classification result through the random forest classification model.
  • The deep learning-based network traffic classification method, system, and electronic device of the embodiments of the present application classify traffic using the latent features of the traffic data at each layer of the TCP/IP protocol stack, which improves classification accuracy. At the same time, each layer is mined in depth in proportion to the importance of the data it contains, which ensures high cohesion of each layer's features.
  • The results show stable classification performance; the method can handle high-dimensional traffic data and does not require feature selection.
  • The present application can effectively ensure high accuracy and high performance of network traffic classification while improving classification efficiency, shortening training time, and reducing computational overhead.
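
The per-layer idea above can be illustrated with a small input-preparation sketch. The text does not say how bytes are apportioned among layers or how layer importance is measured, so the following Python assumes a fixed 60-byte budget split across the link, network, transport and application layers in proportion to hypothetical integer weights, and assumes Ethernet II / IPv4 / TCP framing without VLAN tags. The names `split_layers`, `allocate` and `layer_features` are illustrative only; the resulting vectors would be the input to the deep learning extractor of step b, which is not shown.

```python
from typing import Dict, List


def split_layers(frame: bytes) -> Dict[str, bytes]:
    """Split a raw Ethernet II / IPv4 / TCP frame into per-layer byte segments.

    Assumption: no VLAN tag, an IPv4 packet and a TCP segment; other
    framings would need their own parsing.
    """
    ip_start = 14                                   # Ethernet II header is 14 bytes
    ihl = (frame[ip_start] & 0x0F) * 4              # IPv4 header length (IHL field) in bytes
    tcp_start = ip_start + ihl
    data_offset = (frame[tcp_start + 12] >> 4) * 4  # TCP header length (data offset) in bytes
    return {
        "link": frame[:ip_start],
        "network": frame[ip_start:tcp_start],
        "transport": frame[tcp_start:tcp_start + data_offset],
        "application": frame[tcp_start + data_offset:],
    }


def allocate(budget: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split a byte budget across layers in proportion to integer weights."""
    total = sum(weights.values())
    sizes = {name: budget * w // total for name, w in weights.items()}
    top = max(weights, key=weights.get)             # layer with the largest weight
    sizes[top] += budget - sum(sizes.values())      # rounding remainder goes to it
    return sizes


def layer_features(frame: bytes, weights: Dict[str, int], budget: int = 60) -> List[float]:
    """Fixed-length vector: leading bytes of each layer, zero-padded, scaled to [0, 1]."""
    layers = split_layers(frame)
    sizes = allocate(budget, weights)
    vec: List[float] = []
    for name in weights:                            # layer order follows the weights dict
        seg = layers[name][:sizes[name]]
        seg += bytes(sizes[name] - len(seg))        # pad short layers with zeros
        vec.extend(b / 255 for b in seg)
    return vec


# Usage with a synthetic frame (hypothetical values, not captured traffic).
eth = bytes(12) + b"\x08\x00"                       # zero MAC addresses, EtherType IPv4
ip = bytes([0x45]) + bytes(19)                      # version 4, IHL 5 (20-byte header)
tcp = bytes(12) + bytes([0x50]) + bytes(7)          # data offset 5 (20-byte header)
frame = eth + ip + tcp + b"GET / HTTP/1.1\r\n"
weights = {"link": 1, "network": 2, "transport": 3, "application": 4}  # hypothetical weights
vec = layer_features(frame, weights)
print(len(vec))  # 60
```

Giving the application layer the largest share is only an example; any weighting that reflects measured layer importance could be substituted.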

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates to a deep learning-based network traffic classification method and system, and an electronic device. The method comprises the following steps: step a: capturing network traffic sample data; step b: extracting a global feature data set of the network traffic sample data by means of a deep learning classification algorithm; and step c: constructing a random forest classification model according to the global feature data set, and outputting a network traffic classification result by means of the random forest classification model. In the present invention, the random forest classification model is trained using the extracted global features; the results show stable classification performance, ultra-high-dimensional traffic data can be processed, and feature selection is not required. Compared with the prior art, the present invention can effectively ensure high accuracy and high performance of network traffic classification; in addition, classification efficiency can be improved, training time can be shortened, and computational overhead can be reduced.
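
The random forest stage (step c) can be illustrated with a generic random forest. The patent does not give hyperparameters, a split criterion, or the dimensionality of the global feature vectors, so the following is a minimal pure-Python sketch under those assumptions: bootstrap sampling, about sqrt(d) randomly chosen features per split, Gini impurity, and majority voting. The class and function names, the hyperparameter values, and the synthetic 4-dimensional vectors standing in for the deep learning output of step b are all hypothetical; a real deployment might instead use a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: Sequence[int]) -> int:
    return Counter(labels).most_common(1)[0][0]


@dataclass
class Node:
    label: Optional[int] = None       # set only on leaves
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(X, y, depth, max_depth, n_feats, rng) -> Node:
    if depth >= max_depth or len(set(y)) == 1:
        return Node(label=majority(y))
    best = None                        # (weighted impurity, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_feats):  # random feature subset at this split
        values = sorted(set(row[f] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2            # candidate threshold between adjacent values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:                   # no usable split among the sampled features
        return Node(label=majority(y))
    _, f, t = best
    go_left = [row[f] <= t for row in X]
    lx = [r for r, g in zip(X, go_left) if g]
    ly = [lab for lab, g in zip(y, go_left) if g]
    rx = [r for r, g in zip(X, go_left) if not g]
    ry = [lab for lab, g in zip(y, go_left) if not g]
    return Node(feature=f, threshold=t,
                left=build_tree(lx, ly, depth + 1, max_depth, n_feats, rng),
                right=build_tree(rx, ry, depth + 1, max_depth, n_feats, rng))


def predict_tree(node: Node, x: Sequence[float]) -> int:
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


class RandomForest:
    def __init__(self, n_trees: int = 25, max_depth: int = 6, seed: int = 0):
        self.n_trees, self.max_depth, self.seed = n_trees, max_depth, seed
        self.trees: List[Node] = []

    def fit(self, X: List[List[float]], y: List[int]) -> "RandomForest":
        rng = random.Random(self.seed)
        n_feats = max(1, int(math.sqrt(len(X[0]))))  # about sqrt(d) features per split
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample with replacement
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, n_feats, rng))
        return self

    def predict(self, X: List[List[float]]) -> List[int]:
        # Majority vote over the individual trees' predictions.
        return [majority([predict_tree(t, x) for t in self.trees]) for x in X]


# Synthetic stand-ins for extracted global features: 4-dimensional vectors, two classes.
X = [[0.10 + 0.02 * i] * 4 for i in range(10)] + [[0.70 + 0.02 * i] * 4 for i in range(10)]
y = [0] * 10 + [1] * 10
model = RandomForest(n_trees=15, max_depth=3, seed=1).fit(X, y)
print(model.predict([[0.2] * 4, [0.8] * 4]))  # [0, 1]
```

Because each tree sees a different bootstrap sample and a different random feature subset at every split, the vote is less sensitive to any single noisy feature, which is consistent with the abstract's statement that no separate feature selection step is needed.
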
PCT/CN2019/122001 2018-12-11 2019-11-29 Procédé et système de classification de trafic de réseau basés sur un apprentissage profond, et dispositif électronique WO2020119481A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811507380.2A CN109639481B (zh) 2018-12-11 2018-12-11 一种基于深度学习的网络流量分类方法、系统及电子设备
CN201811507380.2 2018-12-11

Publications (1)

Publication Number Publication Date
WO2020119481A1 (fr) 2020-06-18

Family

ID=66072697

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/122001 WO2020119481A1 (fr) 2018-12-11 2019-11-29 Procédé et système de classification de trafic de réseau basés sur un apprentissage profond, et dispositif électronique

Country Status (2)

Country Link
CN (1) CN109639481B (fr)
WO (1) WO2020119481A1 (fr)

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111817982A (zh) * 2020-07-27 2020-10-23 南京信息工程大学 一种面向类别不平衡下的加密流量识别方法
CN111860628A (zh) * 2020-07-08 2020-10-30 上海乘安科技集团有限公司 一种基于深度学习的流量识别与特征提取方法
CN112187664A (zh) * 2020-09-23 2021-01-05 东南大学 一种基于半监督学习的应用流自动分类方法
CN112235264A (zh) * 2020-09-28 2021-01-15 国家计算机网络与信息安全管理中心 一种基于深度迁移学习的网络流量识别方法及装置
CN112235314A (zh) * 2020-10-29 2021-01-15 东巽科技(北京)有限公司 网络流量检测方法和装置及设备
CN112364878A (zh) * 2020-09-25 2021-02-12 江苏师范大学 一种复杂背景下基于深度学习的电力线分类方法
CN112468509A (zh) * 2020-12-09 2021-03-09 湖北松颢科技有限公司 一种基于深度学习技术的流量数据自动检测方法及装置
CN112615713A (zh) * 2020-12-22 2021-04-06 东软集团股份有限公司 隐蔽信道的检测方法、装置、可读存储介质及电子设备
CN112651435A (zh) * 2020-12-22 2021-04-13 中国南方电网有限责任公司 一种基于自学习的电力网络探针流量异常的检测方法
CN113124949A (zh) * 2021-04-06 2021-07-16 深圳市联恒星科技有限公司 一种多相流流量检测方法及系统
CN113177209A (zh) * 2021-04-19 2021-07-27 北京邮电大学 基于深度学习的加密流量分类方法及相关设备
CN113256507A (zh) * 2021-04-01 2021-08-13 南京信息工程大学 一种针对二进制流量数据生成图像的注意力增强方法
CN113660273A (zh) * 2021-08-18 2021-11-16 国家电网公司东北分部 超融合架构下基于深度学习的入侵检测方法及装置
CN113783795A (zh) * 2021-07-19 2021-12-10 北京邮电大学 加密流量分类方法及相关设备
CN113872939A (zh) * 2021-08-30 2021-12-31 济南浪潮数据技术有限公司 一种流量检测方法、装置及存储介质
CN113949653A (zh) * 2021-10-18 2022-01-18 中铁二院工程集团有限责任公司 一种基于深度学习的加密协议识别方法及系统
CN113965524A (zh) * 2021-09-29 2022-01-21 河海大学 一种网络流量分类方法以及基于该方法的流量控制系统
CN114338437A (zh) * 2022-01-13 2022-04-12 北京邮电大学 网络流量分类方法、装置、电子设备及存储介质
CN114500387A (zh) * 2022-02-14 2022-05-13 重庆邮电大学 基于机器学习的移动应用流量识别方法及系统
CN114553790A (zh) * 2022-03-12 2022-05-27 北京工业大学 一种基于多模态特征的小样本学习物联网流量分类方法及系统
CN114615007A (zh) * 2022-01-13 2022-06-10 中国科学院信息工程研究所 一种基于随机森林的隧道混合流量分类方法及系统
CN114765634A (zh) * 2021-01-13 2022-07-19 腾讯科技(深圳)有限公司 网络协议识别方法、装置、电子设备及可读存储介质
CN114915575A (zh) * 2022-06-02 2022-08-16 电子科技大学 一种基于人工智能的网络流量检测装置
CN115065560A (zh) * 2022-08-16 2022-09-16 国网智能电网研究院有限公司 基于业务时序特征分析的数据交互防泄漏检测方法及装置
CN115134168A (zh) * 2022-08-29 2022-09-30 成都盛思睿信息技术有限公司 基于卷积神经网络的云平台隐蔽通道检测方法及系统
CN115150840A (zh) * 2022-05-18 2022-10-04 西安交通大学 一种基于深度学习的移动网络流量预测方法
CN115242496A (zh) * 2022-07-20 2022-10-25 安徽工业大学 一种基于残差网络的Tor加密流量应用行为分类方法及装置
CN115277113A (zh) * 2022-07-06 2022-11-01 国网山西省电力公司信息通信分公司 一种基于集成学习的电网网络入侵事件检测识别方法
CN115442276A (zh) * 2022-08-23 2022-12-06 华能吉林发电有限公司长春热电厂 一种被动获取工控设备日志的方法
CN115514720A (zh) * 2022-09-19 2022-12-23 华东师范大学 一种面向可编程数据平面的用户活动分类方法及应用
CN114884704B (zh) * 2022-04-21 2023-03-10 中国科学院信息工程研究所 一种基于对合和投票的网络流量异常行为检测方法和系统
CN115993831A (zh) * 2023-03-23 2023-04-21 安徽大学 基于深度强化学习的机器人无目标网络的路径规划方法
CN116599779A (zh) * 2023-07-19 2023-08-15 中国电信股份有限公司江西分公司 一种增加网络安全性能的IPv6云转换方法
CN116842459A (zh) * 2023-09-01 2023-10-03 国网信息通信产业集团有限公司 一种基于小样本学习的电能计量故障诊断方法及诊断终端
CN116915512A (zh) * 2023-09-14 2023-10-20 国网江苏省电力有限公司常州供电分公司 电网中通信流量的检测方法、检测装置
CN117633665A (zh) * 2024-01-26 2024-03-01 深圳市互盟科技股份有限公司 一种网络数据监控方法及系统
CN117938545A (zh) * 2024-03-21 2024-04-26 中国信息通信研究院 一种基于加密流量的不良信息样本扩增方法和系统
CN117633665B (zh) * 2024-01-26 2024-05-28 深圳市互盟科技股份有限公司 一种网络数据监控方法及系统

Families Citing this family (18)

Publication number Priority date Publication date Assignee Title
CN109639481B (zh) * 2018-12-11 2020-10-27 深圳先进技术研究院 一种基于深度学习的网络流量分类方法、系统及电子设备
CN110012029B (zh) * 2019-04-22 2020-05-26 中国科学院声学研究所 一种区分加密和非加密压缩流量的方法和系统
CN110048962A (zh) * 2019-04-24 2019-07-23 广东工业大学 一种网络流量分类的方法、系统及设备
CN110097120B (zh) * 2019-04-30 2022-08-26 南京邮电大学 网络流量数据分类方法、设备及计算机存储介质
CN110311829B (zh) * 2019-05-24 2021-03-16 西安电子科技大学 一种基于机器学习加速的网络流量分类方法
CN110225009B (zh) * 2019-05-27 2020-06-05 四川大学 一种基于通信行为画像的代理使用者检测方法
CN111131069B (zh) * 2019-11-25 2021-06-08 北京理工大学 一种基于深度学习策略的异常加密流量检测与分类方法
CN110896381B (zh) * 2019-11-25 2021-10-29 中国科学院深圳先进技术研究院 一种基于深度神经网络的流量分类方法、系统及电子设备
CN111224892B (zh) * 2019-12-26 2023-08-01 中国人民解放军国防科技大学 一种基于fpga随机森林模型的流量分类方法及系统
CN111917600A (zh) * 2020-06-12 2020-11-10 贵州大学 一种基于Spark性能优化的网络流量分类装置及分类方法
CN112200256A (zh) * 2020-10-16 2021-01-08 鹏城实验室 基于深度学习的sketch网络测量方法及电子设备
CN112511384B (zh) * 2020-11-26 2022-09-02 广州品唯软件有限公司 流量数据处理方法、装置、计算机设备和存储介质
CN112580708B (zh) * 2020-12-10 2024-03-05 上海阅维科技股份有限公司 从应用程序生成的加密流量中识别上网行为的方法
CN112804253B (zh) * 2021-02-04 2022-07-12 湖南大学 一种网络流量分类检测方法、系统及存储介质
CN115514686A (zh) * 2021-06-23 2022-12-23 深信服科技股份有限公司 一种流量采集方法、装置及电子设备和存储介质
CN113591950A (zh) * 2021-07-19 2021-11-02 中国海洋大学 一种随机森林网络流量分类方法、系统、存储介质
CN115296919B (zh) * 2022-08-15 2023-04-25 江西师范大学 一种边缘网关对特殊流量包计算方法及系统
CN116051883A (zh) * 2022-12-09 2023-05-02 哈尔滨理工大学 一种基于CNN-Transformer混合架构的网络流量分类方法

Citations (4)

Publication number Priority date Publication date Assignee Title
US20150295805A1 (en) * 2013-04-15 2015-10-15 International Business Machines Corporation Identification and classification of web traffic inside encrypted network tunnels
CN105141455A (zh) * 2015-08-24 2015-12-09 西南大学 一种基于统计特征的有噪网络流量分类建模方法
CN108900432A (zh) * 2018-07-05 2018-11-27 中山大学 一种基于网络流行为的内容感知方法
CN109639481A (zh) * 2018-12-11 2019-04-16 深圳先进技术研究院 一种基于深度学习的网络流量分类方法、系统及电子设备

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN104601486A (zh) * 2013-10-30 2015-05-06 阿里巴巴集团控股有限公司 一种网络流量的分流方法和装置
US20160283859A1 (en) * 2015-03-25 2016-09-29 Cisco Technology, Inc. Network traffic classification
CN106096411B (zh) * 2016-06-08 2018-09-18 浙江工业大学 一种基于字节码图像聚类的Android恶意代码家族分类方法
CN108021940B (zh) * 2017-11-30 2023-04-18 中国银联股份有限公司 基于机器学习的数据分类方法及系统

Non-Patent Citations (2)

Title
LEI ZANG ET AL: "Application of Machine Learning in Cyberspace Security Research", CHINESE JOURNAL OF COMPUTERS, vol. 41, no. 9, 30 September 2018 (2018-09-30), pages 1 - 35, XP055712299 *
SHUANG ZHAO ET AL: "Review: Traffic Identification Based on Machine Learning", COMPUTER ENGINEERING & SCIENCE, vol. 40, no. 10, 25 October 2018 (2018-10-25), pages 1746 - 1756, XP055712306, ISSN: 1007-130X, DOI: 10.3969/j.issn.1007-130X.2018.10.005 *

Cited By (53)

Publication number Priority date Publication date Assignee Title
CN111860628A (zh) * 2020-07-08 2020-10-30 上海乘安科技集团有限公司 一种基于深度学习的流量识别与特征提取方法
CN111817982A (zh) * 2020-07-27 2020-10-23 南京信息工程大学 一种面向类别不平衡下的加密流量识别方法
CN112187664A (zh) * 2020-09-23 2021-01-05 东南大学 一种基于半监督学习的应用流自动分类方法
CN112364878A (zh) * 2020-09-25 2021-02-12 江苏师范大学 一种复杂背景下基于深度学习的电力线分类方法
CN112235264A (zh) * 2020-09-28 2021-01-15 国家计算机网络与信息安全管理中心 一种基于深度迁移学习的网络流量识别方法及装置
CN112235314A (zh) * 2020-10-29 2021-01-15 东巽科技(北京)有限公司 网络流量检测方法和装置及设备
CN112468509A (zh) * 2020-12-09 2021-03-09 湖北松颢科技有限公司 一种基于深度学习技术的流量数据自动检测方法及装置
CN112615713B (zh) * 2020-12-22 2024-02-23 东软集团股份有限公司 隐蔽信道的检测方法、装置、可读存储介质及电子设备
CN112615713A (zh) * 2020-12-22 2021-04-06 东软集团股份有限公司 隐蔽信道的检测方法、装置、可读存储介质及电子设备
CN112651435A (zh) * 2020-12-22 2021-04-13 中国南方电网有限责任公司 一种基于自学习的电力网络探针流量异常的检测方法
CN112651435B (zh) * 2020-12-22 2022-12-20 中国南方电网有限责任公司 一种基于自学习的电力网络探针流量异常的检测方法
CN114765634B (zh) * 2021-01-13 2023-12-12 腾讯科技(深圳)有限公司 网络协议识别方法、装置、电子设备及可读存储介质
CN114765634A (zh) * 2021-01-13 2022-07-19 腾讯科技(深圳)有限公司 网络协议识别方法、装置、电子设备及可读存储介质
CN113256507A (zh) * 2021-04-01 2021-08-13 南京信息工程大学 一种针对二进制流量数据生成图像的注意力增强方法
CN113256507B (zh) * 2021-04-01 2023-11-21 南京信息工程大学 一种针对二进制流量数据生成图像的注意力增强方法
CN113124949A (zh) * 2021-04-06 2021-07-16 深圳市联恒星科技有限公司 一种多相流流量检测方法及系统
CN113177209A (zh) * 2021-04-19 2021-07-27 北京邮电大学 基于深度学习的加密流量分类方法及相关设备
CN113783795B (zh) * 2021-07-19 2023-07-25 北京邮电大学 加密流量分类方法及相关设备
CN113783795A (zh) * 2021-07-19 2021-12-10 北京邮电大学 加密流量分类方法及相关设备
CN113660273A (zh) * 2021-08-18 2021-11-16 国家电网公司东北分部 超融合架构下基于深度学习的入侵检测方法及装置
CN113872939A (zh) * 2021-08-30 2021-12-31 济南浪潮数据技术有限公司 一种流量检测方法、装置及存储介质
CN113965524A (zh) * 2021-09-29 2022-01-21 河海大学 一种网络流量分类方法以及基于该方法的流量控制系统
CN113949653A (zh) * 2021-10-18 2022-01-18 中铁二院工程集团有限责任公司 一种基于深度学习的加密协议识别方法及系统
CN114615007A (zh) * 2022-01-13 2022-06-10 中国科学院信息工程研究所 一种基于随机森林的隧道混合流量分类方法及系统
CN114615007B (zh) * 2022-01-13 2023-05-23 中国科学院信息工程研究所 一种基于随机森林的隧道混合流量分类方法及系统
CN114338437B (zh) * 2022-01-13 2023-12-29 北京邮电大学 网络流量分类方法、装置、电子设备及存储介质
CN114338437A (zh) * 2022-01-13 2022-04-12 北京邮电大学 网络流量分类方法、装置、电子设备及存储介质
CN114500387A (zh) * 2022-02-14 2022-05-13 重庆邮电大学 基于机器学习的移动应用流量识别方法及系统
CN114553790A (zh) * 2022-03-12 2022-05-27 北京工业大学 一种基于多模态特征的小样本学习物联网流量分类方法及系统
CN114884704B (zh) * 2022-04-21 2023-03-10 中国科学院信息工程研究所 一种基于对合和投票的网络流量异常行为检测方法和系统
CN115150840A (zh) * 2022-05-18 2022-10-04 西安交通大学 一种基于深度学习的移动网络流量预测方法
CN115150840B (zh) * 2022-05-18 2024-03-12 西安交通大学 一种基于深度学习的移动网络流量预测方法
CN114915575A (zh) * 2022-06-02 2022-08-16 电子科技大学 一种基于人工智能的网络流量检测装置
CN114915575B (zh) * 2022-06-02 2023-04-07 电子科技大学 一种基于人工智能的网络流量检测装置
CN115277113A (zh) * 2022-07-06 2022-11-01 国网山西省电力公司信息通信分公司 一种基于集成学习的电网网络入侵事件检测识别方法
CN115242496A (zh) * 2022-07-20 2022-10-25 安徽工业大学 一种基于残差网络的Tor加密流量应用行为分类方法及装置
CN115242496B (zh) * 2022-07-20 2024-04-16 安徽工业大学 一种基于残差网络的Tor加密流量应用行为分类方法及装置
CN115065560A (zh) * 2022-08-16 2022-09-16 国网智能电网研究院有限公司 基于业务时序特征分析的数据交互防泄漏检测方法及装置
CN115442276A (zh) * 2022-08-23 2022-12-06 华能吉林发电有限公司长春热电厂 一种被动获取工控设备日志的方法
CN115134168A (zh) * 2022-08-29 2022-09-30 成都盛思睿信息技术有限公司 基于卷积神经网络的云平台隐蔽通道检测方法及系统
CN115514720B (zh) * 2022-09-19 2023-09-19 华东师范大学 一种面向可编程数据平面的用户活动分类方法及应用
CN115514720A (zh) * 2022-09-19 2022-12-23 华东师范大学 一种面向可编程数据平面的用户活动分类方法及应用
CN115993831A (zh) * 2023-03-23 2023-04-21 安徽大学 基于深度强化学习的机器人无目标网络的路径规划方法
CN115993831B (zh) * 2023-03-23 2023-06-09 安徽大学 基于深度强化学习的机器人无目标网络的路径规划方法
CN116599779A (zh) * 2023-07-19 2023-08-15 中国电信股份有限公司江西分公司 一种增加网络安全性能的IPv6云转换方法
CN116599779B (zh) * 2023-07-19 2023-10-27 中国电信股份有限公司江西分公司 一种增加网络安全性能的IPv6云转换方法
CN116842459A (zh) * 2023-09-01 2023-10-03 国网信息通信产业集团有限公司 一种基于小样本学习的电能计量故障诊断方法及诊断终端
CN116842459B (zh) * 2023-09-01 2023-11-21 国网信息通信产业集团有限公司 一种基于小样本学习的电能计量故障诊断方法及诊断终端
CN116915512B (zh) * 2023-09-14 2023-12-01 国网江苏省电力有限公司常州供电分公司 电网中通信流量的检测方法、检测装置
CN116915512A (zh) * 2023-09-14 2023-10-20 国网江苏省电力有限公司常州供电分公司 电网中通信流量的检测方法、检测装置
CN117633665A (zh) * 2024-01-26 2024-03-01 深圳市互盟科技股份有限公司 一种网络数据监控方法及系统
CN117633665B (zh) * 2024-01-26 2024-05-28 深圳市互盟科技股份有限公司 一种网络数据监控方法及系统
CN117938545A (zh) * 2024-03-21 2024-04-26 中国信息通信研究院 一种基于加密流量的不良信息样本扩增方法和系统

Also Published As

Publication number Publication date
CN109639481A (zh) 2019-04-16
CN109639481B (zh) 2020-10-27

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19897053

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03/11/2021)

122 EP: PCT application non-entry in European phase

Ref document number: 19897053

Country of ref document: EP

Kind code of ref document: A1