CN111431819B - Network traffic classification method and device based on serialized protocol flow characteristics - Google Patents

Publication number
CN111431819B
CN111431819B CN202010150723.5A CN202010150723A CN111431819B CN 111431819 B CN111431819 B CN 111431819B CN 202010150723 A CN202010150723 A CN 202010150723A CN 111431819 B CN111431819 B CN 111431819B
Authority
CN
China
Prior art keywords
network
flow
data set
packets
serialized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010150723.5A
Other languages
Chinese (zh)
Other versions
CN111431819A (en
Inventor
赵世林
叶可江
须成忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN202010150723.5A
Publication of CN111431819A
Application granted
Publication of CN111431819B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a network traffic classification method and device based on serialized protocol flow characteristics. The method comprises: dividing acquired network packets into a plurality of network flows based on the header attributes of the network packets; calculating the similarity between the network packets of each network flow, and clustering and labeling the network flows; serializing the labeled network flows to obtain a serialized data set; inputting the serialized data set into a recurrent neural network to extract the association relations among network packets; inputting the output sequence values of the recurrent neural network into a one-dimensional convolutional neural network to obtain a serialized feature vector set; and inputting the serialized feature vector set into an adaptive convolutional neural network to obtain the predicted network traffic classification. By modeling the data relationships of network packets in both the time dimension and the space dimension, the invention improves the efficiency and effect of network traffic classification.

Description

Network traffic classification method and device based on serialized protocol flow characteristics
Technical Field
The present invention relates to the field of computer communications technologies, and in particular, to a method and apparatus for classifying network traffic based on serialized protocol flow features.
Background
With the rapid development of the Internet, large amounts of data are generated in computer networks, communication equipment, network transmission, wireless communication and similar scenarios. This data occupies substantial network bandwidth, cloud service and storage resources; if it is not handled well, network paralysis, congestion or server downtime can occur at any time.
As network scale expands rapidly, the data transmitted among network devices, mobile terminals and servers grows quickly. The data generated by the many network applications can be applied and analyzed accurately only if network data transmission remains secure and stable during transmission and use, which is one of the current focuses of network security. Network traffic classification is a key technology for ensuring the security and stability of network data transmission, and can also serve as one of the key indicators for evaluating network performance.
In the prior art, network traffic classification techniques mainly comprise traditional techniques and machine-learning-based techniques. However, these techniques still have the following problems: manual feature extraction is time-consuming and labor-intensive; fault tolerance to data is poor; the traffic classification effect is unstable; and so on.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a network traffic classification method and device based on serialized protocol flow characteristics.
According to a first aspect of the present invention, there is provided a network traffic classification method based on serialized protocol flow characteristics. The method comprises the following steps:
dividing the acquired network packets into a plurality of network flows based on the header attributes of the network packets, wherein each network flow comprises a plurality of network packets;
calculating the similarity between the network packets of each network flow, and clustering and labeling the network flows;
serializing the labeled network flows to obtain a serialized data set;
inputting the serialized data set into a recurrent neural network, and extracting the association relations among network packets;
inputting the output sequence values of the recurrent neural network into a one-dimensional convolutional neural network to obtain a serialized feature vector set;
and inputting the serialized feature vector set into an adaptive convolutional neural network to obtain the predicted network traffic classification.
In one embodiment, dividing the collected network packets into a plurality of network flows based on header attribute characteristics of the network packets includes:
dividing the network data of each layer according to the protocol features of the different layers of the network, extracting the corresponding protocol packet features and flow features, and forming a protocol packet data set and a flow feature data set;
classifying and labeling the protocol packet data set and the flow feature data set according to the five-tuple of source IP, destination IP, source port, destination port and protocol type in each network packet to obtain a plurality of network flows, wherein each network flow comprises a plurality of network packets with the same five-tuple.
In one embodiment, the characteristic values in the protocol packet data set and the flow characteristic data set are normalized values.
In one embodiment, calculating the similarity between network packets of each network flow and clustering the network flows includes:
for the obtained network flow, calculating the similarity of the first N network packets in the network flow, wherein N is an integer greater than or equal to 2;
determining a homologous network flow by comparing the calculated similarity with a set threshold;
and clustering and labeling the network flows by using a clustering method based on machine learning.
In one embodiment, the Euclidean metric is used to calculate the similarity of the first N network packets in each network flow.
In one embodiment, the recurrent neural network is a long short-term memory (LSTM) network with a chained architecture of multiple cells.
In one embodiment, dividing the network data of each layer according to the protocol features of the different layers of the network and extracting the corresponding protocol packet features and flow features includes:
dividing the network data of the application layer, presentation layer, session layer, transport layer, network layer, data link layer and physical layer based on the OSI seven-layer network model;
and extracting the corresponding protocol packet features and flow features for the network data of each layer, and forming a protocol packet data set and a flow feature data set according to the transmission direction of the network packets.
According to a second aspect of the present invention, there is provided a network traffic classification device based on serialized protocol flow characteristics. The device comprises:
a network flow construction unit, configured to divide the acquired network packets into a plurality of network flows based on the header attributes of the network packets, wherein each network flow comprises a plurality of network packets;
a network flow clustering unit, configured to calculate the similarity between the network packets of each network flow and to cluster and label the network flows;
a serialization unit, configured to serialize the labeled network flows to obtain a serialized data set;
a sequence data mining unit, configured to input the serialized data set into a recurrent neural network and extract the association relations among network packets;
a sequence feature extraction unit, configured to input the output sequence values of the recurrent neural network into a one-dimensional convolutional neural network to obtain a serialized feature vector set;
a classification prediction unit, configured to input the serialized feature vector set into an adaptive convolutional neural network to obtain the predicted network traffic classification.
Compared with the prior art, the method can accurately and automatically extract the high-dimensional sequence convolution protocol flow characteristics for convolution neural network training, thereby improving the classification efficiency; meanwhile, modeling is carried out by utilizing the data relationship between the time dimension and the space dimension of the network packet, so that a better fitting effect is achieved on real data; and network behavior can be predicted based on the serialized protocol flow characteristics, so that the classification effect and classification efficiency are remarkably improved.
Other features of the present invention and its advantages will become apparent from the following detailed description of exemplary embodiments of the invention, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow chart of a method of classifying network traffic based on serialized protocol stream features in accordance with an embodiment of the invention;
FIG. 2 is a flow chart of dividing an acquired network packet into a plurality of network flows according to one embodiment of the invention;
FIG. 3 is a flow chart of acquiring a serialized data set in accordance with an embodiment of the invention;
FIG. 4 is a general framework flow diagram of a method of classifying network traffic based on serialized protocol stream features in accordance with one embodiment of the invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Briefly, the network traffic classification method based on serialized protocol flow characteristics provided by the invention comprises the following steps: collecting network traffic data; preprocessing the protocol flow features; generating a serialized training set and training a recurrent neural network with the serialized data; reconstructing the output of the recurrent neural network and inputting it into a one-dimensional convolutional neural network; and inputting the output of the one-dimensional convolutional neural network into an adaptive convolutional neural network for training to obtain the classification of the network traffic.
FIG. 1 is a flowchart of a network traffic classification method according to an embodiment of the present invention. With reference also to FIG. 4, the method specifically includes the following steps:
Step S110, collecting network traffic data for preprocessing, and dividing the collected network packets into a plurality of network flows based on the header attributes of the network packets.
In one embodiment, referring to fig. 2, the process of collecting network traffic data and preprocessing includes:
step S101, preparing a data source, and preparing to collect data in a network data management center or a built local area network environment.
Step S102, collecting network application data and network communication logs.
For example, the tshark command may be used to capture network protocol data on the server, and Wireshark may be used to monitor a specific network card to collect the traffic of specific network applications (e.g., video, website, voice, etc.). The system network communication logs for the same period are also collected in preparation for labeling the data.
Step S103, extracting protocol packet features and flow features according to the protocol features of the different layers of the network.
For example, under the OSI seven-layer network model, the seven layers are, from top to bottom: 7 application layer, 6 presentation layer, 5 session layer, 4 transport layer, 3 network layer, 2 data link layer and 1 physical layer. Based on the protocol features of each layer, the network data of each layer is divided, and the corresponding protocol packet features and flow features are extracted. For example, the protocol packet features include { Size-packet, interval-packet, … } and the like, and the flow features include { Length-Flow, flow Size, … } and the like. Furthermore, both the network packets and the network flows are considered bidirectional, i.e. client -> server and server -> client.
Preferably, these feature values may be normalized in order to improve the efficiency and calculation accuracy of subsequent data processing.
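The text does not say which normalization is applied. As a minimal sketch, assuming min-max scaling of each feature column to [0, 1] (the function name `min_max_normalize` and the sample values are hypothetical), the step could look like this:

```python
def min_max_normalize(rows):
    """Scale every feature column to [0, 1].

    `rows` is a list of equal-length feature vectors, one per packet.
    Min-max scaling is an assumption; the source only says the values
    "may be normalized". Constant columns are mapped to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical packets described by (size in bytes, inter-arrival interval in s).
packets = [[60, 0.0], [1500, 0.5], [780, 1.0]]
normalized = min_max_normalize(packets)
```

Scaling the columns to a common range keeps a large-valued feature such as packet size from dominating the distance computations used later for similarity.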
Step S104, analyzing the network communication log and labeling the protocol packet/flow features by class.
Specifically, the collected system network communication log is used: the five features { SrcIP, dstIP, srcPort, dstPort, TCP or UDP } in the network communication log are compared with those in each network packet. Network packets whose five-tuple matches an entry in the log are grouped into one class, and the class is labeled with the corresponding network application or protocol recorded in the log. Because TCP and UDP connections have life cycles, many network flows with the same five-tuple are obtained, denoted { Flow1, Flow2, … }, each consisting of a number of network packets with the same five-tuple, denoted Flow = { packet-1, packet-2, …, packet-n }.
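The grouping described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Packet` type, its `size` field and the `log_labels` mapping are hypothetical, and the sketch neither merges the two directions of a connection nor splits flows by connection life cycle.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # "TCP" or "UDP"
    size: int      # illustrative packet feature, in bytes


def five_tuple(pkt):
    """Return the (SrcIP, dstIP, srcPort, dstPort, protocol) key of a packet."""
    return (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)


def group_into_flows(packets, log_labels):
    """Group packets by five-tuple and label each flow from the log.

    `log_labels` maps a five-tuple to an application or protocol name
    parsed from the communication log; flows absent from the log keep
    the label None (unlabeled).
    """
    flows = defaultdict(list)
    for pkt in packets:
        flows[five_tuple(pkt)].append(pkt)
    return {
        key: {"label": log_labels.get(key), "packets": pkts}
        for key, pkts in flows.items()
    }


packets = [
    Packet("10.0.0.1", "10.0.0.9", 5000, 443, "TCP", 60),
    Packet("10.0.0.1", "10.0.0.9", 5000, 443, "TCP", 1500),
    Packet("10.0.0.2", "10.0.0.9", 6000, 53, "UDP", 80),
]
log_labels = {("10.0.0.1", "10.0.0.9", 5000, 443, "TCP"): "video"}
flows = group_into_flows(packets, log_labels)
```

Flows whose five-tuple does not appear in the log stay unlabeled, which matches the later statement that part of the network flows is labeled and part is not.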
Step S105, obtaining the preprocessed protocol flow features in preparation for clustering similar network flows.
Through the above process, a plurality of network flows are obtained, each network flow comprises a plurality of network packets, and each network packet has a corresponding protocol flow characteristic, so that preparation is made for subsequent clustering of similar network flows.
Step S120, clustering and labeling the network flows based on their similarity, and performing a serialization operation to obtain a serialized data set.
In the preprocessed protocol flow feature set, some network flows are labeled and some are not, so a serialization process is required in step S120. Specifically, referring to FIG. 3, the serialization process includes:
Step S201, detecting the similarity of network packets in the preprocessed protocol flow feature set.
For the preprocessed protocol flow feature set, the similarity among network packets can be calculated. Each network packet has the same set of protocol flow features, so the similarity and relevance of different network packets can be calculated from the differing values of each feature.
In one embodiment, for each obtained network flow, the similarity of the first N network packets in the network flow is calculated. N can take empirical values such as 5, 8 or 10, and can be adjusted to suit the actual application scenario. Preferably, the Euclidean metric is used to calculate the similarity between network packets. Because the Euclidean metric is sensitive to numerical differences, abnormal network packets can be detected and outliers removed, which reduces the computation of the model and improves its accuracy.
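A minimal sketch of this similarity computation over the first N packets, using the Euclidean distance. The mapping from distance d to a similarity score, 1 / (1 + d), and the function names are assumptions made for illustration; the source does not specify them.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """Map distance to a score in (0, 1]; 1.0 means identical vectors.

    The 1 / (1 + d) mapping is an assumption, not taken from the source.
    """
    return 1.0 / (1.0 + euclidean(a, b))


def mean_pairwise_similarity(flow, n=5):
    """Average similarity over all pairs among the first `n` packets."""
    head = flow[:n]
    if len(head) < 2:
        raise ValueError("at least two packets are required (N >= 2)")
    pairs = list(combinations(head, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical flow of normalized packet feature vectors.
flow = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
score_first_two = mean_pairwise_similarity(flow, n=2)
score_first_three = mean_pairwise_similarity(flow, n=3)
```

A packet that lies far from the others pulls the score down sharply, which is the property the text relies on for spotting abnormal packets.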
Step S202, synthesizing homologous network flows based on the similarity of the network packets, and clustering and labeling the network flows with a machine-learning-based clustering method.
Specifically, according to the result of step S201, if the calculated similarity is greater than the set threshold, the network packets are merged into a homologous network flow; network packets whose similarity is less than the threshold may participate in the calculation for the next TCP/UDP life cycle, or may be deleted directly. The generated network flows are then clustered and labeled with a machine-learning-based clustering method. In this way, closely associated network flows are clustered again, and unlabeled network flows receive labels. Within a cluster, the label shared by the most network flows wins through a voting mechanism, and this label can be used as the label of the unlabeled network flows.
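The voting mechanism can be illustrated with a short sketch. The clustering itself is taken as given; `vote_label` and `propagate_labels` are hypothetical helper names, and a plain majority vote is assumed.

```python
from collections import Counter


def vote_label(cluster_labels):
    """Return the most common known label in a cluster, or None if none exist."""
    known = [label for label in cluster_labels if label is not None]
    if not known:
        return None
    return Counter(known).most_common(1)[0][0]


def propagate_labels(clusters):
    """Assign the winning label of each cluster to its unlabeled flows.

    `clusters` is a list of clusters, each a list of flow labels in which
    None marks an unlabeled flow. Existing labels are left unchanged.
    """
    result = []
    for labels in clusters:
        winner = vote_label(labels)
        result.append([winner if label is None else label for label in labels])
    return result


clusters = [["video", "video", None, "voice"], [None, "web"], [None, None]]
labeled = propagate_labels(clusters)
```

A cluster with no labeled flow at all stays unlabeled in this sketch; how such clusters are handled is not stated in the source.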
Step S203, carrying out serialization operation on the network flow with various labels to obtain a serialized data set.
In this step, the generated network flows (with labels L) are serialized, for example by using an Embedding operation to turn the network flows into sequences, so as to obtain a serialized data set in preparation for training the neural networks. Following the process of FIG. 3, homologous network flows, similar network flows and the serialized data set are obtained step by step.
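The source names an Embedding operation without giving details. As a simple stand-in, assuming each flow is truncated or zero-padded to a fixed number of packets and each label is mapped to an integer id, serialization could be sketched as follows (`serialize_flows`, `seq_len` and `pad_value` are hypothetical names):

```python
def serialize_flows(flows, seq_len, pad_value=0.0):
    """Turn labeled flows into fixed-length sequences and integer labels.

    `flows` is a list of (label, packets) pairs, where `packets` is a
    non-empty list of feature vectors. Each flow is truncated or padded
    to `seq_len` packets. This padding scheme is an assumption, not the
    Embedding operation mentioned in the text.
    """
    vocab = {}
    sequences, label_ids = [], []
    for label, packets in flows:
        width = len(packets[0])
        seq = [list(p) for p in packets[:seq_len]]
        seq += [[pad_value] * width for _ in range(seq_len - len(seq))]
        sequences.append(seq)
        label_ids.append(vocab.setdefault(label, len(vocab)))
    return sequences, label_ids, vocab


flows = [
    ("video", [[1, 2], [3, 4], [5, 6]]),
    ("voice", [[7, 8]]),
    ("video", [[9, 9]]),
]
sequences, label_ids, vocab = serialize_flows(flows, seq_len=2)
```

Fixed-length sequences make it straightforward to batch the flows for the recurrent network in the next step.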
In subsequent training, one portion of the processed serialized data set may be used as the training data set and another portion as the test data set.
Step S130, inputting the serialized data set into a recurrent neural network for training, and taking the output sequence values as the input of a one-dimensional convolutional network to obtain a serialized feature vector set.
The serialized training data set obtained in step S203 is input into a recurrent neural network made up of a chain of units, such as an LSTM (long short-term memory network), for training. This chained structure reflects the close relationship between RNNs (recurrent neural networks) and sequences and lists. An LSTM recurrent neural network usually consists of multiple neurons or memory units. The key to the LSTM is the cell state, which is similar to a conveyor belt running straight through the entire chain. The LSTM can add information to or remove information from the cell state, and this ability is controlled by gate structures. An LSTM cell typically outputs two states to the next LSTM cell, namely a cell state and a hidden state. The memory block is responsible for controlling and protecting the cell state, and this memory mode is generally implemented by three gating mechanisms: an input gate, a forget gate and an output gate.
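To make the gate mechanism concrete, here is a minimal forward-only sketch of a standard LSTM cell in plain Python. The parameter layout (weight matrices `Wf`, `Wi`, `Wo`, `Wg` acting on the concatenation of input and previous hidden state) is the common textbook formulation and is an assumption here; the source does not give the network's equations or sizes.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, z, b):
    """Compute W @ z + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; returns the new hidden state and cell state."""
    z = x + h_prev  # list concatenation: [x_t, h_{t-1}]
    f = [sigmoid(v) for v in affine(params["Wf"], z, params["bf"])]    # forget gate
    i = [sigmoid(v) for v in affine(params["Wi"], z, params["bi"])]    # input gate
    o = [sigmoid(v) for v in affine(params["Wo"], z, params["bo"])]    # output gate
    g = [math.tanh(v) for v in affine(params["Wg"], z, params["bg"])]  # candidate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def run_lstm(sequence, hidden_size, params):
    """Feed a sequence of feature vectors through the cell; return all hidden states."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs


# Toy parameters: input size 1, hidden size 1, all weights and biases zero.
params = {name: [[0.0, 0.0]] for name in ("Wf", "Wi", "Wo", "Wg")}
params.update({name: [0.0] for name in ("bf", "bi", "bo", "bg")})
h, c = lstm_step([2.0], [0.0], [1.0], params)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved. The list returned by `run_lstm` corresponds to the output sequence values that are passed on to the one-dimensional convolutional network.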
After the recurrent neural network is trained, its output sequence values can be used as the input of a one-dimensional convolutional network to obtain serialized feature vectors.
It should be noted that other types of recurrent neural networks, such as the GRU (gated recurrent unit network), may also be employed.
Step S140, training the adaptive convolutional neural network with the serialized feature vector set to obtain the predicted network traffic classification.
A normalization operation is performed on the serialized feature vector set obtained in step S130. After format conversion, the normalized feature vector set is used as the input of an adaptive convolutional neural network (CNN) to extract higher-dimensional feature vectors.
For example, an optimized convolutional neural network includes the following layers:
1) Convolutional layer: each convolution layer in the CNN may consist of a number of convolution units, whose parameters are optimized by the back-propagation algorithm. The purpose of convolution is to extract local features; the first convolution layer may extract only low-level features such as edges, lines and corners, while deeper layers can extract more complex local features. An activation function applies a nonlinear transformation to the features, which strengthens the ability to fit the data.
2) Batch normalization (BN): a training technique in deep learning. When the CNN is trained by gradient descent, the data of each mini-batch is normalized within a network layer so that its mean becomes 0 and its variance becomes 1. Its main function is to alleviate the vanishing/exploding gradient phenomenon in DNN (deep neural network) training and thereby speed up model training.
3) Pooling layer: pooling layers are periodically inserted between the convolution layers. Convolution usually produces a large number of high-dimensional features; the pooling operation reduces the number of parameters in the network, which lowers computing resource consumption and model complexity while effectively controlling over-fitting.
4) Fully connected layer: acts as the "classifier" of the whole convolutional neural network, where the SoftMax function can be used to calculate the probability of each predicted class.
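The layer types listed above can be illustrated with a forward-only sketch in plain Python. Kernel values, pool size and the sample signal are hypothetical, the batch normalization omits the learned scale and shift, and the sketch shows the individual operations rather than the patent's actual network.

```python
import math


def conv1d(signal, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation, as in most CNN libraries)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(xs):
    """Nonlinear activation applied element-wise."""
    return [max(0.0, x) for x in xs]


def batch_norm(xs, eps=1e-5):
    """Normalize values to mean 0 and (approximately) variance 1."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


def max_pool1d(xs, size):
    """Non-overlapping max pooling; trailing leftovers are dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def softmax(logits):
    """Convert class scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


signal = [1, 3, 2, 5, 4]                         # hypothetical feature sequence
features = relu(conv1d(signal, kernel=[1, -1]))  # convolution + activation
pooled = max_pool1d(features, size=2)            # pooling halves the length
probs = softmax(pooled)                          # stand-in for the classifier output
```

Here pooling shrinks four convolution outputs to two values, which shows how the pooling layer reduces the amount of data passed to later layers.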
Finally, a predicted classification result is obtained and compared with the true value. A loss function is used to evaluate the quality of the prediction, and the parameter values of each layer are updated step by step by back-propagating the gradient of the loss function until the loss converges to a minimum, at which point the final predicted classification is obtained.
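The update loop described here can be sketched for the simplest possible case: a single linear layer followed by softmax, trained with cross-entropy loss and plain gradient descent. The choice of cross-entropy, the learning rate and the function names are assumptions; the source does not name its loss function.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Loss for one sample: negative log-probability of the true class."""
    return -math.log(probs[target])


def sgd_step(W, x, target, lr=0.1):
    """One gradient-descent update of a linear softmax classifier.

    `W` is modified in place. Returns the loss measured before the update.
    For softmax with cross-entropy, d(loss)/d(logit_k) = p_k - 1[k == target].
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in W]
    probs = softmax(logits)
    loss = cross_entropy(probs, target)
    for k, row in enumerate(W):
        grad_logit = probs[k] - (1.0 if k == target else 0.0)
        for j in range(len(row)):
            row[j] -= lr * grad_logit * x[j]
    return loss


W = [[0.0, 0.0], [0.0, 0.0]]  # two classes, two input features
x = [1.0, 2.0]                # hypothetical feature vector with true class 0
losses = [sgd_step(W, x, target=0) for _ in range(5)]
```

Each call compares the prediction with the true class and nudges the weights against the gradient, so the recorded losses shrink from one step to the next. In a full network the same gradient is propagated backward through every layer.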
In summary, the embodiment of the invention serializes the network flows, extracts the sequential packet/flow features of the seven-layer network protocol at the same time, and inputs the serialized data set into the recurrent neural network, so that complete and effective network sequence information can be extracted. The sequence information is then processed by a one-dimensional convolutional neural network, followed by operations such as convolution and pooling. Modeling can thus proceed iteratively, updated parameters are learned automatically, the prediction error loss is reduced, and the classification accuracy is improved.
Correspondingly, the invention also provides a network traffic classification device based on serialized protocol flow characteristics, which is used for implementing one or more aspects of the method. For example, the device includes: a network flow construction unit for dividing the collected network packets into a plurality of network flows based on the header attributes of the network packets, each network flow including a plurality of network packets; a network flow clustering unit for calculating the similarity between the network packets of each network flow and clustering and labeling the network flows; a serialization unit for serializing the labeled network flows to obtain a serialized data set; a sequence data mining unit for inputting the serialized data set into a recurrent neural network and extracting the association relations among network packets; a sequence feature extraction unit for inputting the output sequence values of the recurrent neural network into the one-dimensional convolutional neural network to obtain a serialized feature vector set; and a classification prediction unit for inputting the serialized feature vector set into the adaptive convolutional neural network to obtain the predicted network traffic classification. The units of the network traffic classification device provided by the embodiment of the invention can be implemented with a processor or a logic device.
In summary, the invention uses serialized network packets as the model input, processes the serialized data with the recurrent neural network and extracts high-dimensional features with the convolutional neural network, thereby markedly improving the classification accuracy. Intermittent network packets can be chained together by using the protocol features of the network packets and the periodicity of the transport protocols TCP/UDP, so that centralized, modular training can be achieved and data integrity ensured. Serializing the network flows preserves the correlation between earlier and later parts of each flow and prevents data loss; given the continuity of the network packets transmitted in the network, a good overall classification effect can be obtained by using the data features in both the time dimension and the space dimension. The recurrent neural network processes the serialized packet/flow feature data set and extracts the association relations along the network packet sequence well, and the convolutional neural network applies further convolution and pooling operations, so that complete high-dimensional sequence features can be extracted and the traffic classification accuracy improved.
Compared with existing network traffic classification technology, the method and the device can improve the classification accuracy of network traffic and solve problems such as difficult feature extraction. For example, they address the problem that features previously had to be extracted manually and that the extracted features were too coarse-grained; the problem that good traffic classification accuracy cannot be obtained when the classifier is sensitive to malicious traffic or when the data carries abnormal values; and the problems that the sequence information of network packets in network transmission is insufficiently mined and that the information of the seven protocol layers is not associated.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by using state information of the computer readable program instructions; the electronic circuitry can then execute the computer readable program instructions to implement aspects of the present invention.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description; it is not intended to be exhaustive or to limit the invention to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the embodiments described. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (9)

1. A network traffic classification method based on serialized protocol flow characteristics, comprising the following steps:
dividing the acquired network packets into a plurality of network flows based on the packet header attribute characteristics of the network packets, wherein each network flow comprises a plurality of network packets;
calculating the similarity between the network packets of each network flow, and clustering and labeling the network flows;
performing a serialization operation on the labeled network flows to obtain a serialized data set;
inputting the serialized data set into a recurrent neural network, and extracting the association relation among the network packets;
inputting the output sequence values of the recurrent neural network into a one-dimensional convolutional neural network to obtain a serialized feature vector set;
inputting the serialized feature vector set into an adaptive convolutional neural network to obtain a predicted network traffic classification;
wherein dividing the acquired network packets into a plurality of network flows based on the packet header attribute characteristics of the network packets comprises:
dividing the network data of each layer according to the protocol features of the different layers of the network, extracting the corresponding protocol packet features and flow features, and forming a protocol packet data set and a flow feature data set;
classifying and labeling the protocol packet data set and the flow feature data set according to the five-tuple of source IP, destination IP, source port, destination port and protocol type in each network packet to obtain a plurality of network flows, wherein each network flow comprises a plurality of network packets with the same five-tuple.
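The five-tuple division in claim 1 can be sketched as follows. The `Packet` record and the `group_into_flows` helper are hypothetical names introduced only for illustration; sorting each flow by timestamp is likewise an assumption about how packet order would be preserved for the later serialization step.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record; real captures carry more fields."""
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


FiveTuple = Tuple[str, str, int, int, str]


def group_into_flows(packets: List[Packet]) -> Dict[FiveTuple, List[Packet]]:
    """Group packets that share the same five-tuple into one flow."""
    flows: Dict[FiveTuple, List[Packet]] = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key].append(p)
    # Assumption: packets within a flow are kept in capture-time order.
    for flow_packets in flows.values():
        flow_packets.sort(key=lambda p: p.timestamp)
    return dict(flows)


packets = [
    Packet(2.0, "10.0.0.1", "10.0.0.2", 1234, 80, "TCP"),
    Packet(1.0, "10.0.0.1", "10.0.0.2", 1234, 80, "TCP"),
    Packet(1.5, "10.0.0.3", "10.0.0.2", 5555, 53, "UDP"),
]
flows = group_into_flows(packets)
```

Under this sketch the two TCP packets form one flow and the UDP packet forms another.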
2. The method of claim 1, wherein the feature values in the protocol packet data set and the flow feature data set are normalized values.
3. The method of claim 1, wherein calculating the similarity between the network packets of each network flow and clustering and labeling the network flows comprises:
for the obtained network flow, calculating the similarity of the first N network packets in the network flow, wherein N is an integer greater than or equal to 2;
determining a homologous network flow by comparing the calculated similarity with a set threshold;
and clustering and labeling the network flows by using a clustering method based on machine learning.
4. The method of claim 3, wherein the similarity of the first N network packets in each network flow is calculated using a Euclidean metric.
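One possible reading of claims 3 and 4 is sketched below: the first N packet feature vectors of two flows are compared position by position with the Euclidean distance, and the flows are treated as homologous when the mean distance does not exceed a threshold. The pairing scheme, the averaging, and the function names are assumptions; the claims fix only the use of the first N packets, a Euclidean metric, and a set threshold.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def first_n_distance(flow_a: List[Vector], flow_b: List[Vector], n: int) -> float:
    """Mean Euclidean distance between the first n packet vectors of two flows."""
    if n < 2:
        raise ValueError("n must be an integer greater than or equal to 2")
    pairs = list(zip(flow_a[:n], flow_b[:n]))
    if not pairs:
        raise ValueError("both flows must contain at least one packet")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def is_homologous(flow_a: List[Vector], flow_b: List[Vector],
                  n: int, threshold: float) -> bool:
    """Assumption: a smaller distance means a higher similarity."""
    return first_n_distance(flow_a, flow_b, n) <= threshold


# Hypothetical two-dimensional packet feature vectors.
flow_a = [[0.0, 0.0], [1.0, 1.0]]
flow_b = [[3.0, 4.0], [1.0, 1.0]]
distance = first_n_distance(flow_a, flow_b, 2)
```

The clustering step of claim 3 would then operate on these pairwise results; the source does not name a specific clustering algorithm, so none is shown here.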
5. The method of claim 1, wherein the recurrent neural network is a long short-term memory (LSTM) network comprising a plurality of units in a chained architecture.
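The chained structure of a long short-term memory network can be illustrated with the standard LSTM cell equations, here reduced to a single scalar unit in pure Python. The weights in `params` are hypothetical placeholders, and the scalar form is a simplification for readability; it is not the network configuration of the invention.

```python
import math
from typing import Dict, List, Tuple

# For each gate: (input weight, recurrent weight, bias). Values are hypothetical.
GateParams = Dict[str, Tuple[float, float, float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float,
              params: GateParams) -> Tuple[float, float]:
    """One step of a scalar LSTM cell using the standard gate equations."""
    pre = {name: w * x + u * h_prev + b for name, (w, u, b) in params.items()}
    i = sigmoid(pre["input"])         # how much new information to admit
    f = sigmoid(pre["forget"])        # how much old cell state to keep
    o = sigmoid(pre["output"])        # how much cell state to expose
    g = math.tanh(pre["candidate"])   # candidate cell update
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def run_lstm(sequence: List[float], params: GateParams) -> List[float]:
    """Chain the cell over a sequence, passing (h, c) from step to step."""
    h, c = 0.0, 0.0
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs


params = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.4, 0.2, 1.0),
    "output": (0.3, 0.1, 0.0),
    "candidate": (0.8, 0.3, 0.0),
}
hidden_states = run_lstm([0.2, 0.7, 0.1], params)
```

Each output depends on all earlier inputs through the carried state, which is how the association between earlier and later packets in a sequence can be captured.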
6. The method of claim 1, wherein dividing the network data of each layer according to the protocol features of the different layers of the network and extracting the corresponding protocol packet features and flow features comprises:
dividing the network data of the application layer, presentation layer, session layer, transport layer, network layer, data link layer and physical layer based on the protocols of the OSI seven-layer network model;
and extracting corresponding protocol packet characteristics and flow characteristics for the network data of each layer, and forming a protocol packet data set and a flow characteristic data set according to the transmission directionality of the network packets.
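As an illustration of forming features according to transmission directionality, the sketch below splits the packets of one conversation into a forward and a backward direction and computes two simple per-direction statistics. Treating the sender of the first packet as the forward direction, and the choice of packet count and byte total as features, are assumptions made for this example; the claim does not enumerate the features.

```python
from typing import Dict, List, Tuple

# Hypothetical packet record: (source IP, destination IP, length in bytes).
PacketRecord = Tuple[str, str, int]


def directional_features(packets: List[PacketRecord]) -> Dict[str, Dict[str, int]]:
    """Count packets and bytes separately for each transmission direction."""
    stats = {
        "forward": {"count": 0, "bytes": 0},
        "backward": {"count": 0, "bytes": 0},
    }
    if not packets:
        return stats
    # Assumption: the sender of the first packet defines the forward direction.
    initiator = packets[0][0]
    for src, _dst, length in packets:
        direction = "forward" if src == initiator else "backward"
        stats[direction]["count"] += 1
        stats[direction]["bytes"] += length
    return stats


conversation = [
    ("10.0.0.1", "10.0.0.2", 60),
    ("10.0.0.2", "10.0.0.1", 1500),
    ("10.0.0.1", "10.0.0.2", 40),
]
features = directional_features(conversation)
```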
7. A network traffic classification device based on serialized protocol flow characteristics, comprising:
a network flow construction unit, configured to divide the acquired network packets into a plurality of network flows based on the packet header attribute characteristics of the network packets, wherein each network flow comprises a plurality of network packets;
a network flow clustering unit, configured to calculate the similarity between the network packets of each network flow and to cluster and label the network flows;
a serialization unit, configured to perform a serialization operation on the labeled network flows to obtain a serialized data set;
a sequence data mining unit, configured to input the serialized data set into a recurrent neural network and extract the association relation among the network packets;
a sequence feature extraction unit, configured to input the output sequence values of the recurrent neural network into a one-dimensional convolutional neural network to obtain a serialized feature vector set;
a classification prediction unit, configured to input the serialized feature vector set into an adaptive convolutional neural network to obtain a predicted network traffic classification;
wherein dividing the acquired network packets into a plurality of network flows based on the packet header attribute characteristics of the network packets comprises:
dividing the network data of each layer according to the protocol features of the different layers of the network, extracting the corresponding protocol packet features and flow features, and forming a protocol packet data set and a flow feature data set;
classifying and labeling the protocol packet data set and the flow feature data set according to the five-tuple of source IP, destination IP, source port, destination port and protocol type in each network packet to obtain a plurality of network flows, wherein each network flow comprises a plurality of network packets with the same five-tuple.
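The serialization operation is not detailed in this section, so the following is only one plausible sketch of it: the packet feature vectors of a labeled flow are kept in order, truncated or zero-padded to a fixed number of packets, and paired with the flow label. The fixed length `max_packets`, the zero padding, and the function name are assumptions introduced for illustration.

```python
from typing import List, Sequence, Tuple


def serialize_flow(packet_vectors: Sequence[Sequence[float]], label: str,
                   max_packets: int) -> Tuple[List[List[float]], str]:
    """Turn one labeled flow into a fixed-length, ordered sequence of vectors."""
    width = len(packet_vectors[0]) if packet_vectors else 0
    # Keep the original packet order and drop anything beyond max_packets.
    rows = [list(v) for v in packet_vectors[:max_packets]]
    # Assumption: short flows are padded with all-zero vectors.
    rows += [[0.0] * width for _ in range(max_packets - len(rows))]
    return rows, label


sequence, label = serialize_flow([[1.0, 2.0]], "web", 3)
```

A fixed length lets flows with different packet counts be batched together as input to the recurrent network.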
8. A computer readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
9. A computer device comprising a memory and a processor, the memory storing a computer program that can be run on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when executing the program.
CN202010150723.5A 2020-03-06 2020-03-06 Network traffic classification method and device based on serialized protocol flow characteristics Active CN111431819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010150723.5A CN111431819B (en) 2020-03-06 2020-03-06 Network traffic classification method and device based on serialized protocol flow characteristics

Publications (2)

Publication Number Publication Date
CN111431819A CN111431819A (en) 2020-07-17
CN111431819B true CN111431819B (en) 2023-06-20

Family

ID=71547447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010150723.5A Active CN111431819B (en) 2020-03-06 2020-03-06 Network traffic classification method and device based on serialized protocol flow characteristics

Country Status (1)

Country Link
CN (1) CN111431819B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113079150B (en) * 2021-03-26 2022-09-30 深圳供电局有限公司 Intrusion detection method for power terminal equipment
CN113315790B (en) * 2021-07-29 2021-11-02 湖南华菱电子商务有限公司 Intrusion flow detection method, electronic device and storage medium
CN113766545B (en) * 2021-09-30 2024-04-09 贝壳找房(北京)科技有限公司 Identity recognition method and device for wireless network
CN114595879B (en) * 2022-03-03 2022-09-06 大连理工大学 Characteristic particle sequence LSTM-based quasi-periodic energy long-term prediction method
CN114697272B (en) * 2022-03-03 2023-06-16 安徽师范大学 Traffic classification method, system and computer readable storage medium
CN116192997B (en) * 2023-02-21 2023-12-01 兴容(上海)信息技术股份有限公司 Event detection method and system based on network flow
CN117176664A (en) * 2023-08-28 2023-12-05 枣庄福缘网络科技有限公司 Abnormal flow monitoring system for Internet of things

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110213244A (en) * 2019-05-15 2019-09-06 杭州电子科技大学 A kind of network inbreak detection method based on space-time characteristic fusion
CN110391958A (en) * 2019-08-15 2019-10-29 北京中安智达科技有限公司 A kind of pair of network encryption flow carries out feature extraction automatically and knows method for distinguishing

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108200006B (en) * 2017-11-21 2020-12-18 中国科学院声学研究所 Network traffic classification method and device based on hierarchical spatiotemporal feature learning
CN109995601B (en) * 2017-12-29 2020-12-01 中国移动通信集团上海有限公司 Network traffic identification method and device
US11368476B2 (en) * 2018-02-22 2022-06-21 Helios Data Inc. Data-defined architecture for network data management
US20190273510A1 (en) * 2018-03-01 2019-09-05 Crowdstrike, Inc. Classification of source data by neural network processing
CN109871948A (en) * 2019-03-26 2019-06-11 中国人民解放军陆军工程大学 A kind of application protocol recognition method based on two-dimensional convolution neural network
CN110413786B (en) * 2019-07-26 2021-12-28 北京智游网安科技有限公司 Data processing method based on webpage text classification, intelligent terminal and storage medium

Also Published As

Publication number Publication date
CN111431819A (en) 2020-07-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant