CN114697272B - Traffic classification method, system and computer readable storage medium - Google Patents

Info

Publication number
CN114697272B
Authority
CN
China
Prior art keywords
sequence
fractal
similarity
alpha
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210203700.5A
Other languages
Chinese (zh)
Other versions
CN114697272A (en)
Inventor
汤萍萍
王再见
汝佳冉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Normal University
Original Assignee
Anhui Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Normal University filed Critical Anhui Normal University
Priority to CN202210203700.5A priority Critical patent/CN114697272B/en
Publication of CN114697272A publication Critical patent/CN114697272A/en
Application granted granted Critical
Publication of CN114697272B publication Critical patent/CN114697272B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/50Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a traffic classification method, a system and a computer readable storage medium. The traffic classification method comprises the following steps: S1, obtaining a network flow $F_I$ to be classified, and obtaining a spatial sequence P and a time sequence T of $F_I$; S2, establishing fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time according to the spatial sequence P and the time sequence T, and defining a high-dimensional fractal M; S3, defining a similarity measure according to the high-dimensional fractal M; S4, converting the matrix-valued similarity measure into a scalar similarity; S5, classifying the network flows according to the similarity. The method, system and computer readable storage medium capture the series of variation characteristics obtained by observing network flows at different spatial and temporal scales and represent the space-time correlation between data for classification, which both ensures the stability of fine-grained classification and improves its accuracy.

Description

Traffic classification method, system and computer readable storage medium
Technical Field
The present invention relates to the field of traffic classification technologies, and in particular, to a traffic classification method, system, and computer readable storage medium.
Background
With the continual innovation of network technology, network traffic is growing explosively. Traffic differs in its QoS (Quality of Service) requirements: in video conferencing, for example, even brief delays or lost frames can lead to economic losses or decision errors, and traffic classification helps to implement differentiated services. Network bandwidth is occupied by a large amount of junk traffic (high consumption, low value), and online identification can optimize the use of bandwidth resources. In addition, online detection of malicious traffic (such as broadcast storm attacks) can enhance network security and ensure system confidentiality and availability. In short, traffic classification is a basic technology for solving a series of important problems such as resource management, network monitoring and security control, and is an important research problem in the field of communications.
As the classes of network traffic multiply, classification granularity becomes ever finer. For fine-grained classification, statistical-feature methods are trapped by the "feature engineering" problem. Deep learning methods obtain fine classification capability by enhancing detail features, but the network structure becomes complex, and setting parameters and hyperparameters is difficult; if new classes are added, all parameters and even the system architecture must be readjusted, which severely restricts their use for online classification.
Existing fractal methods generally gain classification granularity at the expense of classification speed, and struggle to achieve both classification accuracy and classification speed.
Disclosure of Invention
In view of the above, the present invention aims to provide a traffic classification method, system and computer readable storage medium that classify flows more accurately; in addition, unlike the traditional continuous fractal, the high-dimensional fractal is a discrete fractal feature, so the calculation speed is greatly improved.
The technical scheme of the invention is realized as follows:
The invention provides a traffic classification method, which comprises the following steps:
S1, obtaining a network flow $F_I$ to be classified, and obtaining a spatial sequence P and a time sequence T of $F_I$;
S2, establishing fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time according to the spatial sequence P and the time sequence T, and defining a high-dimensional fractal:
$M = f_P(\alpha) * f_T(\alpha)^{T}$
where the observation scale of $f_P(\alpha)$ and $f_T(\alpha)$ is at least $q = 1$ and at most
Figure BDA0003530599980000011
S3, defining a similarity measure according to the high-dimensional fractal M:
Figure BDA0003530599980000021
where $M_a$ denotes the high-dimensional fractal of flow $F_I^a$ and $M_b$ denotes the high-dimensional fractal of flow $F_I^b$;
S4, converting the matrix-valued similarity measure into a scalar similarity:
Figure BDA0003530599980000022
S5, classifying the network flows according to the similarity.
Preferably, establishing the fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time according to the spatial sequence P and the time sequence T specifically includes:
substituting the spatial sequence $P = \{P_i\}$ and the time sequence $T = \{T_i\}$ respectively into the following formulas to form the fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time;
Let $X = \{X_i,\ i = 1, 2, \dots, N\}$ be a discrete random sequence with fractal characteristics. The discrete sequence $\{X(i)\}$ is divided into m non-overlapping blocks, and the blocks are merged to obtain the m-order aggregated sequence:
Figure BDA0003530599980000023
q-order computation and summation are performed on the m-order aggregated sequence:
Figure BDA0003530599980000024
finally giving
Figure BDA0003530599980000025
Preferably, S4 specifically includes:
for similar matrices $A$ and $P^{-1}AP$, $\mathrm{tr}(P^{-1}AP) = \mathrm{tr}(PP^{-1}A) = \mathrm{tr}(A)$, where $\mathrm{tr}(\cdot)$ is the trace of a matrix; M is the outer product of the fractal feature $f_P(\alpha)$ of the spatial sequence and the fractal feature $f_T(\alpha)$ of the time sequence, so $\mathrm{tr}(f_P(\alpha) f_T(\alpha)^{T}) = f_T(\alpha)^{T} f_P(\alpha)$, and the matrix-valued similarity measure is converted into a scalar similarity:
Figure BDA0003530599980000026
Preferably, step S5 specifically includes:
S51: suppose there are currently L classes
Figure BDA0003530599980000027
each class contains several flows $\{\dots, F_I^j, F_I^k, \dots\}$, and its center point is denoted as
Figure BDA0003530599980000028
The center point is determined by the following formula:
Figure BDA0003530599980000029
S52: when classifying a network flow $F_I^a$, the similarity between the flow and each center point is calculated
Figure BDA00035305999800000210
and the most similar center point is selected by the following operation:
Figure BDA0003530599980000031
where, if the similarity between the network flow $F_I^a$ and the center point $P_l$ is greater than or equal to the threshold T, then $F_I^a$ belongs to class $P_l$; if the similarity is less than the threshold T, then $F_I^a$ does not belong to class $P_l$.
The invention also provides a traffic classification system, which comprises:
an acquisition module, configured to obtain a network flow $F_I$ to be classified and to obtain a spatial sequence P and a time sequence T of $F_I$;
a fractal module, configured to establish fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time according to the spatial sequence P and the time sequence T, and to define a high-dimensional fractal:
$M = f_P(\alpha) * f_T(\alpha)^{T}$
where the observation scale of $f_P(\alpha)$ and $f_T(\alpha)$ is at least $q = 1$ and at most
Figure BDA0003530599980000036
a high-dimensional fractal module, configured to define a similarity measure according to the high-dimensional fractal M:
Figure BDA0003530599980000032
where $M_a$ denotes the high-dimensional fractal of flow $F_I^a$ and $M_b$ denotes the high-dimensional fractal of flow $F_I^b$;
and to convert the matrix-valued similarity measure into a scalar similarity:
Figure BDA0003530599980000033
and a classification module, configured to classify the network flows according to the similarity.
Preferably, establishing the fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time according to the spatial sequence P and the time sequence T specifically includes:
substituting the spatial sequence $P = \{P_i\}$ and the time sequence $T = \{T_i\}$ respectively into the following formulas to form the fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time;
Let $X = \{X_i,\ i = 1, 2, \dots, N\}$ be a discrete random sequence with fractal characteristics. The discrete sequence $\{X(i)\}$ is divided into m non-overlapping blocks, and the blocks are merged to obtain the m-order aggregated sequence:
Figure BDA0003530599980000034
q-order computation and summation are performed on the m-order aggregated sequence:
Figure BDA0003530599980000035
finally giving
Figure BDA0003530599980000041
Preferably, the high-dimensional fractal module is specifically configured as follows:
for similar matrices $A$ and $P^{-1}AP$, $\mathrm{tr}(P^{-1}AP) = \mathrm{tr}(PP^{-1}A) = \mathrm{tr}(A)$, where $\mathrm{tr}(\cdot)$ is the trace of a matrix;
M is the outer product of the fractal feature $f_P(\alpha)$ of the spatial sequence and the fractal feature $f_T(\alpha)$ of the time sequence, so $\mathrm{tr}(f_P(\alpha) f_T(\alpha)^{T}) = f_T(\alpha)^{T} f_P(\alpha)$, and the matrix-valued similarity measure is converted into a scalar similarity:
Figure BDA0003530599980000042
Preferably, the classification module is specifically configured as follows:
suppose there are currently L classes
Figure BDA0003530599980000043
each class contains several flows $\{\dots, F_I^j, F_I^k, \dots\}$, and its center point is denoted as
Figure BDA0003530599980000044
The center point is determined by the following formula:
Figure BDA0003530599980000045
when classifying a network flow $F_I^a$, the similarity $\mathrm{Sim}(M_a, M_{P_l})$ between the flow and each center point is calculated, and the most similar center point is selected by the following operation:
Figure BDA0003530599980000046
where, if the similarity between the network flow $F_I^a$ and the center point $P_l$ is greater than or equal to the threshold T, then $F_I^a$ belongs to class $P_l$; if the similarity is less than the threshold T, then $F_I^a$ does not belong to class $P_l$.
The invention also proposes a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a flow classification method according to any of the preceding claims.
The discrete fractal features formed by the space-time separation of traffic capture the series of variation characteristics obtained by observing network flows at different spatial and temporal scales and embody the space-time correlation between data for classification, which both ensures the stability of fine-grained classification and improves its accuracy.
Drawings
FIG. 1 is a flow chart of a flow classification method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a flow classification system according to an embodiment of the present invention;
FIG. 3 is a high-dimensional fractal flow detail profile.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, an embodiment of the present invention provides a traffic classification method, which includes the following steps:
S1, obtaining a network flow $F_I$ to be classified, and obtaining a spatial sequence P and a time sequence T of $F_I$;
S2, establishing fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time according to the spatial sequence P and the time sequence T, and defining a high-dimensional fractal:
$M = f_P(\alpha) * f_T(\alpha)^{T}$
where the observation scale of $f_P(\alpha)$ and $f_T(\alpha)$ is at least $q = 1$ and at most
Figure BDA0003530599980000056
S3, defining a similarity measure according to the high-dimensional fractal M:
Figure BDA0003530599980000051
where $M_a$ denotes the high-dimensional fractal of flow $F_I^a$ and $M_b$ denotes the high-dimensional fractal of flow $F_I^b$;
S4, converting the matrix-valued similarity measure into a scalar similarity:
Figure BDA0003530599980000052
S5, classifying the network flows according to the similarity.
Specifically, the method comprises the following:
Space-time separation
In flow fractal theory, traffic is defined as the amount of data passing through a network device per unit time I, and a network flow $F_I$ is a group of data packets sharing the same five-tuple <source IP, destination IP, source port, destination port, protocol>:
Figure BDA0003530599980000053
Here, $P_i$ is the size of the i-th data packet, $T_i$ is the time interval between that packet and the previous packet, and the resolution N is the number of packets contained in the flow.
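As a minimal sketch of this space-time separation, assuming the capture has already been exported as (arrival time, size) pairs for a single five-tuple flow, the two sequences could be derived as follows. The record layout and the name `split_flow` are illustrative assumptions, not part of the patent; the text does not say whether the first packet contributes an interval, so this sketch leaves it out of T.

```python
from typing import List, Sequence, Tuple


def split_flow(packets: Sequence[Tuple[float, int]]) -> Tuple[List[int], List[float]]:
    """Split one flow into a spatial sequence P and a temporal sequence T.

    `packets` is a hypothetical list of (arrival_time_seconds, size_bytes)
    pairs that has already been filtered to a single five-tuple.
    """
    ordered = sorted(packets, key=lambda p: p[0])  # order by arrival time
    sizes = [size for _, size in ordered]          # P_i: size of packet i
    # T_i: gap between packet i and the previous packet. Assumption: the
    # first packet has no predecessor, so it contributes no interval.
    gaps = [b[0] - a[0] for a, b in zip(ordered, ordered[1:])]
    return sizes, gaps


P, T = split_flow([(0.0, 470), (0.5, 462), (0.75, 1494)])
```

Keeping the two sequences separate from the start is what later allows one fractal feature per dimension to be computed.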
In addition, the flow $F_I$ can be divided into a plurality of substreams, the m-th substream being $F_I^{(m)}$:
Figure BDA0003530599980000054
Figure BDA0003530599980000055
where $N_m$ in formula (2) denotes the number of packets included in the m-th substream. Next, fractal features are calculated for the spatial sequence P and the time sequence T separately. In practice, $f(\alpha)$ is usually approximated numerically, for example by an unbiased estimate under the Legendre transform: let $X = \{X_i,\ i = 1, 2, \dots, N\}$ be a discrete random sequence with fractal characteristics. The discrete sequence $\{X(i)\}$ is divided into m non-overlapping blocks, and the blocks are merged to obtain the m-order aggregated sequence:
Figure BDA0003530599980000061
q-order computation and summation are performed on the m-order aggregated sequence:
Figure BDA0003530599980000062
Finally:
Figure BDA0003530599980000063
Here, $\tau(q)$ is the estimate of the fractal spectrum $f(\alpha)$ under the Legendre transform. To identify network flows quickly, the present application uses $\tau(q)$ to compute the fractal features of a flow.
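Formulas (4) to (6) survive only as images here, so the sketch below does not reproduce them. It illustrates a common numerical estimator of the same general kind, under explicit assumptions: blocks have length m, each block is merged by summation, the q-order sum is the sum of |block|^q, and the scaling exponent for each q is the least-squares slope of log(sum) against log(m). The function names and the default block sizes are hypothetical.

```python
import math
from typing import List, Sequence


def aggregate(x: Sequence[float], m: int) -> List[float]:
    """Sum consecutive non-overlapping blocks of length m (tail is dropped)."""
    return [sum(x[k * m:(k + 1) * m]) for k in range(len(x) // m)]


def partition_sum(x: Sequence[float], m: int, q: float) -> float:
    """q-order sum over the m-aggregated sequence."""
    return sum(abs(v) ** q for v in aggregate(x, m))


def scaling_exponent(x: Sequence[float], q: float, block_sizes: Sequence[int]) -> float:
    """Least-squares slope of log(partition_sum) against log(m)."""
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(partition_sum(x, m, q)) for m in block_sizes]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    den = sum((a - mean_x) ** 2 for a in xs)
    return num / den


def fractal_feature(x: Sequence[float], q_max: int,
                    block_sizes: Sequence[int] = (1, 2, 4, 8, 16)) -> List[float]:
    """One exponent per observation scale q = 1 .. q_max."""
    return [scaling_exponent(x, q, block_sizes) for q in range(1, q_max + 1)]


# For a constant sequence the slope for order q is q - 1.
features = fractal_feature([1.0] * 64, q_max=3)
```

The sequence must be strictly longer than the largest block size and must not sum to zero within the blocks, otherwise the logarithm is undefined.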
High-dimensional fractal
The spatial sequence $P = \{P_i\}$ and the time sequence $T = \{T_i\}$ are substituted into formulas (4)-(6) respectively to form the fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time. The former describes how the packet size of the traffic varies; the latter describes the burstiness of the traffic packets over time. Taking the outer product of the two vectors captures, in physical terms, how the burst volume of the network flow varies across different spatial and temporal scales, and thus defines the high-dimensional fractal:
$M = f_P(\alpha) * f_T(\alpha)^{T}$ (7)
Here, $f_P(\alpha)$ is the fractal feature built on the spatial sequence P, with a minimum observation scale of $q = 1$ and a maximum of
Figure BDA0003530599980000066
Similarly, $f_T(\alpha)$ is the fractal feature corresponding to the time sequence T. The fractal features in the P and T dimensions describe the trajectory that the traffic data traces in space and time as the observation scale q varies from 1 to
Figure BDA0003530599980000065
To this end, the observation scale of the time sequence is first fixed, $f_T(\alpha)|_{\alpha=q'}$, and only the spatially varying feature $f_P(\alpha)$ is observed. Under the fixed temporal observation scale $f_T(\alpha)|_{\alpha=q'}$, the feature vector $f_{Pa}(\alpha)$ is unique, so the matrix $f_P(\alpha) * f_T(\alpha)^{T}$ formed from all components can uniquely mark the network flow. The present application therefore distinguishes different types of network flows based on the high-dimensional fractal M.
High-dimensional fractal similarity
The high-dimensional fractal M describes the trajectory of the traffic as the observation scale varies. Traffic of one type always follows a specific protocol and transport mode, and therefore has similar trajectories that reflect characteristics inherent to the traffic. Accurate classification of network flows is therefore based on the similarity of the high-dimensional fractal M. To this end, the present application defines a similarity measure for the high-dimensional fractal M based on a matrix relationship:
Figure BDA0003530599980000064
Here, $M_a$ denotes the high-dimensional fractal of flow $F_I^a$ and $M_b$ denotes the high-dimensional fractal of flow $F_I^b$. For similar matrices $A$ and $P^{-1}AP$, $\mathrm{tr}(P^{-1}AP) = \mathrm{tr}(PP^{-1}A) = \mathrm{tr}(A)$, where $\mathrm{tr}(\cdot)$ is the trace of a matrix; that is, similar matrices have the same trace. Furthermore, by formula (7), M is the outer product of the fractal feature $f_P(\alpha)$ of the spatial sequence and the fractal feature $f_T(\alpha)$ of the time sequence, so $\mathrm{tr}(f_P(\alpha) f_T(\alpha)^{T}) = f_T(\alpha)^{T} f_P(\alpha)$. The matrix-valued similarity measure of formula (8) is therefore converted into a scalar, called the similarity:
Figure BDA0003530599980000071
Here, by formula (9), $\mathrm{Sim}(M_a, M_b) = \mathrm{Sim}(M_b, M_a)$, and $\mathrm{Sim}(\cdot)$ takes values between 0 and 1; the larger the value, the higher the similarity between the two. In the extreme case $\mathrm{Sim}(M_a, M_a) = 1$, i.e. the two agree perfectly.
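Formula (9) is available only as an image, so the following is a stand-in rather than the patent's formula. It assumes a normalised trace inner product, which has the properties stated above (symmetric, equal to 1 for identical inputs, and within 0 to 1 when all feature components are non-negative). Because each M is an outer product, tr(M_a^T M_b) factors into two dot products, so no Q x Q matrix needs to be built. The function names are hypothetical.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def similarity(fp_a: Sequence[float], ft_a: Sequence[float],
               fp_b: Sequence[float], ft_b: Sequence[float]) -> float:
    """Assumed similarity of M_a = fp_a ft_a^T and M_b = fp_b ft_b^T.

    Uses tr(M_a^T M_b) = (fp_a . fp_b) * (ft_a . ft_b), normalised by the
    square root of tr(M_a^T M_a) * tr(M_b^T M_b).
    """
    num = dot(fp_a, fp_b) * dot(ft_a, ft_b)
    den = math.sqrt(dot(fp_a, fp_a) * dot(ft_a, ft_a)
                    * dot(fp_b, fp_b) * dot(ft_b, ft_b))
    return num / den


same = similarity([3.0, 4.0], [1.0, 2.0], [3.0, 4.0], [1.0, 2.0])
disjoint = similarity([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

Working on the two vectors instead of the full matrices keeps the cost of one comparison linear in the number of observation scales.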
Classification
The classification process of the present application follows a k-means-based classifier design. Suppose there are currently L classes
Figure BDA0003530599980000072
each class contains several flows $\{\dots, F_I^j, F_I^k, \dots\}$, and its center point is denoted as
Figure BDA0003530599980000073
Because $\mathrm{Sim}(\cdot)$ is uniformly distributed over [0, 1], the center point is determined by the following formula:
Figure BDA0003530599980000074
The differences between the center point $P_l$ and the other points $\{\dots, F_I^j, F_I^k, \dots\}$ in the class are all relatively small. When classifying a network flow $F_I^a$, the similarity between the flow and each center point is calculated
Figure BDA0003530599980000076
and the most similar center point is selected by the following operation:
Figure BDA0003530599980000075
Formula (11) means that if the similarity between the network flow $F_I^a$ and the center point $P_l$ is greater than or equal to the threshold, then $F_I^a$ belongs to class $P_l$; if the similarity is less than the threshold, $F_I^a$ does not belong to class $P_l$.
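The decision rule described here (pick the most similar center, then accept only if the similarity reaches the threshold) can be sketched as below. The similarity function is passed in as a parameter because formula (9) is not reproduced in the text; the toy similarity, the class labels and the threshold value in the usage lines are purely illustrative assumptions.

```python
from typing import Callable, Dict, Optional, TypeVar

F = TypeVar("F")  # whatever represents a flow's high-dimensional fractal


def classify(flow: F, centers: Dict[str, F],
             sim: Callable[[F, F], float], threshold: float) -> Optional[str]:
    """Return the label of the most similar center, or None if even the
    best similarity is below the threshold."""
    best_label: Optional[str] = None
    best_score = float("-inf")
    for label, center in centers.items():
        score = sim(flow, center)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Toy stand-in: flows are single numbers, similarity is 1 - |a - b|.
toy_sim = lambda a, b: 1.0 - abs(a - b)
toy_centers = {"video": 0.9, "voip": 0.2}
accepted = classify(0.85, toy_centers, toy_sim, threshold=0.8)
rejected = classify(0.5, toy_centers, toy_sim, threshold=0.8)
```

Returning `None` for a rejected flow leaves room for the caller to open a new class, which the text does not specify.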
Examples
Software environment of the experiment: real-time traffic is captured with Wireshark, and the effectiveness of the HFM (High-dimensional Fractal Model) method is verified with the MATLAB R2016a simulation tool. The hardware environment is Win10 Professional (64 bit/SP 1), Intel(R) Core(TM) i7-7500U @ 2.70 GHz, 8 GB memory.
The data sets used in this experiment are: 1) the NJUPT data set, collected in the campus network of Nanjing University of Posts and Telecommunications, which contains six types of traffic; 2) the Internet traffic data set UNB ISCX Network Traffic [34], which contains traffic data from numerous applications such as Vimeo, YouTube, ICQ, Skype, Facebook and BitTorrent, divided into eight categories; 3) the ISP data set, collected by regional data centers of China Mobile, which integrates ten types of traffic such as video streams and online games.
Step 1, obtaining the spatial sequence P and the time sequence T. Most packet capture software provides the size and arrival time of each packet. Taking a cool video stream as an example, P and T can be obtained by Wireshark packet capture:
$\{P_i\} = \{470, 462, 1494, \dots, 68, 1494, 1494\}$
$\{T_i\} = \{0.000428, 0.00083, \dots, 0.151786, 0.05897\}$
Step 2, generating the high-dimensional fractal. For the different observation scales
Figure BDA0003530599980000081
the corresponding fractal features $f_P(\alpha)$ and $f_T(\alpha)$ are generated from the spatial sequence and the time sequence by formulas (4)-(6):
$f_P(\alpha) = \{20.513, 10.436, 7.237, 5.288, 4.362, 3.538, 3.192, 2.641, 2.407, 2.215\}$
$f_T(\alpha) = \{6.285, 3.217, 2.163, 1.722, 1.338, 1.176, 1.035, 0.919, 0.814, 0.752\}$
The high-dimensional fractal $M_{Q \times Q} = f_P(\alpha) * f_T(\alpha)^{T}$ is generated by formula (7). As shown in FIG. 3, the high-dimensional fractal obtained through space-time separation characterizes more flow details in the two dimensions of space and time. In physical terms, the fractal feature in the spatial dimension reflects how the packet size of the traffic varies, and the fractal feature in the time dimension reflects the burstiness of the traffic packets over time. The high-dimensional fractal obtains more fractal detail features from only a small amount of data (2000 data packets), so HFM greatly improves calculation speed while maintaining classification accuracy.
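As a small illustration using the two ten-component vectors listed above, formula (7) can be evaluated as an outer product. Reading "*" as the outer product of two column vectors is the interpretation consistent with the trace identity tr(f_P f_T^T) = f_T^T f_P used elsewhere in the description; the variable names are illustrative.

```python
# Values copied from the worked example in the text.
f_P = [20.513, 10.436, 7.237, 5.288, 4.362, 3.538, 3.192, 2.641, 2.407, 2.215]
f_T = [6.285, 3.217, 2.163, 1.722, 1.338, 1.176, 1.035, 0.919, 0.814, 0.752]

# Outer product: M[i][j] = f_P[i] * f_T[j], a Q x Q matrix (Q = 10 here).
M = [[p * t for t in f_T] for p in f_P]

# Trace identity: the trace of the outer product equals the inner product.
trace_M = sum(M[i][i] for i in range(len(f_P)))
inner = sum(p * t for p, t in zip(f_P, f_T))
```

Each row of `M` is the temporal feature scaled by one spatial component, which is why the matrix can be stored or compared through its two generating vectors alone.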
It should be noted that the present application sets the resolution to N = 2000; this many packets suffice to capture the variation characteristics of a network flow for classification. The smaller N is, the lower the computational cost; however, as the number of packets decreases, the high-dimensional fractal features become unstable, which harms classification. Taking a cool video stream as an example, the influence of the flow-sequence resolution N on the high-dimensional fractal is studied. For the flow sequence, $N_i = \{10000, 8000, 6000, 4000, 2000, 1500, 1000, 500\}$ is taken in turn and the high-dimensional fractals of the corresponding substreams are calculated; the corresponding matrix similarity is then computed
Figure BDA0003530599980000082
When $N_1 = 10000$, the matrix similarity $\mathrm{Sim}(C_j, C_k)$ is 0.984 ± 0.006, a very stable result. As $N_i$ decreases, stability deteriorates; when $N_8 = 500$, $\mathrm{Sim}(C_j, C_k)$ is 0.469 ± 0.127, the differences between substreams are large, and the flow cannot be identified. Repeated experiments on other types of flows gave largely similar results. The present application therefore selects a resolution of $N = N_5 = 2000$, which guarantees classification stability without excessive computation or memory use.
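A stability check of this kind can be sketched as follows, under stated assumptions: the substreams are taken as consecutive non-overlapping windows of N packets, every pair of substreams is compared, and the spread is reported as a population standard deviation. The text does not say how the substreams were drawn or which deviation was used, so these choices, the function names and the toy similarity in the usage lines are all hypothetical.

```python
import statistics
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")


def substreams(seq: Sequence[float], n: int) -> List[Sequence[float]]:
    """Consecutive non-overlapping windows of n packets (short tail dropped)."""
    return [seq[i:i + n] for i in range(0, len(seq) - n + 1, n)]


def stability(features: Sequence[S], sim: Callable[[S, S], float]) -> Tuple[float, float]:
    """Mean and population standard deviation of all pairwise similarities."""
    scores = [sim(a, b) for a, b in combinations(features, 2)]
    return statistics.mean(scores), statistics.pstdev(scores)


windows = substreams(list(range(10)), 4)
mean_sim, spread = stability([1.0, 1.0, 1.0], lambda a, b: 1.0 - abs(a - b))
```

In a real run each window would first be turned into its high-dimensional fractal, and `sim` would be the similarity of formula (9).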
The present application adopts the indices commonly used for classification systems, namely accuracy, recall and F value, to evaluate classification accuracy. On the NJUPT data set, 5000 flows (1000 of each type) are randomly selected from six types of traffic, namely streaming video, VoIP instant audio, web browsing, FTP file transfer, email and online games, and two-fold cross validation is performed; the average of 20 classification results is taken. The statistical results are shown in Table 1. The average F value, accuracy and recall are 0.953, 95.84% and 95.88%, respectively.
TABLE 1 identification rate statistics
Figure BDA0003530599980000091
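For reference, the per-class indices can be computed from confusion counts as sketched below. This uses the standard definitions and reads the "accuracy" reported alongside recall and F value as precision, which is an assumption about the translation; the counts in the usage line are made up for illustration and are not taken from Table 1.

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and their harmonic mean (F value) for one class.

    tp: flows of the class that were assigned to it
    fp: flows of other classes that were assigned to it
    fn: flows of the class that were assigned elsewhere
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


precision, recall, f_value = precision_recall_f(tp=90, fp=10, fn=30)
```

Averaging these per-class values over the six traffic types and over repeated runs gives summary figures of the kind reported in the text.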
The time and space complexity of HFM is low; it is determined mainly by the number of classified samples M and the resolution N, as analyzed below. The computation of HFM online classification is concentrated in three parts. 1) Data preprocessing: after space-time separation, fractal features are generated from the time sequence and the spatial sequence separately. As can be seen from formulas (4)-(6), the cost of this process is mainly that of scanning the flow, i.e. $O(N \log N)$, where N is the flow-sequence resolution. 2) Generating the high-dimensional fractal. Taking the observation dimension
Figure BDA0003530599980000092
the cost of generating the high-dimensional fractal by formula (7) is therefore $O((\log N)^2)$. 3) Classification. This process mainly computes the degree of difference between the flow under test and each center point
Figure BDA0003530599980000093
and then classifies according to the similarity. Because $\mathrm{tr}(f_P(\alpha) f_T(\alpha)^{T}) = f_T(\alpha)^{T} f_P(\alpha)$,
Figure BDA0003530599980000094
the cost is therefore $O(L \log N)$, where L is the number of classes. The overall algorithm complexity is then $O(N \log N + (\log N)^2 + L \log N)$. If M flows take part in classification, the time complexity is $O(MN \log N)$.
On the other hand, consider the space complexity. The flow under test is compared with the L class center points and classified, so the storage required is mainly that of the high-dimensional fractals. Taking the observation dimension
Figure BDA0003530599980000095
the storage required by one high-dimensional fractal is therefore $O((\log N)^2)$. Adding the L class centers to the M flows under test, the space complexity is $O((M+L)(\log N)^2)$.
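To make these orders of magnitude concrete, the following worked arithmetic plugs in the resolution N = 2000 and the six classes of the NJUPT data set. It assumes the largest observation scale is $Q = \lfloor \log_2 N \rfloor$, which is an assumption: the defining formula is an image, but this value is consistent with the ten components listed for $f_P(\alpha)$ and $f_T(\alpha)$ in the example. Constant factors are ignored throughout.

```latex
\begin{align*}
\text{observation scales:}\quad & \log_2 2000 \approx 10.97, \qquad Q = \lfloor \log_2 2000 \rfloor = 10 \\
\text{preprocessing:}\quad & N \log_2 N \approx 2000 \times 10.97 \approx 2.2 \times 10^{4} \\
\text{high-dimensional fractal:}\quad & Q^2 = 100 \text{ matrix entries} \\
\text{classification } (L = 6)\text{:}\quad & L \cdot Q = 60 \text{ multiply-add steps} \\
\text{storage, one flow plus } L \text{ centers:}\quad & (1 + L)\, Q^2 = 7 \times 100 = 700 \text{ values}
\end{align*}
```

Under these assumptions the preprocessing scan dominates by more than two orders of magnitude, which matches the stated overall time complexity of $O(MN \log N)$ for M flows.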
From the above, the time complexity and the space complexity of HFM are both relatively small, so HFM is suitable for online traffic classification and detection. Specifically, a traditional fractal method (such as the FS fractal spectrum) generates fractal features after fusing the spatial and time dimensions, whereas the HFM method builds the high-dimensional fractal on the two separate dimensions of space and time. The high-dimensional fractal exposes finer fractal features of the traffic in both dimensions; with these detail features HFM reaches, with only 2000 data packets, the classification accuracy that the FS method needs 10000 data packets to achieve, so HFM greatly improves classification speed.
As shown in FIG. 2, the present invention further provides a traffic classification system, including:
an acquisition module 1, configured to obtain a network flow $F_I$ to be classified and to obtain a spatial sequence P and a time sequence T of $F_I$;
a fractal module 2, configured to establish fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time according to the spatial sequence P and the time sequence T, and to define a high-dimensional fractal:
$M = f_P(\alpha) * f_T(\alpha)^{T}$
where the observation scale of $f_P(\alpha)$ and $f_T(\alpha)$ is at least $q = 1$ and at most
Figure BDA0003530599980000101
a high-dimensional fractal module 3, configured to define a similarity measure according to the high-dimensional fractal M:
Figure BDA0003530599980000102
where $M_a$ denotes the high-dimensional fractal of flow $F_I^a$ and $M_b$ denotes the high-dimensional fractal of flow $F_I^b$;
and to convert the matrix-valued similarity measure into a scalar similarity:
Figure BDA0003530599980000103
and a classification module 4, configured to classify the network flows according to the similarity.
Specifically, the spatial sequence P and the time sequence T establish fractal features f in two dimensions of space and time P (alpha) and f T (α) specifically includes:
the spatial sequence p= { P i Sum time series t= { T i Respectively brought into the following formula to form fractal features f in two dimensions of space and time P (alpha) and f T (α);
Let x= { Xi, i=1, 2, …, N } be a discrete random sequence and possess fractal characteristics. Dividing the discrete sequence { X (i) } into m non-overlapping blocks, and carrying out merging operation on the blocks to obtain an m-order merging sequence:
Figure BDA0003530599980000104
q-order calculation and summation are carried out on the m-order coalescence sequence:
Figure BDA0003530599980000105
finally obtain
Figure BDA0003530599980000106
In a preferred embodiment of the invention, the high-dimensional fractal module is specifically configured to:
for similarity matrices A and P -1 AP,tr(P -1 AP)=tr(PP -1 A) Tr (a), wherein tr (·) is the trace of the matrix;
m is the fractal feature f of the spatial sequence P Fractal features of (alpha) with time series f T Cross multiplication of (α), thus tr (f) P (α)f T (α) T )=f T (α) T f P (α) converting the vector matrix of similarity measures into a similarity:
Figure BDA0003530599980000107
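The trace identity used here, that the trace of the product of a column vector and a row vector equals their inner product, can be checked numerically. The sketch below is illustrative only: the helper names and the feature values are hypothetical, the two feature vectors are assumed to have equal length, and the `*` in the definition of M is read as an ordinary column-by-row product. It does not reproduce the patent's similarity formula.

```python
def outer(u, v):
    """Column vector u times row vector v: a len(u) x len(v) matrix."""
    return [[ui * vj for vj in v] for ui in u]


def trace(mat):
    """Sum of the diagonal entries of a square matrix."""
    return sum(mat[i][i] for i in range(len(mat)))


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical fractal feature vectors of equal length.
f_p = [1.0, 2.0, 3.0]
f_t = [4.0, 5.0, 6.0]

m = outer(f_p, f_t)             # M = f_P * f_T^T
print(trace(m), dot(f_t, f_p))  # 32.0 32.0
```

The identity lets a scalar be obtained from the vectors directly, without forming the full matrix.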
In a preferred embodiment of the present invention, the classification module is specifically configured to:
suppose there are currently L classes
Figure BDA0003530599980000108
Each class contains several flows $\{\ldots, F_I^j, F_I^k, \ldots\}$, and its center point is denoted
Figure BDA0003530599980000109
The center point is determined by the following formula:
Figure BDA0003530599980000111
when classifying a network flow $F_I^a$, the similarity between the flow and each center point is calculated:
Figure BDA0003530599980000113
and the most similar center point is selected by the following operation:
Figure BDA0003530599980000112
wherein, if the similarity between the network flow $F_I^a$ and a center point $P_l$ is greater than or equal to the threshold T, then $F_I^a$ belongs to class $P_l$; if the similarity is less than the threshold T, then $F_I^a$ does not belong to class $P_l$.
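The decision rule described above (pick the most similar center point, then accept it only if the similarity reaches the threshold T) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented formula: cosine similarity is a stand-in for the similarity defined in the omitted figures, the center points are supplied directly rather than derived from class members, and the names `cosine`, `classify`, the class labels and the feature vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Stand-in similarity (assumption); the patent's own formula is not used."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def classify(flow, centers, threshold, similarity=cosine):
    """Return the label of the most similar center, or None if below threshold."""
    best_label, best_sim = None, float("-inf")
    for label, center in centers.items():
        sim = similarity(flow, center)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_sim >= threshold else None


# Hypothetical class centers and flows, represented as feature vectors.
centers = {"video": [1.0, 0.0], "voice": [0.0, 1.0]}
print(classify([0.9, 0.1], centers, threshold=0.8))  # video
print(classify([1.0, 1.0], centers, threshold=0.8))  # None
```

Returning `None` for a flow that matches no class leaves open what happens next, for example creating a new class; the source does not specify this.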
The invention also proposes a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a flow classification method according to any of the preceding claims.
The traffic classification method, system and computer readable storage medium introduce whole-layer adaptive-width attention, so that the model can adjust the global attention while adjusting the attention width of each layer, and can thereby learn the optimal attention range. The feedforward layer with a gating unit reduces the training steps of the model by three quarters, so the model converges to the optimal state more quickly. Compared with the traditional transformer, the method greatly reduces the computation and GPU memory cost while increasing the maximum visible context length of the model.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus the necessary general-purpose hardware, or by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like. Generally, functions performed by computer programs can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function can vary, such as analog circuits, digital circuits, or dedicated circuits. However, a software implementation is the preferred embodiment for the present application in most cases. Based on this understanding, the technical solution of the present application, in essence or in the part contributing over the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, or an optical disk of a computer, and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods of the embodiments of the present application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), among others.
Finally, it should be noted that: the foregoing description is only illustrative of the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (9)

1. A method of classifying traffic, comprising the steps of:
S1, obtaining a network flow $F_I$ to be classified, and obtaining a spatial sequence P and a time sequence T of $F_I$;
S2, establishing fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time according to the spatial sequence P and the time sequence T, and defining a high-dimensional fractal:
$M = f_P(\alpha) * f_T(\alpha)^{T}$
wherein the observation scale of $f_P(\alpha)$ and $f_T(\alpha)$ is at least q = 1 and at most
Figure QLYQS_1
S3, defining a similarity measure according to the high-dimensional fractal M:
Figure QLYQS_2
wherein $M_a$ denotes the high-dimensional fractal of flow $F_i^a$, and $M_b$ denotes the high-dimensional fractal of flow $F_i^b$;
S4, converting the vector matrix of the similarity measure into a similarity:
Figure QLYQS_3
S5, classifying the network flows according to the similarity.
2. The traffic classification method according to claim 1, wherein establishing the fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time according to the spatial sequence P and the time sequence T specifically includes:
the spatial sequence $P = \{P_i\}$ and the time sequence $T = \{T_i\}$ are each substituted into the following formulas to obtain the fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time;
let $X = \{X_i,\ i = 1, 2, \ldots, N\}$ be a discrete random sequence with fractal characteristics; the discrete sequence $\{X(i)\}$ is divided into m non-overlapping blocks, and the blocks are merged to obtain the m-order aggregated sequence:
Figure QLYQS_4
a q-order operation is applied to the m-order aggregated sequence and the results are summed:
Figure QLYQS_5
which finally gives:
Figure QLYQS_6
3. The traffic classification method according to claim 1, wherein S4 specifically comprises:
for similar matrices $A$ and $P^{-1}AP$, $\mathrm{tr}(P^{-1}AP) = \mathrm{tr}(PP^{-1}A) = \mathrm{tr}(A)$, wherein $\mathrm{tr}(\cdot)$ is the trace of a matrix;
M is the product of the fractal feature $f_P(\alpha)$ of the spatial sequence and the transposed fractal feature $f_T(\alpha)^{T}$ of the time sequence, so $\mathrm{tr}(f_P(\alpha) f_T(\alpha)^{T}) = f_T(\alpha)^{T} f_P(\alpha)$, and the vector matrix of the similarity measure is converted into a similarity:
Figure QLYQS_7
4. The traffic classification method according to claim 1, wherein S5 specifically includes:
S51: suppose there are currently L classes
Figure QLYQS_8
Each class contains several flows $\{\ldots, F_I^j, F_I^k, \ldots\}$, and its center point is denoted
Figure QLYQS_9
The center point is determined by the following formula:
Figure QLYQS_10
S52: when classifying a network flow $F_I^a$, the similarity between the flow and each center point is calculated:
Figure QLYQS_11
and the most similar center point is selected by the following operation:
Figure QLYQS_12
wherein, if the similarity between the network flow $F_I^a$ and a center point $P_l$ is greater than or equal to the threshold T, then $F_I^a$ belongs to class $P_l$; if the similarity is less than the threshold T, then $F_I^a$ does not belong to class $P_l$.
5. A traffic classification system, comprising:
an acquisition module, configured to acquire a network flow $F_I$ to be classified and to obtain a spatial sequence P and a time sequence T of $F_I$;
a fractal module, configured to establish fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time according to the spatial sequence P and the time sequence T, and to define a high-dimensional fractal:
$M = f_P(\alpha) * f_T(\alpha)^{T}$
wherein the observation scale of $f_P(\alpha)$ and $f_T(\alpha)$ is at least q = 1 and at most
Figure QLYQS_13
a high-dimensional fractal module, configured to define a similarity measure according to the high-dimensional fractal M:
Figure QLYQS_14
wherein $M_a$ denotes the high-dimensional fractal of flow $F_i^a$, and $M_b$ denotes the high-dimensional fractal of flow $F_i^b$;
converting the vector matrix of similarity measures into similarity:
Figure QLYQS_15
and a classification module, configured to classify the network flows according to the similarity.
6. The traffic classification system according to claim 5, wherein establishing the fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time according to the spatial sequence P and the time sequence T specifically includes:
the spatial sequence $P = \{P_i\}$ and the time sequence $T = \{T_i\}$ are each substituted into the following formulas to obtain the fractal features $f_P(\alpha)$ and $f_T(\alpha)$ in the two dimensions of space and time;
let $X = \{X_i,\ i = 1, 2, \ldots, N\}$ be a discrete random sequence with fractal characteristics; the discrete sequence $\{X(i)\}$ is divided into m non-overlapping blocks, and the blocks are merged to obtain the m-order aggregated sequence:
Figure QLYQS_16
a q-order operation is applied to the m-order aggregated sequence and the results are summed:
Figure QLYQS_17
which finally gives:
Figure QLYQS_18
7. The traffic classification system according to claim 5, wherein said high-dimensional fractal module is specifically configured to:
for similar matrices $A$ and $P^{-1}AP$, $\mathrm{tr}(P^{-1}AP) = \mathrm{tr}(PP^{-1}A) = \mathrm{tr}(A)$, wherein $\mathrm{tr}(\cdot)$ is the trace of a matrix;
M is the product of the fractal feature $f_P(\alpha)$ of the spatial sequence and the transposed fractal feature $f_T(\alpha)^{T}$ of the time sequence, so $\mathrm{tr}(f_P(\alpha) f_T(\alpha)^{T}) = f_T(\alpha)^{T} f_P(\alpha)$, and the vector matrix of the similarity measure is converted into a similarity:
Figure QLYQS_19
8. The traffic classification system according to claim 5, wherein said classification module is specifically configured to:
suppose there are currently L classes
Figure QLYQS_20
Each class contains several flows $\{\ldots, F_I^j, F_I^k, \ldots\}$, and its center point is denoted
Figure QLYQS_21
The center point is determined by the following formula:
Figure QLYQS_22
when classifying a network flow $F_I^a$, the similarity between the flow and each center point is calculated:
Figure QLYQS_23
and the most similar center point is selected by the following operation:
Figure QLYQS_24
wherein, if the similarity between the network flow $F_I^a$ and a center point $P_l$ is greater than or equal to the threshold T, then $F_I^a$ belongs to class $P_l$; if the similarity is less than the threshold T, then $F_I^a$ does not belong to class $P_l$.
9. A computer readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, implements the traffic classification method according to any one of claims 1-4.
CN202210203700.5A 2022-03-03 2022-03-03 Traffic classification method, system and computer readable storage medium Active CN114697272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210203700.5A CN114697272B (en) 2022-03-03 2022-03-03 Traffic classification method, system and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN114697272A CN114697272A (en) 2022-07-01
CN114697272B true CN114697272B (en) 2023-06-16

Family

ID=82137530

Country Status (1)

Country Link
CN (1) CN114697272B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116825215B (en) * 2023-02-28 2024-04-16 福建省龙德新能源有限公司 Fluid circulation reaction control system and method for lithium hexafluorophosphate preparation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110784381A (en) * 2019-11-05 2020-02-11 安徽师范大学 Flow classification method based on particle calculation
CN111431819A (en) * 2020-03-06 2020-07-17 中国科学院深圳先进技术研究院 Network traffic classification method and device based on serialized protocol flow characteristics

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
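Online traffic classification using granules; Pingping Tang et al.; IEEE; full text *
Network video stream classification based on M-value probability distribution; Yang Lingyun et al.; Journal of Electronics & Information Technology; Vol. 40, No. 5; pp. 1094-1100 *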

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant