WO2010116036A1 - Method and device for identifying applications which generate data traffic flows - Google Patents

Info

Publication number
WO2010116036A1
Authority
WO
WIPO (PCT)
Prior art keywords
application
preliminary identification
data
traffic flow
data traffic
Prior art date
Application number
PCT/FI2010/050275
Other languages
French (fr)
Inventor
Matti Hirvonen
Jukka-Pekka Laulajainen
Original Assignee
Valtion Teknillinen Tutkimuskeskus
Priority date
Filing date
Publication date
Application filed by Valtion Teknillinen Tutkimuskeskus
Publication of WO2010116036A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Definitions

  • the invention relates generally to a method and a device for identifying applications which generate data traffic flows. Furthermore, the invention relates to a network element and a computer program suitable for identifying applications which generate data traffic flows.
  • An application can be, for example, electronic mail, the file transfer protocol (FTP), the hypertext transfer protocol (HTTP), the Secure Shell (SSH), voice transfer such as Voice over Internet (VoIP), or any other application that generates a data traffic flow to a communication network.
  • the identification of the application may be needed, for example, for management and optimisation of the quality of service (QoS), for an intrusion detection system (IDS), and/or for an intrusion prevention system (IPS).
  • Publication US2006277288 discloses a system in which applications that generate data traffic flows are identified by analyzing network traffic and network host information.
  • the network host information may be collected by network host monitors associated with network hosts.
  • Network traffic and network host information are evaluated against data traffic flow profiles to identify data traffic flows. If a data traffic flow is identified with high certainty and is associated with a previously identified application, then data traffic flow policies can be applied to the data traffic flow to block, throttle, accelerate, enhance, or transform it. If a data traffic flow is identified with lesser certainty or is not associated with a previously identified application, then a new data traffic flow profile can be created from further analysis of network traffic information, network host information, and possibly additional network host information collected to enhance the analysis.
  • the application identification system is able to dynamically modify the set of data traffic flow profiles being used in order to keep in touch with changing circumstances.
  • the above-discussed application identification system is able, at least to some extent, to identify applications that pose as another application and/or use encryption.
  • An inconvenience related to the above-discussed application identification system is that in some situations it may be difficult to distinguish between applications that generate data traffic flows having mutually similar traffic characteristics.
  • a new device for identifying an application generating a data traffic flow comprises a processing system arranged to:
  • the first and second preliminary identifications of the application can be made, for example, using first and second classification data obtained with the K-means clustering algorithm. Details of the K-means clustering algorithm can be found, for example, in the book "Clustering Algorithms", J. A. Hartigan (1975), Wiley.
  • a new method for identifying an application generating a data traffic flow comprises:
  • the network element according to the invention is arranged to receive a data traffic flow generated by an application and comprises a processing system arranged to:
  • the network element can be, for example, an IP-router (Internet Protocol), Ethernet switch, ATM-switch (Asynchronous Transfer Mode), base station of a mobile communications network, an MPLS-switch (Multiprotocol Label Switching), or a combination of two or more of the aforementioned.
  • the network element can also be a user terminal device that can be, for example, a mobile phone, a palmtop computer, a personal digital assistant, or a combination of two or more of the aforementioned.
  • the network element can also be a home or office sited network element such as an Ethernet switch, an IP-router (Internet Protocol) or a WLAN-AP (Wireless Local Area Network - Access Point).
  • a new computer program for identifying an application generating a data traffic flow comprises computer executable instructions for controlling a programmable processor to:
  • a computer program product according to the invention comprises a computer readable medium, e.g. a compact disc (CD) or a random access memory (RAM), encoded with a computer program according to the invention.
  • figure 1 shows a high-level flow chart of a method according to an embodiment of the invention for identifying an application generating a data traffic flow
  • figure 2 shows a flow chart of a method according to an embodiment of the invention for identifying an application generating a data traffic flow
  • figure 3 shows a flow chart of a method according to an embodiment of the invention for identifying an application generating a data traffic flow
  • figures 4a and 4b show a flow chart of making a preliminary identification of an application in a method according to an embodiment of the invention for identifying the application
  • figure 5 shows a schematic illustration of a device according to an embodiment of the invention for identifying an application generating a data traffic flow
  • figure 6 shows a schematic illustration of a network element according to an embodiment of the invention for identifying an application generating a data traffic flow.
  • Figure 1 shows a high-level flow chart of a method according to an embodiment of the invention for identifying an application generating a data traffic flow.
  • the method contains two classification phases 101 and 102 and a phase 103 for making a final decision on the basis of results obtained in those two classification phases.
  • the classification phase 101 comprises making a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow.
  • the one or more first data frames are data frames that are transferred at the beginning of the data traffic flow. Hence, it is possible to utilise information that is present in the data traffic flow only at the beginning of the data traffic flow.
  • the first data frames can be for example data frames that are transferred during a negotiation phase related to establishing of the data traffic flow.
  • the negotiation phase may comprise for example hand-shaking and/or other initialisation actions of the data traffic flow.
  • the data frames can be for example IP-packets (Internet Protocol), Ethernet frames, or other protocol data units (PDU).
  • the classification phase 102 comprises making a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames. In the classification phase 102 it is possible to utilise information that is present only when the data traffic flow is at the steady state, i.e. after possible initiations are already done.
  • the portion of the data traffic flow used for the second preliminary identification of the application may comprise for example a first pre-determined number of data frames transferred after a second pre-determined number of earlier transferred data frames.
  • the above-mentioned first and second pre-determined numbers can be e.g. 200 and 800, respectively, in which case the portion of the data traffic flow used for the second preliminary identification of the application comprises data frames 201-1000 in the temporal order of transmission.
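The frame selection described above can be sketched in a few lines of Python. The function name `steady_state_window` and its parameters `skip` and `take` are illustrative assumptions; the default values are chosen so that the result matches the stated range of data frames 201-1000.

```python
def steady_state_window(frames, skip=200, take=800):
    """Return the frames used for the second preliminary identification.

    Assumption: `skip` earlier frames are ignored and the next `take`
    frames are kept, which yields frames 201-1000 for the defaults.
    """
    return frames[skip:skip + take]


frames = list(range(1, 1501))  # stand-in frame numbers 1..1500
window = steady_state_window(frames)
print(window[0], window[-1], len(window))  # 201 1000 800
```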
  • the phase 103 comprises identifying the application at least partly on the basis of the first preliminary identification made in the classification phase 101 and the second preliminary identification made in the classification phase 102.
  • parameters related to algorithms used in the classification phases 101 and 102 may include, depending on the algorithms being used, for example an indicator of reliability of the first preliminary identification and an indicator of reliability of the second preliminary identification.
  • the first preliminary identification of the application is made, in the classification phase 101, on the basis of at least one of the following properties of the one or more first data frames transferred at the beginning of the data traffic flow: payload size, header size, uplink/downlink-transfer direction, a port number.
  • the properties of the first one or more data frames that are selected to be used in the first preliminary identification of the application constitute a feature vector of the data traffic flow for the first preliminary identification. For example, if the number of the first data frames is N, the feature vector of the data traffic flow can be:
  • Feature 3: the payload size and uplink/downlink direction of a third transferred data frame
  • Feature N: the payload size and uplink/downlink direction of an N-th transferred data frame.
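As a minimal sketch of how such a feature vector might be assembled, the Python below takes the first N frames and emits one number per frame. The `Frame` type, the sign-based encoding of the uplink/downlink direction, and the zero padding for short flows are all assumptions made for illustration; the text does not prescribe an encoding.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    payload_size: int  # bytes
    uplink: bool       # True for uplink, False for downlink


def first_feature_vector(frames, n):
    """Encode payload size and direction of the first n frames.

    Assumption: direction is folded into the sign (uplink positive,
    downlink negative); flows shorter than n frames are zero-padded.
    """
    head = frames[:n]
    vec = [f.payload_size if f.uplink else -f.payload_size for f in head]
    return vec + [0] * (n - len(vec))


flow = [Frame(120, True), Frame(1460, False), Frame(40, True)]
print(first_feature_vector(flow, 4))  # [120, -1460, 40, 0]
```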
  • the first preliminary identification of the application can be made, for example, using the feature vector and classification data obtained with the K-means clustering algorithm, the Gaussian Mixture Model, the spectral clustering, the AutoClass clustering algorithm, or the density based spatial clustering of applications with noise (DBSCAN). It is also possible to use two or more of the aforementioned algorithms and to use a suitable logic, e.g. the voting principle, for determining the result of the first preliminary identification on the basis of results obtained with different classification data based on different algorithms.
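The voting principle mentioned above could look roughly like the following Python sketch, which combines the labels proposed by several classifiers. The function name `vote` and the handling of ties and empty input (both mapped to "unknown") are assumptions, not something the text specifies.

```python
from collections import Counter


def vote(labels):
    """Majority vote over labels proposed by several classifiers.

    Assumption: a tie for first place, or an empty input, yields "unknown".
    """
    counts = Counter(labels).most_common()
    if not counts:
        return "unknown"
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "unknown"
    return counts[0][0]


print(vote(["HTTP", "HTTP", "SSH"]))  # HTTP
```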
  • the second preliminary identification of the application is made, in the classification phase 102, on the basis of at least one of the following statistical properties related to the portion of the data traffic flow transferred after the transferring of the first data frames:
  • Total payload size, i.e. the sum of payload sizes of all data frames used for the second preliminary identification of the application
  • Total header size, i.e. the sum of header sizes of all data frames used for the second preliminary identification of the application
  • Total payload size to uplink, i.e. the sum of payload sizes of all data frames to uplink used for the second preliminary identification of the application
  • Total payload size to downlink, i.e. the sum of payload sizes of all data frames to downlink used for the second preliminary identification of the application
  • Total header size to uplink, i.e. the sum of header sizes of all data frames to uplink used for the second preliminary identification of the application
  • Total header size to downlink, i.e. the sum of header sizes of all data frames to downlink used for the second preliminary identification of the application
  • the statistical properties that are selected to be used in the second preliminary identification of the application constitute a feature vector of the data traffic flow for the second preliminary identification.
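The totals listed above can be accumulated in a single pass over the selected frames. The sketch below is illustrative only: the tuple layout `(payload_size, header_size, uplink)` and the ordering of the resulting vector are assumptions.

```python
def second_feature_vector(frames):
    """Sum payload and header sizes overall and per direction.

    frames: iterable of (payload_size, header_size, uplink) tuples,
    a hypothetical representation chosen for this sketch.
    Returns [payload, header, payload_up, payload_down, header_up, header_down].
    """
    payload = header = 0
    payload_up = payload_down = header_up = header_down = 0
    for payload_size, header_size, uplink in frames:
        payload += payload_size
        header += header_size
        if uplink:
            payload_up += payload_size
            header_up += header_size
        else:
            payload_down += payload_size
            header_down += header_size
    return [payload, header, payload_up, payload_down, header_up, header_down]


sample = [(100, 40, True), (1400, 40, False), (50, 40, True)]
print(second_feature_vector(sample))  # [1550, 120, 150, 1400, 80, 40]
```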
  • the second preliminary identification of the application can be made, for example, using the feature vector and classification data obtained with the K-means clustering algorithm, the Gaussian Mixture Model, the spectral clustering, the AutoClass clustering algorithm, or the density based spatial clustering of applications with noise (DBSCAN). It is also possible to use two or more of the aforementioned algorithms and to use a suitable logic, e.g. the voting principle, for determining the result of the second preliminary identification on the basis of results obtained with different classification data based on different algorithms.
  • accuracy indicators are calculated for the first preliminary identification of the application and for the second preliminary identification of the application.
  • the application that has a better accuracy is selected in the phase 103 from among the one or two applications proposed by the first and second preliminary identifications.
  • the selected application represents the identified application i.e. the final identification of the application.
  • the accuracy indicators are preferably parameters related to algorithms used in the classification phases 101 and 102.
  • Figure 2 shows a flow chart of a method according to an embodiment of the invention for identifying an application generating a data traffic flow.
  • the K-means clustering algorithm is used for obtaining first classification data for the first preliminary identification of the application and for obtaining second classification data for the second preliminary identification of the application.
  • the first classification data includes first cluster descriptions and first cluster compositions that are used in the first preliminary identification of the application
  • the second classification data includes second cluster descriptions and second cluster compositions that are used in the second preliminary identification of the application. Details of the K-means clustering algorithm can be found, for example, in the book "Clustering Algorithms", J. A. Hartigan (1975), Wiley.
  • the first preliminary identification of the application is made in phases 201 and 211
  • the second preliminary identification of the application is made in phases 202 and 212.
  • the phase 201 comprises selecting a first cluster of one or more applications on a basis of a first feature vector based on properties of data frames transferred at the beginning of the data traffic flow.
  • the algorithm determines whether a feature vector corresponds to any cluster and, if it does, identifies that cluster. Every cluster has its own density measure, which is the standard deviation of the distances from the applications within the cluster to the centroid of the cluster. The density measure, together with a pre-determined threshold value, is used to decide whether the feature vector lies within the coverage area of a certain cluster. When assigning the feature vector to a cluster, the selected cluster is not always the closest one.
  • the cluster B is a better selection for the outcome of the phase 201 than the cluster A.
  • the phase 211 comprises selecting a first application candidate from the selected first cluster of one or more applications. It is also possible that the result of the first preliminary identification of the application is that the first application candidate is unknown.
  • the phase 202 comprises selecting a second cluster of one or more applications on a basis of a second feature vector based on statistical properties of a portion of the data traffic flow that is transferred later than the data frames used for the first preliminary identification of the application.
  • the phase 212 comprises selecting a second application candidate from the selected second cluster of one or more applications. It is also possible that the result of the second preliminary identification of the application is that the second application candidate is unknown.
  • a phase 203 comprises making a final decision on the application to be identified at least partly on the basis of the first application candidate and the second application candidate. If the first and second application candidates are the same, the final decision on the application to be identified is preferably the application proposed by both the first and second preliminary identifications of the application. Exemplifying alternatives for making the final decision in cases where the first and second preliminary identifications of the application propose different application candidates are described below.
  • accuracy indicators are calculated for the first preliminary identification of the application and for the second preliminary identification of the application.
  • the application candidate that has a better accuracy is selected in the phase 203 from among the first and second application candidates proposed by the first and second preliminary identifications, respectively.
  • Each accuracy indicator is calculated as a proportional distance D_CC/D_DM, wherein D_CC is the distance between a feature vector of the data traffic flow and the centroid of a selected cluster and D_DM is the standard deviation of the distances from the applications within the selected cluster to the centroid of the selected cluster.
  • the feature vector of the data traffic flow is either the first feature vector used in the first preliminary identification or the second feature vector used in the second preliminary identification.
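A minimal Python sketch of the proportional distance and of choosing between two candidates follows. The use of Euclidean distance, the rule that a smaller proportional distance means better accuracy, and the tie-break in favour of the first candidate are assumptions; the text only states that the candidate with the better accuracy is selected.

```python
import math


def proportional_distance(feature_vector, centroid, density_measure):
    """Distance to the cluster centroid divided by the cluster's density
    measure (standard deviation of member distances).
    Assumption: Euclidean distance and a non-zero density measure."""
    return math.dist(feature_vector, centroid) / density_measure


def pick_candidate(first, second):
    """Each argument is a (candidate, indicator) pair.
    Assumption: the smaller indicator wins; ties go to the first."""
    return first[0] if first[1] <= second[1] else second[0]


ratio = proportional_distance((3.0, 4.0), (0.0, 0.0), 2.0)
print(ratio)  # 2.5
print(pick_candidate(("HTTP", ratio), ("SSH", 1.2)))  # SSH
```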
  • an occurrence probability of the first application candidate is used as an accuracy indicator for the first preliminary identification and an occurrence probability of the second application candidate is used as an accuracy indicator for the second preliminary identification.
  • An occurrence probability of an application is a probability of occurrence of the application within all applications of a corresponding cluster.
  • the occurrence probabilities of the applications A, B, and C are p, q, and 1 - p - q, respectively. Estimates for the occurrence probabilities can be determined for example on the basis of usage statistics related to the applications under consideration.
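One way to estimate occurrence probabilities is to normalise per-application counts within a cluster, as in the sketch below. Treating the cluster composition as a dictionary of counts is an assumption made for illustration; the text only says that estimates may come from usage statistics.

```python
def occurrence_probabilities(composition):
    """Turn per-application counts within one cluster into probabilities.

    composition: dict mapping application name to a count (hypothetical
    representation of a cluster composition).
    """
    total = sum(composition.values())
    return {app: count / total for app, count in composition.items()}


probs = occurrence_probabilities({"A": 5, "B": 3, "C": 2})
print(probs)  # {'A': 0.5, 'B': 0.3, 'C': 0.2}
```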
  • the final decision on the application to be identified is the application proposed by the first preliminary identification in a case in which only the second preliminary identification indicates that the application is unknown.
  • the final decision on the application to be identified is the application proposed by the second preliminary identification in a case in which only the first preliminary identification indicates that the application is unknown.
  • the final decision on the application to be identified is the application proposed by the first preliminary identification if:
  • the closest cluster found in the second preliminary identification process contains the said application proposed by the first preliminary identification, the closest cluster being the cluster whose centroid is closest to the feature vector of the data traffic flow in the second preliminary identification process.
  • the final decision is that the application is unknown if the said closest cluster does not contain the said application proposed by the first preliminary identification.
  • the final decision on the application to be identified is the application proposed by the second preliminary identification if:
  • the closest cluster found in the first preliminary identification process contains the said application proposed by the second preliminary identification, the closest cluster being the cluster whose centroid is closest to the feature vector of the data traffic flow in the first preliminary identification process.
  • the final decision is that the application is unknown if the said closest cluster does not contain the said application proposed by the second preliminary identification.
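The decision rules above can be combined into one function, sketched here in Python. The boolean `prefer_first` is a placeholder for the conditions that are not reproduced in the text, and `closest_apps_other` stands for the applications in the closest cluster found by the other identification process; both names are hypothetical.

```python
UNKNOWN = "unknown"


def final_decision(first, second, prefer_first, closest_apps_other):
    """Combine two preliminary identifications into a final decision.

    first, second: candidates from the two preliminary identifications.
    prefer_first: assumption standing in for the conditions that select
        which candidate is cross-checked.
    closest_apps_other: applications in the closest cluster found by the
        identification process whose candidate was not preferred.
    """
    if first == second:
        return first
    if second == UNKNOWN:
        return first
    if first == UNKNOWN:
        return second
    preferred = first if prefer_first else second
    return preferred if preferred in closest_apps_other else UNKNOWN


print(final_decision("SSH", "HTTP", True, {"SSH", "FTP"}))  # SSH
print(final_decision("SSH", "HTTP", True, {"FTP"}))         # unknown
```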
  • Figure 3 shows a flow chart of a method according to an embodiment of the invention for identifying an application generating a data traffic flow.
  • Phases 301, 311, 302, 312, and 303 are similar to the phases 201, 211, 202, 212, and 203 shown in figure 2, respectively.
  • the method comprises training phases 304 and 305.
  • the K-means clustering algorithm and first pre-determined training data flows are used for forming the first classification data for the purpose of the first preliminary identification of the application.
  • the K-means clustering algorithm and second pre-determined training data flows are used for forming the second classification data for the purpose of the second preliminary identification of the application.
  • the first classification data produced in the training phase 304 includes first cluster descriptions and first cluster compositions that are used in the first preliminary identification of the application
  • the second classification data produced in the training phase 305 includes second cluster descriptions and second cluster compositions that are used in the second preliminary identification of the application.
  • the cluster descriptions may include for example information defining the centroids and the density measures of the clusters.
  • the training phases 304 and 305 are preferably implemented using an offline trainer.
  • the trainer takes the first and second training data flows as input and uses those training data flows to capture the characteristic patterns of the desired application types.
  • the trainer divides the training data flows into clusters using the K-means clustering algorithm. After the clustering, the trainer outputs the cluster descriptors and cluster compositions for the purposes of the first and second preliminary identification of the application.
  • the offline training can be done only once, but it is also possible to update the cluster descriptors and cluster compositions at certain intervals. In the long run, updating may be worthwhile in order to keep up with possible changes in the behaviour of applications.
  • the K-means clustering algorithm that can be used in the training phases 304 and 305 can be described with the aid of the following steps:
  • 1. calculating the distances between feature vectors and cluster centroids, each feature vector corresponding to a certain training data flow,
  • 4. going back to step 1 and continuing until the cluster centroids do not substantially move.
  • After the required iterations, the clusters have formed. All feature vectors used in the training contain the ground truth about the applications generating the training data flows. Therefore, the trainer knows which applications are related to which cluster. The trainer also calculates the distribution of the feature vectors inside each cluster. The distribution describes whether a cluster is tight or widely spread. This information can be used when recognizing unknown data traffic flows. Without this property, all data traffic flows, including those corresponding to no training data flow, would be related to some application at the first and second preliminary identifications. Consequently, the trainer outputs the final cluster centroids, the distribution of the applications for each cluster, and the standard deviation of the distances to the cluster centroid for each cluster.
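The trainer described above can be sketched as follows. This is a plain textbook K-means loop, not necessarily the exact procedure of the invention: the nearest-centroid assignment and mean update are the standard K-means steps, and the caller-supplied initial centroids, exact-equality convergence test, `max_iter` limit, and population standard deviation are assumptions of this sketch.

```python
import math
from collections import Counter
from statistics import pstdev


def nearest(vector, centroids):
    """Index of the centroid closest to vector (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(vector, centroids[k]))


def train(vectors, labels, centroids, max_iter=100):
    """Return (centroids, compositions, density_measures) for labelled flows."""
    centroids = [tuple(float(x) for x in c) for c in centroids]
    for _ in range(max_iter):
        assignment = [nearest(v, centroids) for v in vectors]
        moved = []
        for k, old in enumerate(centroids):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            # Move the centroid to the mean of its members; keep empty clusters in place.
            moved.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else old)
        if moved == centroids:  # simplification of "do not substantially move"
            break
        centroids = moved
    assignment = [nearest(v, centroids) for v in vectors]
    compositions, densities = [], []
    for k, centroid in enumerate(centroids):
        idx = [i for i, a in enumerate(assignment) if a == k]
        compositions.append(Counter(labels[i] for i in idx))
        dists = [math.dist(vectors[i], centroid) for i in idx]
        densities.append(pstdev(dists) if dists else 0.0)
    return centroids, compositions, densities


vectors = [(0, 0), (0, 2), (10, 10), (10, 12)]
labels = ["HTTP", "HTTP", "SSH", "SSH"]
centroids, compositions, densities = train(vectors, labels, [(0, 0), (10, 10)])
print(centroids)  # [(0.0, 1.0), (10.0, 11.0)]
```

In this toy input every member lies exactly one unit from its centroid, so both density measures are zero; real training data would give non-zero values.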
  • Figures 4a and 4b show a flow chart of making a preliminary identification of an application in a method according to an embodiment of the invention for identifying the application.
  • the process depicted in figures 4a and 4b can be used both for the first preliminary identification of the application, the phases 201 and 211 in figure 2 and the phases 301 and 311 in figure 3, and for the second preliminary identification of the application, the phases 202 and 212 in figure 2 and the phases 302 and 312 in figure 3.
  • Figure 4a depicts an exemplifying cluster assignment process that may correspond, for example, to the phase 201 and/or the phase 202 shown in figure 2, as well as the phase 301 and/or the phase 302 shown in figure 3.
  • a phase 421 of the cluster assignment process comprises initialisation of variables i, is_near, min_dist_in, and min_dist.
  • a phase 422 comprises calculation of a distance D(i) between a feature vector of a data traffic flow and the centroid of the cluster i.
  • a decision phase 423 comprises checking whether the feature vector belongs to the coverage area of the cluster i, i.e. checking whether the distance D(i) is less than the standard deviation of the distances in the cluster i multiplied by a predetermined threshold value T.
  • a decision phase 424 comprises checking whether the distance D(i) is smaller than the so far smallest distance over clusters whose coverage areas comprise the feature vector, i.e. it is checked whether D(i) < min_dist_in.
  • a phase 425 comprises setting the cluster i as the so far closest cluster whose coverage area comprises the feature vector, i.e. the variable cluster_id_in is set to i, setting the variable is_near to '1' in order to indicate that the feature vector belongs to a coverage area of at least one cluster, and setting the so far smallest distance over the clusters whose coverage areas comprise the feature vector to D(i), i.e. the variable min_dist_in is set to D(i).
  • a decision phase 426 comprises checking whether the distance D(i) is smaller than the so far smallest distance over all clusters, i.e. it is checked whether D(i) < min_dist.
  • a decision phase 430 comprises checking whether there are any clusters left to be inspected.
  • a phase 431 comprises shifting to the next cluster to be inspected, i.e. the variable i is incremented by one, and moving back to the phase 422.
  • a decision phase 428 comprises checking whether the feature vector belongs to a coverage area of any cluster, i.e. checking whether the variable is_near is one or still zero. If the feature vector belongs to the coverage area of at least one cluster, i.e.
  • the selected cluster is indicated in a phase 429 by the variable cluster_id_in and the distance from the feature vector to the centroid of the selected cluster is indicated by the variable min_dist_in.
  • the variable cluster_id indicates the cluster the centroid of which is closest to the feature vector
  • the variable min_dist indicates the distance from the feature vector to the centroid of this closest cluster.
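The cluster assignment process of figure 4a can be sketched in Python using the variable names from the text (`is_near`, `cluster_id_in`, `min_dist_in`, `cluster_id`, `min_dist`). The Euclidean distance and the return value `None` when no coverage area contains the feature vector are assumptions of this sketch.

```python
import math


def assign_cluster(feature_vector, centroids, density_measures, threshold):
    """Return (selected_cluster, closest_cluster).

    selected_cluster is the closest cluster whose coverage area contains
    the feature vector, or None if there is none (assumed handling);
    closest_cluster is the closest cluster overall.
    """
    is_near = False
    cluster_id_in = cluster_id = None
    min_dist_in = min_dist = math.inf
    for i, (centroid, sd) in enumerate(zip(centroids, density_measures)):
        d = math.dist(feature_vector, centroid)        # phase 422
        if d < sd * threshold and d < min_dist_in:     # phases 423 and 424
            is_near, cluster_id_in, min_dist_in = True, i, d  # phase 425
        if d < min_dist:                               # phase 426
            cluster_id, min_dist = i, d
    return (cluster_id_in if is_near else None), cluster_id


# Cluster 0 is tight, cluster 1 is spread out; the vector lies nearer to
# cluster 0's centroid but only inside cluster 1's coverage area.
print(assign_cluster((2.0, 0.0), [(0.0, 0.0), (5.0, 0.0)], [0.5, 4.0], 1.0))  # (1, 0)
```

The example illustrates the remark that the selected cluster is not always the closest one.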
  • the above-described cluster assignment process is continued by an application labelling process for providing an indication of an application corresponding to the feature vector or an indication that the application is unknown.
  • Figure 4b depicts an exemplifying labelling process.
  • the labelling process depicted in figure 4b may correspond for example to the phases 211 and 212 shown in figure 2 and the phases 311 and 312 shown in figure 3.
  • a port number is utilised.
  • a decision phase 433 comprises checking whether the data traffic flow under consideration uses a standard port number of a known application.
  • a phase 434 comprises determining an application that corresponds to the standard port number.
  • a decision phase 435 comprises checking whether the selected cluster to which the data traffic flow was assigned contains any application corresponding to that standard port number.
  • a phase 436 comprises setting the determined application to be the outcome of the application labelling process, i.e.
  • a phase 437 comprises determining the dominant application among all applications of the selected cluster. If the data traffic flow is using a non-standard port number, it will be labelled according to the dominant application among those applications that use non-standard destination port numbers and belong to the selected cluster, or the data traffic flow will be labelled as unknown if the selected cluster does not contain any applications that use non-standard destination port numbers.
  • a decision phase 438 comprises checking whether the selected cluster contains any application(s) that utilise(s) non-standard port numbers.
  • a phase 439 comprises determining a dominant, e.g.
  • a phase 440 comprises setting the application to be unknown, i.e. the outcome of the first or second preliminary identification of application is that the application is unknown.
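One possible reading of the labelling process of figure 4b is sketched below in Python. The port table, the set of applications using non-standard ports, the dictionary form of the cluster composition, and the interpretation of "dominant" as "most frequent in the cluster" are all assumptions, as is the exact ordering of the branches, since the text describes some phases only partially.

```python
def label_flow(port, composition, standard_ports, nonstandard_apps):
    """Label a flow assigned to a cluster.

    port: destination port of the flow.
    composition: dict application -> count for the selected cluster.
    standard_ports: dict port -> application (hypothetical table).
    nonstandard_apps: applications that use non-standard ports.
    """
    if not composition:
        return "unknown"
    app = standard_ports.get(port)                 # phases 433 and 434
    if app is not None:
        if app in composition:                     # phase 435
            return app                             # phase 436
        return max(composition, key=composition.get)  # phase 437
    candidates = {a: c for a, c in composition.items() if a in nonstandard_apps}
    if not candidates:                             # phase 438
        return "unknown"                           # phase 440
    return max(candidates, key=candidates.get)     # phase 439


ports = {80: "HTTP", 22: "SSH"}
cluster = {"HTTP": 6, "BitTorrent": 3, "Skype": 1}
p2p = {"BitTorrent", "Skype"}
print(label_flow(80, cluster, ports, p2p))     # HTTP
print(label_flow(51413, cluster, ports, p2p))  # BitTorrent
```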
  • Figure 5 shows a schematic illustration of a device according to an embodiment of the invention for identifying an application generating a data traffic flow.
  • the device can be for example a part of a network element that can be either an operator controlled network element, a home or office sited network element, or a user terminal device.
  • the device comprises a processing system 501 arranged to:
  • the processing system 501 may comprise one or more processor units. Each processor unit can be a programmable processor, an application specific circuit, or a field programmable circuit.
  • the device may further comprise a memory unit 502 and a data interface 503 for communicating with external systems.
  • the processing system 501 is arranged to use data frames transferred during a negotiation phase related to establishing of the data traffic flow as the above-mentioned one or more first data frames.
  • the processing system 501 is arranged to make the first preliminary identification of the application on the basis of at least one of the following properties of the above-mentioned one or more first data frames: payload size, header size, uplink/downlink-transfer direction, a port number.
  • the processing system 501 is arranged to make the first preliminary identification of the application with the aid of first classification data obtained with the K-means clustering algorithm.
  • the processing system 501 is arranged to make the first preliminary identification of the application with the aid of first classification data obtained with one of the following: the Gaussian Mixture Model, the spectral clustering, the AutoClass clustering algorithm, the density based spatial clustering of applications with noise (DBSCAN).
  • the portion of the data traffic flow used for the second preliminary identification of the application comprises a first pre-determined number of data frames transferred after a second predetermined number of earlier transferred data frames.
  • the processing system 501 is arranged to make the second preliminary identification of the application on the basis of at least one of the following statistical properties related to the portion of the data traffic flow transferred after the first data frames: average frame size, minimum frame size, maximum frame size, standard deviation of frame size, average inter-arrival time, standard deviation of the inter-arrival time.
  • the processing system 501 is arranged to make the second preliminary identification of the application with the aid of second classification data obtained with the K-means clustering algorithm.
  • the processing system 501 is arranged to make the second preliminary identification of the application with the aid of second classification data obtained with one of the following: the Gaussian Mixture Model, the spectral clustering, the AutoClass clustering algorithm, the density based spatial clustering of applications with noise (DBSCAN).
  • the processing system 501 is arranged to calculate accuracy indicators for the first preliminary identification of the application and for the second preliminary identification of the application, and to select the application that has a better accuracy from among the one or two applications according to the first and second preliminary identifications, the selected application representing the identified application.
  • the processing system 501 is arranged to use first classification data obtained with the K-means clustering algorithm for the first preliminary identification of the application and second classification data obtained with the K-means clustering algorithm for the second preliminary identification of the application, and to calculate each accuracy indicator as DCC/DDM.
  • the DCC is a distance between a feature vector of the data traffic flow and a centroid of a selected cluster of one or more applications
  • the DDM is a standard deviation of distances from applications within the selected cluster of applications to the centroid of the selected cluster of one or more applications.
  • the processing system 501 is arranged to use first classification data obtained with the K-means clustering algorithm for the first preliminary identification of the application and second classification data obtained with the K-means clustering algorithm for the second preliminary identification of the application, and to use an occurrence probability of the application according to the first preliminary identification as an accuracy indicator for the application according to the first preliminary identification and an occurrence probability of the application according to the second preliminary identification as an accuracy indicator for the application according to the second preliminary identification.
  • the occurrence probability of an application is the probability of occurrence of the application within all applications of a corresponding cluster of one or more applications.
  • the processing system 501 is arranged to use the K-means clustering algorithm and first pre-determined training data flows for obtaining first classification data to be used for the first preliminary identification of the application and to use the K-means clustering algorithm and second pre-determined training data flows for obtaining second classification data to be used for the second preliminary identification of the application.
  • FIG. 6 shows a schematic illustration of a network element 600 according to an embodiment of the invention for identifying an application generating a data traffic flow.
  • the network element comprises a processing system 601 arranged to:
  • the network element comprises preferably a transmitter 604 for transmitting data traffic flows to a communication network and/or a receiver 605 for receiving data traffic flows from the communication network.
  • the network element may comprise a data interface (not shown) for connecting to an external transmitter and/or to an external receiver.
  • the network element may further comprise a memory unit 602 or a data interface (not shown) for connecting to an external memory unit.
  • a network element comprises/is at least one of the following: an IP-router (Internet Protocol), Ethernet switch, ATM-switch (Asynchronous Transfer Mode), base station of a mobile communications network, MPLS-switch (Multiprotocol Label Switching), a WLAN-AP (Wireless Local Area Network - Access Point).
  • a network element is a user terminal device and comprises/is at least one of the following: a mobile phone, a palmtop computer, a personal digital assistant, a personal computer, a lap-top computer.
  • a computer program comprises a program code for controlling a programmable processor to identify an application generating a data traffic flow.
  • the program code comprises computer executable instructions for controlling the programmable processor to:
  • the computer executable instructions can be e.g. subroutines and/or functions.
  • a computer program product according to an embodiment of the invention is stored in a computer readable medium.
  • the computer readable medium can be e.g. a CD-ROM (Compact Disc Read Only Memory) or a RAM-device (Random Access Memory).
  • a computer program product is carried in a signal that is receivable from a communication network.
  • a computer readable medium e.g. a CD-ROM (Compact Disc Read Only Memory) or a RAM-device (Random Access Memory), according to an embodiment of the invention is encoded with a computer program according to an embodiment of the invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method for identifying an application that generates a data traffic flow to a communication network comprises making (101) a first preliminary identification of the application on the basis of properties of first data frames transferred at the beginning of the data traffic flow, making (102) a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the first data frames, and identifying (103) the application at least partly on the basis of the first preliminary identification and the second preliminary identification. As the identification is made in two phases, it is possible to utilise both the characteristics of the beginning of the data traffic flow and the statistical properties related to a later phase of the data traffic flow, and therefore to improve the successfulness of the identification.

Description

METHOD AND DEVICE FOR IDENTIFYING APPLICATIONS WHICH GENERATE DATA TRAFFIC FLOWS
Field of the invention
The invention relates generally to a method and a device for identifying applications which generate data traffic flows. Furthermore, the invention relates to a network element and a computer program suitable for identifying applications which generate data traffic flows.
Background
In telecommunications, it is commonly necessary to identify the applications which generate data traffic flows in order to be able to handle the data traffic flows in an appropriate manner in a network element that can be for example a router, a switch, a terminal device, or any other device arranged to control the data traffic flows. An application can be, for example, electronic mail, the file transfer protocol (FTP), the hypertext transfer protocol (HTTP), the Secure Shell (SSH), voice transfer, e.g. Voice over Internet Protocol (VoIP), or any other application that generates a data traffic flow to a communication network. The identification of the application may be needed, for example, for management and optimisation of the quality of service (QoS), for an intrusion detection system (IDS), and/or for an intrusion prevention system (IPS). In present telecommunication systems, it is very often difficult or even impossible to identify an application generating an arriving data traffic flow merely on the basis of a port number related to the data traffic flow and/or on the basis of payload data analysis, because many applications are arranged to use dynamically allocated port numbers and, especially in a case of hostile activities, an application can pose as another application and/or use encryption for intentionally avoiding identification.
Publication US2006277288 discloses a system in which applications that generate data traffic flows are identified by analyzing network traffic and network host information. The network host information may be collected by network host monitors associated with network hosts. Network traffic and network host information are evaluated against data traffic flow profiles to identify data traffic flows. If a data traffic flow is identified with high certainty and is associated with a previously identified application, then data traffic flow policies can be applied to the data traffic flow to block, throttle, accelerate, enhance, or transform it. If a data traffic flow is identified with lesser certainty or is not associated with a previously identified application, then a new data traffic flow profile can be created from further analysis of network traffic information, network host information, and possibly additional network host information collected to enhance the analysis. Hence, the application identification system is able to dynamically modify the set of data traffic flow profiles being used in order to keep up with changing circumstances. As the data traffic flows are identified at least partly by analyzing the network traffic, the above-discussed application identification system is able, at least to some extent, to identify applications that pose as another application and/or use encryption. An inconvenience related to the above-discussed application identification system is that it may in some situations be difficult to distinguish between applications that generate data traffic flows having mutually similar traffic characteristics.
Summary
In accordance with a first aspect of the invention there is provided a new device for identifying an application generating a data traffic flow. The device according to the invention comprises a processing system arranged to:
- make a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow, the one or more first data frames being transferred at the beginning of the data traffic flow,
- make a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames, and
- identify the application at least partly on the basis of the first preliminary identification and the second preliminary identification.
As the identification of the application is carried out in two phases, it is possible to utilise both the characteristics of the beginning of the data traffic flow and also the statistical properties related to a later portion of the data traffic flow. Hence, the successfulness of the identification of the application is improved compared with the prior art described earlier in this document. The first and second preliminary identifications of the application can be made, for example, using first and second classification data obtained with the K-means clustering algorithm. Details of the K-means clustering algorithm can be found, for example, in the book "Clustering Algorithms", J. A. Hartigan (1975), Wiley.
In accordance with a second aspect of the invention there is provided a new method for identifying an application generating a data traffic flow. The method according to the invention comprises:
- making a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow, the one or more first data frames being transferred at the beginning of the data traffic flow,
- making a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames, and
- identifying the application at least partly on the basis of the first preliminary identification and the second preliminary identification.
In accordance with a third aspect of the invention there is provided a new network element. The network element according to the invention is arranged to receive a data traffic flow generated by an application and comprises a processing system arranged to:
- make a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow, the one or more first data frames being transferred at the beginning of the data traffic flow,
- make a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames, and
- identify the application at least partly on the basis of the first preliminary identification and the second preliminary identification.
The network element can be, for example, an IP-router (Internet Protocol), Ethernet switch, ATM-switch (Asynchronous Transfer Mode), base station of a mobile communications network, an MPLS-switch (Multiprotocol Label Switching), or a combination of two or more of the aforementioned.
The network element can also be a user terminal device that can be, for example, a mobile phone, a palmtop computer, a personal digital assistant, or a combination of two or more of the aforementioned. The network element can also be a home or office sited network element such as an Ethernet switch, an IP-router (Internet Protocol) or a WLAN-AP (Wireless Local Area Network - Access Point).
In accordance with a fourth aspect of the invention there is provided a new computer program for identifying an application generating a data traffic flow. The computer program according to the invention comprises computer executable instructions for controlling a programmable processor to:
- make a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow, the one or more first data frames being transferred at the beginning of the data traffic flow,
- make a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames, and
- identify the application at least partly on the basis of the first preliminary identification and the second preliminary identification.
A computer program product according to the invention comprises a computer readable medium, e.g. a compact disc (CD) or a random access memory (RAM), encoded with a computer program according to the invention.
A number of exemplifying embodiments of the invention are described in the accompanying dependent claims.
Various exemplifying embodiments of the invention both as to constructions and to methods of operation, together with additional objects and advantages thereof, will be best understood from the following description of specific exemplifying embodiments when read in connection with the accompanying drawings.
The verb "to comprise" is used in this document as an open limitation that does not exclude the existence of unrecited features. The features recited in the dependent claims are mutually freely combinable unless otherwise explicitly stated.
Brief description of the figures
The exemplifying embodiments of the invention and their advantages are explained in greater detail below with reference to the accompanying drawings, in which:
figure 1 shows a high-level flow chart of a method according to an embodiment of the invention for identifying an application generating a data traffic flow,
figure 2 shows a flow chart of a method according to an embodiment of the invention for identifying an application generating a data traffic flow,
figure 3 shows a flow chart of a method according to an embodiment of the invention for identifying an application generating a data traffic flow,
figures 4a and 4b show a flow chart of making a preliminary identification of an application in a method according to an embodiment of the invention for identifying the application,
figure 5 shows a schematic illustration of a device according to an embodiment of the invention for identifying an application generating a data traffic flow, and
figure 6 shows a schematic illustration of a network element according to an embodiment of the invention for identifying an application generating a data traffic flow.
Description of the exemplifying embodiments
Figure 1 shows a high-level flow chart of a method according to an embodiment of the invention for identifying an application generating a data traffic flow. The method contains two classification phases 101 and 102 and a phase 103 for making a final decision on the basis of results obtained in those two classification phases. The classification phase 101 comprises making a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow. The one or more first data frames are data frames that are transferred at the beginning of the data traffic flow. Hence, it is possible to utilise information that is present in the data traffic flow only at the beginning of the data traffic flow. The first data frames can be for example data frames that are transferred during a negotiation phase related to establishing of the data traffic flow. The negotiation phase may comprise for example hand-shaking and/or other initialisation actions of the data traffic flow. The data frames can be for example IP-packets (Internet Protocol), Ethernet frames, or other protocol data units (PDU). The classification phase 102 comprises making a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames. In the classification phase 102 it is possible to utilise information that is present only when the data traffic flow is at the steady state, i.e. after possible initiations are already done. The portion of the data traffic flow used for the second preliminary identification of the application may comprise for example a first pre-determined number of data frames transferred after a second pre-determined number of earlier transferred data frames. The above-mentioned first and second pre-determined numbers can be e.g. 200 and 800, respectively, in which case the portion of the data traffic flow used for the second preliminary identification of the application comprises data frames 201-1000 in the temporal order of transmission. The phase 103 comprises identifying the application at least partly on the basis of the first preliminary identification made in the classification phase 101 and the second preliminary identification made in the classification phase 102. In addition to the results of the first and second preliminary identifications, it is possible to use parameters related to algorithms used in the classification phases 101 and 102. These parameters may include, depending on the algorithms being used, for example an indicator of reliability of the first preliminary identification and an indicator of reliability of the second preliminary identification.
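The division of a data traffic flow into the first data frames and the later portion can be illustrated with a minimal Python sketch. The function name `split_flow` and the parameters `n_first`, `skip`, and `count` are hypothetical and do not appear in the description; the sketch assumes the reading in which 200 earlier frames are skipped and the following 800 frames are used, which yields data frames 201-1000.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_flow(frames: Sequence[T], n_first: int, skip: int,
               count: int) -> Tuple[List[T], List[T]]:
    """Split a flow, given in temporal order, into two parts.

    n_first: number of frames from the beginning of the flow
             (input to the first preliminary identification).
    skip:    number of earlier frames ignored before the later portion.
    count:   number of frames in the later portion
             (input to the second preliminary identification).
    All three parameters are illustrative assumptions.
    """
    first_frames = list(frames[:n_first])
    later_portion = list(frames[skip:skip + count])
    return first_frames, later_portion


# Frames are represented here simply by their ordinal numbers 1..1200.
flow = list(range(1, 1201))
first_frames, later_portion = split_flow(flow, n_first=5, skip=200, count=800)
print(first_frames)
print(later_portion[0], later_portion[-1], len(later_portion))
```

In a real device the frames would arrive one at a time, so the first preliminary identification could presumably be made before the later portion has been observed.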
In a method according to an embodiment of the invention, the first preliminary identification of the application is made, in the classification phase 101, on the basis of at least one of the following properties of the one or more first data frames transferred at the beginning of the data traffic flow: payload size, header size, uplink/downlink-transfer direction, a port number. The properties of the first one or more data frames that are selected to be used in the first preliminary identification of the application constitute a feature vector of the data traffic flow for the first preliminary identification. For example, if the number of the first data frames is N, the feature vector of the data traffic flow can be for example:
- Feature 1: The payload size and uplink/downlink direction of a first transferred data frame,
- Feature 2: The payload size and uplink/downlink direction of a second transferred data frame,
- Feature 3: The payload size and uplink/downlink direction of a third transferred data frame,
- Feature N: The payload size and uplink/downlink direction of an Nth transferred data frame.
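A feature vector of this kind can be sketched as follows. The description does not say how the payload size and the transfer direction are combined into a single feature; the sketch assumes, purely for illustration, that the direction is encoded as the sign of the payload size (positive for uplink, negative for downlink). The names `Frame` and `first_feature_vector` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    payload_size: int  # payload size in bytes
    uplink: bool       # True for uplink, False for downlink


def first_feature_vector(frames: Sequence[Frame], n: int) -> List[float]:
    """Build an N-element feature vector from the first n frames.

    Assumption: feature i is the payload size of the i-th frame,
    signed by direction (+ uplink, - downlink).
    """
    if len(frames) < n:
        raise ValueError("flow is shorter than the requested number of frames")
    return [float(f.payload_size if f.uplink else -f.payload_size)
            for f in frames[:n]]


handshake = [Frame(120, True), Frame(1460, False), Frame(52, True),
             Frame(900, False)]
vector = first_feature_vector(handshake, n=3)
print(vector)
```

Other encodings, such as two separate features per frame, would serve equally well as long as the same encoding is used for the training data flows.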
The first preliminary identification of the application can be made, for example, using the feature vector and classification data obtained with the K-means clustering algorithm, the Gaussian Mixture Model, the spectral clustering, the AutoClass clustering algorithm, or the density based spatial clustering of applications with noise (DBSCAN). It is also possible to use two or more of the aforementioned algorithms and to use a suitable logic, e.g. the voting principle, for determining the result of the first preliminary identification on the basis of results obtained with different classification data based on different algorithms. An embodiment of the invention in which the K-means clustering algorithm is used will be described in more detail in later parts of this document. More detailed information about the Gaussian Mixture Model and the spectral clustering can be found e.g. in Bernaille, L., Teixeira, R., and Salamatian, K. 2006. Early application identification. In Proceedings of the 2006 ACM CoNEXT Conference (Lisboa, Portugal, December 04-07, 2006). CoNEXT '06. ACM, New York, NY, 1-12. More detailed information about the AutoClass clustering algorithm and about the density based spatial clustering of applications with noise (DBSCAN) can be found e.g. in Erman J., Arlitt M., and Mahanti A. (2006) Traffic Classification using Clustering Algorithms. In: Proceedings of the 2006 SIGCOMM workshop on mining network data. New York, NY, USA: ACM Press, p. 281-286.
In a method according to an embodiment of the invention, the second preliminary identification of the application is made, in the classification phase 102, on the basis of at least one of the following statistical properties related to the portion of the data traffic flow transferred after the transferring of the first data frames:
- Average data frame size,
- Minimum data frame size,
- Maximum data frame size,
- Standard deviation of data frame sizes,
- Number of data frame size variations,
- Total payload size, i.e. sum of payload sizes of all data frames used for the second preliminary identification of the application,
- Total header size, i.e. sum of header sizes of all data frames used for the second preliminary identification of the application,
- Total payload size to uplink, i.e. sum of payload sizes of all data frames to uplink used for the second preliminary identification of the application,
- Total payload size to downlink, i.e. sum of payload sizes of all data frames to downlink used for the second preliminary identification of the application,
- Total header size to uplink, i.e. sum of header sizes of all data frames to uplink used for the second preliminary identification of the application,
- Total header size to downlink, i.e. sum of header sizes of all data frames to downlink used for the second preliminary identification of the application,
- Number of data frames containing payload to uplink,
- Number of data frames containing payload to downlink,
- Number of push data frames to uplink,
- Number of push data frames to downlink,
- Average inter-arrival time to uplink,
- Average inter-arrival time to downlink,
- Minimum inter-arrival time to uplink,
- Minimum inter-arrival time to downlink,
- Maximum inter-arrival time to uplink,
- Maximum inter-arrival time to downlink,
- Standard deviation of inter-arrival times to uplink,
- Standard deviation of inter-arrival times to downlink.
The statistical properties that are selected to be used in the second preliminary identification of the application constitute a feature vector of the data traffic flow for the second preliminary identification.
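As an illustration, a few of the statistical properties listed above can be computed as in the following Python sketch. The names `Frame`, `inter_arrival_times`, and `statistical_features` are hypothetical, only a subset of the properties is shown, and the use of the population standard deviation and of per-direction inter-arrival times are assumptions made for this sketch.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Sequence


@dataclass
class Frame:
    size: int          # data frame size in bytes
    timestamp: float   # arrival time in seconds
    uplink: bool       # True for uplink, False for downlink


def inter_arrival_times(frames: Sequence[Frame]) -> List[float]:
    """Differences between consecutive arrival times."""
    times = [f.timestamp for f in frames]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def statistical_features(frames: Sequence[Frame]) -> Dict[str, float]:
    """Compute a subset of the listed properties for a non-empty portion."""
    sizes = [f.size for f in frames]
    up = inter_arrival_times([f for f in frames if f.uplink])
    down = inter_arrival_times([f for f in frames if not f.uplink])
    return {
        "avg_size": mean(sizes),
        "min_size": min(sizes),
        "max_size": max(sizes),
        "std_size": pstdev(sizes),  # population standard deviation (assumed)
        "avg_iat_up": mean(up) if up else 0.0,
        "avg_iat_down": mean(down) if down else 0.0,
    }


portion = [Frame(100, 0.0, True), Frame(300, 1.0, False),
           Frame(200, 2.0, True), Frame(400, 4.0, False)]
features = statistical_features(portion)
print(features)
```

The values of the resulting dictionary, taken in a fixed order, would form the feature vector for the second preliminary identification.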
The second preliminary identification of the application can be made, for example, using the feature vector and classification data obtained with the K-means clustering algorithm, the Gaussian Mixture Model, the spectral clustering, the AutoClass clustering algorithm, or the density based spatial clustering of applications with noise (DBSCAN). It is also possible to use two or more of the aforementioned algorithms and to use a suitable logic, e.g. the voting principle, for determining the result of the second preliminary identification on the basis of results obtained with different classification data based on different algorithms. An embodiment of the invention in which the K-means clustering algorithm is used will be described in more detail in later parts of this document. It should be noted that it is not necessary to use the same algorithm for both the first preliminary identification in the classification phase 101 and the second preliminary identification in the classification phase 102; the algorithm used for each of the classification phases 101 and 102 can be selected according to the needs and requirements of the classification phase, 101 or 102, under consideration.
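The voting principle mentioned above can be sketched as a simple majority vote over the results given by the different algorithms. The description does not specify how ties are resolved; the sketch assumes that a tie, or an empty set of results, gives the result "unknown". The names `vote` and `UNKNOWN` are hypothetical.

```python
from collections import Counter
from typing import Sequence

UNKNOWN = "unknown"


def vote(results: Sequence[str]) -> str:
    """Return the application proposed by most algorithms.

    Assumption: a tie between the two most frequent results, or an
    empty list of results, is reported as unknown.
    """
    ranked = Counter(results).most_common()
    if not ranked:
        return UNKNOWN
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return UNKNOWN
    return ranked[0][0]


# Results of, say, three different clustering algorithms for one flow.
print(vote(["HTTP", "HTTP", "SSH"]))
print(vote(["HTTP", "SSH"]))
```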
In a method according to an embodiment of the invention, accuracy indicators are calculated for the first preliminary identification of the application and for the second preliminary identification of the application. The application that has a better accuracy is selected in the phase 103 from among the one or two applications proposed by the first and second preliminary identifications. The selected application represents the identified application i.e. the final identification of the application. The accuracy indicators are preferably parameters related to algorithms used in the classification phases 101 and 102.
Figure 2 shows a flow chart of a method according to an embodiment of the invention for identifying an application generating a data traffic flow. The K-means clustering algorithm is used for obtaining first classification data for the first preliminary identification of the application and for obtaining second classification data for the second preliminary identification of the application. The first classification data includes first cluster descriptions and first cluster compositions that are used in the first preliminary identification of the application, and the second classification data includes second cluster descriptions and second cluster compositions that are used in the second preliminary identification of the application. Details of the K-means clustering algorithm can be found, for example, in the book "Clustering Algorithms", J. A. Hartigan (1975), Wiley. The first preliminary identification of the application is made in phases 201 and 211, and the second preliminary identification of the application is made in phases 202 and 212.
The phase 201 comprises selecting a first cluster of one or more applications on the basis of a first feature vector based on properties of data frames transferred at the beginning of the data traffic flow. The algorithm determines whether the feature vector corresponds to any cluster and, if it does, which one. Every cluster has its own density measure. This density measure is the standard deviation of distances from applications within a cluster to the centroid of this cluster. The density measure together with a pre-determined threshold value is used when determining whether the feature vector falls within the coverage area of a certain cluster. When assigning the feature vector to a cluster, the selected cluster is not always the closest one. It may happen, for example, that the feature vector is closer to the centroid of a cluster A than to the centroid of a cluster B, but the feature vector is not within the coverage area of the cluster A whereas it is within the coverage area of the cluster B. In this case, the cluster B is a better selection for the outcome of the phase 201 than the cluster A.
The phase 211 comprises selecting a first application candidate from the selected first cluster of one or more applications. It is also possible that the result of the first preliminary identification of the application is that the first application candidate is unknown. The phase 202 comprises selecting a second cluster of one or more applications on the basis of a second feature vector based on statistical properties of a portion of the data traffic flow that is transferred later than the data frames used for the first preliminary identification of the application. The phase 212 comprises selecting a second application candidate from the selected second cluster of one or more applications. It is also possible that the result of the second preliminary identification of the application is that the second application candidate is unknown.
A phase 203 comprises making a final decision on the application to be identified at least partly on the basis of the first application candidate and the second application candidate. If the first and second application candidates are the same, the final decision on the application to be identified is preferably the application proposed by both the first and second preliminary identifications of application. Exemplifying alternatives for making the final decision in cases where the first and second preliminary identifications of application propose different application candidates are described below.
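The cluster selection of phases 201 and 202 can be sketched as follows. The description does not state exactly how the density measure and the threshold value define the coverage area; the sketch assumes that a feature vector is within the coverage area of a cluster when its Euclidean distance to the centroid is at most the threshold multiplied by the density measure, and that among the covering clusters the one with the nearest centroid is chosen. The names `Cluster` and `select_cluster` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Cluster:
    name: str
    centroid: Sequence[float]
    density: float  # standard deviation of member distances to the centroid


def select_cluster(vector: Sequence[float], clusters: Sequence[Cluster],
                   threshold: float) -> Optional[Cluster]:
    """Return the covering cluster with the nearest centroid, or None.

    Assumed coverage rule: distance <= threshold * density.
    """
    covering = [c for c in clusters
                if math.dist(vector, c.centroid) <= threshold * c.density]
    if not covering:
        return None  # no cluster covers the vector: application unknown
    return min(covering, key=lambda c: math.dist(vector, c.centroid))


# Cluster A is compact, cluster B is spread out.
a = Cluster("A", (0.0, 0.0), density=0.5)
b = Cluster("B", (5.0, 0.0), density=2.0)
vector = (2.0, 0.0)  # distance 2 to A, distance 3 to B
selected = select_cluster(vector, [a, b], threshold=2.0)
print(selected.name if selected else "unknown")
```

In the example the feature vector is closer to the centroid of the cluster A, but only the cluster B covers it, so the cluster B is selected, as in the case discussed above.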
In a method according to an embodiment of the invention, accuracy indicators are calculated for the first preliminary identification of the application and for the second preliminary identification of the application. The application candidate that has a better accuracy is selected in the phase 203 from among the first and second application candidates proposed by the first and second preliminary identifications, respectively. Each accuracy indicator is calculated as a proportional distance DCC/DDM, wherein the DCC is a distance between a feature vector of the data traffic flow and a centroid of a selected cluster and the DDM is the standard deviation of distances from applications within the selected cluster to the centroid of the selected cluster. The feature vector of the data traffic flow is either the first feature vector used in the first preliminary identification or the second feature vector used in the second preliminary identification.
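The proportional distance DCC/DDM can be computed as in the sketch below. The description does not state which value of the indicator counts as a better accuracy; the sketch assumes that a smaller proportional distance, i.e. a feature vector lying closer to the centroid relative to the spread of the cluster, is better. The Euclidean distance and the names `proportional_distance` and `pick_candidate` are likewise assumptions.

```python
import math
from typing import Sequence, Tuple


def proportional_distance(vector: Sequence[float],
                          centroid: Sequence[float], ddm: float) -> float:
    """Accuracy indicator DCC/DDM.

    DCC: distance from the feature vector to the cluster centroid.
    DDM: standard deviation of member distances to the centroid.
    """
    if ddm <= 0:
        raise ValueError("DDM must be positive")
    dcc = math.dist(vector, centroid)
    return dcc / ddm


def pick_candidate(first: Tuple[str, float], second: Tuple[str, float]) -> str:
    """Pick between (application, indicator) pairs.

    Assumption: the smaller proportional distance is the better accuracy.
    """
    return first[0] if first[1] <= second[1] else second[0]


indicator_1 = proportional_distance((3.0, 4.0), (0.0, 0.0), ddm=2.0)
indicator_2 = proportional_distance((1.0, 0.0), (0.0, 0.0), ddm=2.0)
print(indicator_1, indicator_2)
print(pick_candidate(("HTTP", indicator_1), ("SSH", indicator_2)))
```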
In a method according to an embodiment of the invention, an occurrence probability of the first application candidate is used as an accuracy indicator for the first preliminary identification and an occurrence probability of the second application candidate is used as an accuracy indicator for the second preliminary identification. An occurrence probability of an application is a probability of occurrence of the application within all applications of a corresponding cluster. As a purely exemplifying case we can assume that applications A, B, and C constitute a certain cluster X of applications, and, after knowing that the cluster to be selected is the cluster X, the application to be identified is the application A with a probability p, the application B with a probability q, and the application C with the probability 1 - p - q. In this case, the occurrence probabilities of the applications A, B, and C are p, q, and 1 - p - q, respectively. Estimates for the occurrence probabilities can be determined for example on the basis of usage statistics related to the applications under consideration.
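As a hedged illustration, occurrence probabilities could be estimated from the application labels of the flows that fall in a cluster. Using per-cluster label counts as the "usage statistics" is an assumption made for this sketch, and the function names are hypothetical.

```python
from collections import Counter


def occurrence_probabilities(cluster_labels):
    """Estimate occurrence probabilities from the labels in one cluster.

    cluster_labels holds one application label per observed flow in the
    cluster (an assumed form of usage statistics).
    """
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    return {app: n / total for app, n in counts.items()}


def pick_by_occurrence(first_app, p_first, second_app, p_second):
    """Prefer the candidate with the higher occurrence probability.

    Ties go to the first candidate; that tie rule is an assumption.
    """
    return first_app if p_first >= p_second else second_app
```

With labels A, A, B, C in cluster X, the estimates are p = 0.5 for A and q = 0.25 for B, leaving 1 - p - q = 0.25 for C.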
It is possible that one or both of the first and second preliminary identifications produce a result that the application is unknown, i.e. either one or both of the first and second application candidates can indicate that the application is unknown. If both the first and second application candidates indicate that the application is unknown, the final decision on the application to be identified is preferably such that the application is unknown. Exemplifying alternatives for making the final decision in cases where only one of the first preliminary identification and the second preliminary identification proposes a known application are described below. In a method according to an embodiment of the invention, the final decision on the application to be identified is the application proposed by the first preliminary identification in a case in which only the second preliminary identification indicates that the application is unknown. Correspondingly, the final decision on the application to be identified is the application proposed by the second preliminary identification in a case in which only the first preliminary identification indicates that the application is unknown.
In a method according to an embodiment of the invention, the final decision on the application to be identified is the application proposed by the first preliminary identification if:
- the second preliminary identification indicates that the application is unknown, and
- the closest cluster found in the second preliminary identification process contains the said application proposed by the first preliminary identification, the closest cluster being the cluster whose centroid is closest to the feature vector of the data traffic flow in the second preliminary identification process.
The final decision is that the application is unknown if the said closest cluster does not contain the said application proposed by the first preliminary identification.
Correspondingly, the final decision on the application to be identified is the application proposed by the second preliminary identification if:
- the first preliminary identification indicates that the application is unknown, and
- the closest cluster found in the first preliminary identification process contains the said application proposed by the second preliminary identification, the closest cluster being the cluster whose centroid is closest to the feature vector of the data traffic flow in the first preliminary identification process.
The final decision is that the application is unknown if the said closest cluster does not contain the said application proposed by the second preliminary identification.
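The decision rules above for the case where one preliminary identification yields "unknown" can be sketched as follows. This is only an illustration: `UNKNOWN`, `final_decision`, and the `resolve_conflict` callback (used when both candidates are known but differ, for example a comparison of accuracy indicators) are hypothetical names, and the closest clusters are assumed to be given as sets of application names.

```python
UNKNOWN = None  # hypothetical marker meaning "application is unknown"


def final_decision(first, second, closest_first, closest_second, resolve_conflict):
    """Combine two preliminary identifications into a final decision.

    first, second      -- application candidates, or UNKNOWN
    closest_first      -- applications in the cluster closest to the flow in
                          the first preliminary identification
    closest_second     -- the same for the second preliminary identification
    resolve_conflict   -- callable used when both candidates are known and differ
    """
    if first is UNKNOWN and second is UNKNOWN:
        return UNKNOWN
    if second is UNKNOWN:
        # Accept the first candidate only if the closest cluster of the
        # second identification also contains it.
        return first if first in closest_second else UNKNOWN
    if first is UNKNOWN:
        return second if second in closest_first else UNKNOWN
    if first == second:
        return first
    return resolve_conflict(first, second)
```

The simpler embodiment described earlier, in which the known candidate is accepted without the closest-cluster check, would correspond to returning `first` or `second` directly in the two middle branches.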
Figure 3 shows a flow chart of a method according to an embodiment of the invention for identifying an application generating a data traffic flow. Phases 301, 311, 302, 312, and 303 are similar to the phases 201, 211, 202, 212, and 203 shown in figure 2, respectively. The method comprises training phases 304 and 305. In the training phase 304, the K-means clustering algorithm and first pre-determined training data flows are used for forming the first classification data for the purpose of the first preliminary identification of the application. In the training phase 305, the K-means clustering algorithm and second pre-determined training data flows are used for forming the second classification data for the purpose of the second preliminary identification of the application. The first classification data produced in the training phase 304 includes first cluster descriptions and first cluster compositions that are used in the first preliminary identification of the application, and the second classification data produced in the training phase 305 includes second cluster descriptions and second cluster compositions that are used in the second preliminary identification of the application. The cluster descriptions may include for example information defining the centroids and the density measures of the clusters.
The training phases 304 and 305 are preferably implemented using an offline trainer. The trainer takes the first and second training data flows as input and uses those training data flows to capture the characteristic patterns of the desired application types. The trainer divides the training data flows into clusters using the K-means clustering algorithm. After the clustering, the trainer outputs the cluster descriptors and cluster compositions for the purposes of the first and second preliminary identification of the application. The offline training can be done only once, but it is also possible to update the cluster descriptors and cluster compositions periodically. In the long run, updating may be worthwhile in order to keep up with possible changes in the behaviour of applications.
The K-means clustering algorithm that can be used in the training phases 304 and 305 can be described with the aid of the following steps:
1. calculating the distances between feature vectors and cluster centroids, each feature vector corresponding to a certain training data flow,
2. assigning each feature vector to a cluster the centroid of which is closest to that feature vector,
3. calculating new cluster centroids based on the assigned feature vectors, and
4. going back to step 1 and continuing until the cluster centroids do not substantially move.
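The four steps above can be sketched in plain Python as follows. The sketch makes several assumptions: the distance is Euclidean, initial centroids are supplied by the caller, "do not substantially move" is read as a maximum centroid shift at or below a tolerance `tol`, and an empty cluster keeps its previous centroid. The names `k_means`, `tol`, and `max_iter` are hypothetical.

```python
import math


def k_means(vectors, initial_centroids, tol=1e-9, max_iter=100):
    """Cluster feature vectors; returns (centroids, assignments)."""
    centroids = [list(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iter):
        # Steps 1 and 2: compute distances and assign each feature vector
        # to the cluster whose centroid is closest.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(v, centroids[k]))
            for v in vectors
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [v for v, a in zip(vectors, assignments) if a == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # assumption: keep an empty cluster's centroid
        # Step 4: stop once no centroid has moved by more than tol.
        shift = max(math.dist(o, n) for o, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, assignments
```

If `max_iter` is reached before convergence, the returned assignments correspond to the centroids of the previous iteration.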
After the required iterations, clusters have formed. All feature vectors used in the training contain the ground truth about the applications generating the training data flows. Therefore, the trainer knows which applications are related to which cluster. The trainer also calculates the distributions of the feature vectors inside each cluster. The distribution describes whether a cluster is very tight or if it is spread far and wide. This information can be used when recognizing unknown data traffic flows. Without this information, all data traffic flows, including those corresponding to no training data flow, would be related to some application at the first and second preliminary identifications. Consequently, the trainer outputs the final cluster centroids, the distribution of the applications for each cluster and the standard deviation of the distances to the cluster centroid for each cluster.
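A minimal sketch of the trainer outputs named above, given a finished clustering, could look like this. The function name and the dictionary keys are hypothetical, the distance is assumed to be Euclidean, and the use of the population standard deviation is an assumption, since the text does not say which estimator is meant.

```python
import math
from collections import Counter
from statistics import pstdev


def describe_clusters(vectors, labels, assignments, centroids):
    """Build per-cluster descriptions from a finished clustering.

    vectors     -- training feature vectors
    labels      -- ground-truth application of each training flow
    assignments -- cluster index of each training flow
    centroids   -- final cluster centroids
    """
    descriptions = []
    for k, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignments) if a == k]
        distances = [math.dist(vectors[i], centroid) for i in members]
        descriptions.append({
            "centroid": list(centroid),
            # Distribution of applications inside the cluster.
            "composition": Counter(labels[i] for i in members),
            # Spread of the cluster around its centroid.
            "distance_std": pstdev(distances) if distances else 0.0,
        })
    return descriptions
```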
Figures 4a and 4b show a flow chart of making a preliminary identification of an application in a method according to an embodiment of the invention for identifying the application. The process depicted in figures 4a and 4b can be used both for the first preliminary identification of the application, the phases 201 and 211 in figure 2 and the phases 301 and 311 in figure 3, and for the second preliminary identification of the application, the phases 202 and 212 in figure 2 and the phases 302 and 312 in figure 3.
Figure 4a depicts an exemplifying cluster assignment process that may correspond, for example, to the phase 201 and/or the phase 202 shown in figure 2, as well as the phase 301 and/or the phase 302 shown in figure 3. A phase 421 of the cluster assignment process comprises initialisation of variables i, is_near, min_dist_in, and min_dist. A phase 422 comprises calculation of a distance D(i) between a feature vector of a data traffic flow and the centroid of the cluster i. A decision phase 423 comprises checking whether the feature vector belongs to the coverage area of the cluster i, i.e. checking whether the distance D(i) is less than the standard deviation of the distances in the cluster i multiplied by a predetermined threshold value T. A decision phase 424 comprises checking whether the distance D(i) is smaller than the so far smallest distance over clusters whose coverage areas comprise the feature vector, i.e. it is checked whether D(i) < min_dist_in. A phase 425 comprises setting the cluster i as the so far closest cluster whose coverage area comprises the feature vector, i.e. the variable cluster_id_in is set to i, setting the variable is_near to '1' in order to indicate that the feature vector belongs to a coverage area of at least one cluster, and setting the so far smallest distance over the clusters whose coverage areas comprise the feature vector to the D(i), i.e. the variable min_dist_in is set to D(i). A decision phase 426 comprises checking whether the distance D(i) is smaller than the so far smallest distance over all clusters, i.e. it is checked whether D(i) < min_dist. A phase 427 comprises setting the cluster i as the so far closest cluster, i.e. the variable cluster_id is set to i, and setting the so far smallest distance over all clusters to the D(i), i.e. the variable min_dist is set to D(i). A decision phase 430 comprises checking whether there are any clusters left to be inspected. A phase 431 comprises shifting to the next cluster to be inspected, i.e. the variable i is incremented by one, and moving back to the phase 422. A decision phase 428 comprises checking whether the feature vector belongs to a coverage area of any cluster, i.e. checking whether the variable is_near is one or still zero. If the feature vector belongs to the coverage area of at least one cluster, i.e. is_near = 1, the selected cluster is indicated in a phase 429 by the variable cluster_id_in and the distance from the feature vector to the centroid of the selected cluster is indicated by the variable min_dist_in. If the feature vector does not belong to a coverage area of any cluster, i.e. is_near = 0, it is indicated in a phase 432 that the application that corresponds to the feature vector is unknown, the variable cluster_id indicates the cluster the centroid of which is closest to the feature vector, and the variable min_dist indicates the distance from the feature vector to the centroid of this closest cluster. In the case in which the feature vector belongs to the coverage area of at least one cluster, i.e. is_near = 1, the above-described cluster assignment process is continued by an application labelling process for providing an indication of an application corresponding to the feature vector or an indication that the application is unknown.
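The cluster assignment process of figure 4a can be sketched as below. The variable names follow the text; the function name, the returned dictionary, the Euclidean distance, and the initial values (infinite distances, is_near cleared) are assumptions, since the text only says that the variables are initialised.

```python
import math


def assign_cluster(feature_vector, centroids, distance_stds, threshold):
    """Select a cluster for a flow, or report that the flow is unknown.

    distance_stds[i] is the standard deviation of distances in cluster i and
    threshold is the pre-determined value T.
    """
    # Phase 421: initialise the variables (initial values are assumptions).
    is_near = False
    min_dist_in = math.inf
    min_dist = math.inf
    cluster_id_in = None
    cluster_id = None
    for i, centroid in enumerate(centroids):  # phases 430 and 431: loop over clusters
        d = math.dist(feature_vector, centroid)  # phase 422: D(i)
        # Phases 423 and 424: inside the coverage area and closest so far.
        if d < distance_stds[i] * threshold and d < min_dist_in:
            cluster_id_in, min_dist_in, is_near = i, d, True  # phase 425
        if d < min_dist:  # phase 426
            cluster_id, min_dist = i, d  # phase 427
    if is_near:  # phases 428 and 429
        return {"known": True, "cluster": cluster_id_in, "distance": min_dist_in}
    # Phase 432: unknown application; still report the closest cluster.
    return {"known": False, "cluster": cluster_id, "distance": min_dist}
```

Returning the closest cluster even in the unknown case keeps the information needed by the final decision rules that check whether that cluster contains the other candidate.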
Figure 4b depicts an exemplifying labelling process. The labelling process depicted in figure 4b may correspond for example to the phases 211 and 212 shown in figure 2 and the phases 311 and 312 shown in figure 3. In the exemplifying labelling process depicted in figure 4b, a port number is utilised. A decision phase 433 comprises checking whether the data traffic flow under consideration uses a standard port number of a known application. A phase 434 comprises determining an application that corresponds to the standard port number. A decision phase 435 comprises checking whether the selected cluster to which the data traffic flow was assigned contains any application corresponding to that standard port number. A phase 436 comprises setting the determined application to be the outcome of the application labelling process, i.e. the outcome of the first or second preliminary identification of application. If the selected cluster does not contain any application corresponding to that standard port number, the data flow will be labelled with the dominant application of the selected cluster. A phase 437 comprises determining the dominant application among all applications of the selected cluster. If the data traffic flow is using a non-standard port number, it will be labelled according to the dominant application among those applications that use non-standard destination port numbers and belong to the selected cluster, or the data traffic flow will be labelled as unknown if the selected cluster does not contain any applications that use non-standard destination port numbers. A decision phase 438 comprises checking whether the selected cluster contains any application(s) that utilise(s) non-standard port numbers. A phase 439 comprises determining a dominant, e.g. most probable, application that uses a non-standard port number among all applications of the selected cluster. A phase 440 comprises setting the application to be unknown, i.e. the outcome of the first or second preliminary identification of application is that the application is unknown.
Figure 5 shows a schematic illustration of a device according to an embodiment of the invention for identifying an application generating a data traffic flow. The device can be for example a part of a network element that can be either an operator controlled network element, a home or office sited network element, or a user terminal device. The device comprises a processing system 501 arranged to:
- make a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow, the one or more first data frames being transferred at the beginning of the data traffic flow,
- make a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames, and
- identify the application at least partly on the basis of the first preliminary identification and the second preliminary identification.
The processing system 501 may comprise one or more processor units. Each processing unit can be a programmable processor, an application specific circuit, or a field programmable circuit. The device may further comprise a memory unit 502 and a data interface 503 for communicating with external systems.
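As one illustration of what such a processing system might execute within either preliminary identification, the labelling process of figure 4b described earlier can be sketched as follows. All names are hypothetical: `composition` maps each application of the selected cluster to its weight (for example an occurrence count), `standard_ports` maps standard port numbers to applications, and `nonstandard_apps` is the set of applications known to use non-standard ports. The selected cluster is assumed to contain at least one application.

```python
def label_flow(port, composition, standard_ports, nonstandard_apps):
    """Label a flow already assigned to a cluster; None means unknown."""
    if port in standard_ports:  # phase 433: standard port of a known application?
        app = standard_ports[port]  # phase 434
        if app in composition:  # phase 435: is it in the selected cluster?
            return app  # phase 436
        # Phase 437: fall back to the dominant application of the cluster.
        return max(composition, key=composition.get)
    # Phase 438: applications in the cluster that use non-standard ports.
    candidates = {a: w for a, w in composition.items() if a in nonstandard_apps}
    if candidates:
        return max(candidates, key=candidates.get)  # phase 439
    return None  # phase 440: unknown
```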
In a device according to an embodiment of the invention, the processing system 501 is arranged to use data frames transferred during a negotiation phase related to establishing of the data traffic flow as the above-mentioned one or more first data frames.
In a device according to an embodiment of the invention, the processing system 501 is arranged to make the first preliminary identification of the application on the basis of at least one of the following properties of the above-mentioned one or more first data frames: payload size, header size, uplink/downlink-transfer direction, a port number.
In a device according to an embodiment of the invention, the processing system 501 is arranged to make the first preliminary identification of the application with the aid of first classification data obtained with the K-means clustering algorithm.
In a device according to an embodiment of the invention, the processing system 501 is arranged to make the first preliminary identification of the application with the aid of first classification data obtained with one of the following: the Gaussian Mixture Model, the spectral clustering, the AutoClass clustering algorithm, the density based spatial clustering of applications with noise (DBSCAN).
In a device according to an embodiment of the invention, the portion of the data traffic flow used for the second preliminary identification of the application comprises a first pre-determined number of data frames transferred after a second predetermined number of earlier transferred data frames.
In a device according to an embodiment of the invention, the processing system 501 is arranged to make the second preliminary identification of the application on the basis of at least one of the following statistical properties related to the portion of the data traffic flow transferred after the first data frames: average frame size, minimum frame size, maximum frame size, standard deviation of frame size, average inter-arrival time, standard deviation of the inter-arrival time.
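A feature vector built from the listed statistical properties might be computed as in the sketch below. The frame representation (timestamp in seconds, size in bytes), the parameter names `skip` and `count` for the two pre-determined frame numbers, the ordering of the vector elements, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def second_feature_vector(frames, skip, count):
    """Compute statistical features over a later portion of a flow.

    frames -- list of (timestamp_seconds, size_bytes) in arrival order
    skip   -- number of earlier transferred frames to pass over
    count  -- number of frames used for the statistics
    """
    window = frames[skip:skip + count]
    if len(window) < 2:
        raise ValueError("at least two frames are needed for inter-arrival times")
    sizes = [size for _, size in window]
    gaps = [later[0] - earlier[0] for earlier, later in zip(window, window[1:])]
    return [
        mean(sizes),    # average frame size
        min(sizes),     # minimum frame size
        max(sizes),     # maximum frame size
        pstdev(sizes),  # standard deviation of frame size
        mean(gaps),     # average inter-arrival time
        pstdev(gaps),   # standard deviation of inter-arrival time
    ]
```

In practice the elements have very different scales (bytes versus seconds), so some normalisation before clustering may be advisable; the text does not specify any.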
In a device according to an embodiment of the invention, the processing system 501 is arranged to make the second preliminary identification of the application with the aid of second classification data obtained with the K-means clustering algorithm.
In a device according to an embodiment of the invention, the processing system 501 is arranged to make the second preliminary identification of the application with the aid of second classification data obtained with one of the following: the Gaussian Mixture Model, the spectral clustering, the AutoClass clustering algorithm, the density based spatial clustering of applications with noise (DBSCAN).
In a device according to an embodiment of the invention, the processing system 501 is arranged to calculate accuracy indicators for the first preliminary identification of the application and for the second preliminary identification of the application, and to select the application that has a better accuracy from among the one or two applications according to the first and second preliminary identifications, the selected application representing the identified application.
In a device according to an embodiment of the invention, the processing system 501 is arranged to use first classification data obtained with the K-means clustering algorithm for the first preliminary identification of the application and second classification data obtained with the K-means clustering algorithm for the second preliminary identification of the application, and to calculate each accuracy indicator as DCC/DDM. The DCC is a distance between a feature vector of the data traffic flow and a centroid of a selected cluster of one or more applications, and the DDM is a standard deviation of distances from applications within the selected cluster of applications to the centroid of the selected cluster of one or more applications.
In a device according to an embodiment of the invention, the processing system 501 is arranged to use first classification data obtained with the K-means clustering algorithm for the first preliminary identification of the application and second classification data obtained with the K-means clustering algorithm for the second preliminary identification of the application, and to use an occurrence probability of the application according to the first preliminary identification as an accuracy indicator for the application according to the first preliminary identification and an occurrence probability of the application according to the second preliminary identification as an accuracy indicator for the application according to the second preliminary identification. The occurrence probability of an application is the probability of occurrence of the application within all applications of a corresponding cluster of one or more applications.
In a device according to an embodiment of the invention, the processing system 501 is arranged to use the K-means clustering algorithm and first pre-determined training data flows for obtaining first classification data to be used for the first preliminary identification of the application and to use the K-means clustering algorithm and second pre-determined training data flows for obtaining second classification data to be used for the second preliminary identification of the application.
Figure 6 shows a schematic illustration of a network element 600 according to an embodiment of the invention for identifying an application generating a data traffic flow. The network element comprises a processing system 601 arranged to:
- make a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow, the one or more first data frames being transferred at the beginning of the data traffic flow,
- make a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames, and
- identify the application at least partly on the basis of the first preliminary identification and the second preliminary identification.
The network element preferably comprises a transmitter 604 for transmitting data traffic flows to a communication network and/or a receiver 605 for receiving data traffic flows from the communication network. Alternatively, the network element may comprise a data interface (not shown) for connecting to an external transmitter and/or to an external receiver. The network element may further comprise a memory unit 602 or a data interface (not shown) for connecting to an external memory unit.
A network element according to an embodiment of the invention comprises/is at least one of the following: an IP-router (Internet Protocol), Ethernet switch, ATM-switch (Asynchronous Transfer Mode), base station of a mobile communications network, MPLS-switch (Multiprotocol Label Switching), a WLAN-AP (Wireless Local Area Network - Access Point).
A network element according to an embodiment of the invention is a user terminal device and comprises/is at least one of the following: a mobile phone, a palmtop computer, a personal digital assistant, a personal computer, a laptop computer.
A computer program according to an embodiment of the invention comprises a program code for controlling a programmable processor to identify an application generating a data traffic flow. The program code comprises computer executable instructions for controlling the programmable processor to:
- make a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow, the one or more first data frames being transferred at the beginning of the data traffic flow,
- make a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames, and
- identify the application at least partly on the basis of the first preliminary identification and the second preliminary identification.
The computer executable instructions can be e.g. subroutines and/or functions.
A computer program product according to an embodiment of the invention is stored in a computer readable medium. The computer readable medium can be e.g. a CD-ROM (Compact Disc Read Only Memory) or a RAM-device (Random Access Memory).
A computer program product according to an embodiment of the invention is carried in a signal that is receivable from a communication network.
A computer readable medium, e.g. a CD-ROM (Compact Disc Read Only Memory) or a RAM-device (Random Access Memory), according to an embodiment of the invention is encoded with a computer program according to an embodiment of the invention.
The specific examples provided in the description given above should not be construed as limiting. Therefore, the invention is not limited merely to the embodiments described above, many variants being possible.

Claims

1. A device for identifying an application generating a data traffic flow, the device comprising a processing system (501) arranged to make a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow, the one or more first data frames being transferred at the beginning of the data traffic flow, characterized in that the processing system is further arranged to:
- make a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames, and
- identify the application at least partly on the basis of the first preliminary identification and the second preliminary identification.
2. A device according to claim 1, wherein the one or more first data frames are data frames transferred during a negotiation phase related to establishing of the data traffic flow.
3. A device according to claim 1 or 2, wherein the processing system is arranged to make the first preliminary identification of the application on the basis of at least one of the following properties of the one or more first data frames: payload size, header size, uplink/downlink-transfer direction, a port number.
4. A device according to any of claims 1-3, wherein the processing system is arranged to make the first preliminary identification of the application with the aid of first classification data obtained with the K-means clustering algorithm.
5. A device according to any of claims 1-3, wherein the processing system is arranged to make the first preliminary identification of the application with the aid of first classification data obtained with one of the following: the Gaussian Mixture Model, the spectral clustering, the AutoClass clustering algorithm, the density based spatial clustering of applications with noise (DBSCAN).
6. A device according to any of the claims 1-5, wherein the portion of the data traffic flow used for the second preliminary identification of the application comprises a first pre-determined number of data frames transferred after a second pre-determined number of earlier transferred data frames.
7. A device according to any of the claims 1-6, wherein the processing system is arranged to make the second preliminary identification of the application on the basis of at least one of the following statistical properties related to the portion of the data traffic flow transferred after the transferring of the first data frames: average frame size, minimum frame size, maximum frame size, standard deviation of frame size, average inter-arrival time, standard deviation of inter-arrival time.
8. A device according to any of the claims 1-7, wherein the processing system is arranged to make the second preliminary identification of the application with the aid of second classification data obtained with the K-means clustering algorithm.
9. A device according to any of the claims 1-7, wherein the processing system is arranged to make the second preliminary identification of the application with the aid of second classification data obtained with one of the following: the Gaussian Mixture Model, the spectral clustering, the AutoClass clustering algorithm, the density based spatial clustering of applications with noise (DBSCAN).
10. A device according to any of the claims 1-9, wherein the processing system is arranged to calculate accuracy indicators for the first preliminary identification of the application and for the second preliminary identification of the application, and to select the application that has a better accuracy from among the one or two applications according to the first and second preliminary identifications, the selected application representing the identified application.
11. A device according to claim 10, wherein the processing system is arranged to use first classification data obtained with the K-means clustering algorithm for the first preliminary identification of the application and second classification data obtained with the K-means clustering algorithm for the second preliminary identification of the application, and to calculate each accuracy indicator as DCC/DDM, the DCC being a distance between a feature vector of the data traffic flow and a centroid of a selected cluster of one or more applications, and the DDM being a standard deviation of distances from applications within the selected cluster of one or more applications to the centroid of the selected cluster of one or more applications.
12. A device according to claim 10, wherein the processing system is arranged to use first classification data obtained with the K-means clustering algorithm for the first preliminary identification of the application and second classification data obtained with the K-means clustering algorithm for the second preliminary identification of the application, and to use an occurrence probability of the application according to the first preliminary identification as an accuracy indicator for the application according to the first preliminary identification and an occurrence probability of the application according to the second preliminary identification as an accuracy indicator for the application according to the second preliminary identification, an occurrence probability of an application being a probability of occurrence of the application within all applications of a corresponding cluster of one or more applications.
13. A device according to claim 1, wherein the processing system is arranged to use the K-means clustering algorithm and first pre-determined training data flows for obtaining first classification data for the first preliminary identification of the application and to use the K-means clustering algorithm and second pre-determined training data flows for obtaining second classification data for the second preliminary identification of the application.
14. A method for identifying an application generating a data traffic flow, the method comprising making (101, 201, 211, 301, 311) a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow, the one or more first data frames being transferred at the beginning of the data traffic flow, characterized in that the method further comprises:
- making (102, 202, 212, 302, 312) a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames, and
- identifying (103, 203, 303) the application at least partly on the basis of the first preliminary identification and the second preliminary identification.
15. A method according to claim 14, wherein the one or more first data frames are data frames transferred during a negotiation phase related to establishing of the data traffic flow.
16. A method according to claim 14 or 15, wherein the first preliminary identification of the application is made on the basis of at least one of the following properties of the one or more first data frames: payload size, header size, uplink/downlink-transfer direction, a port number.
17. A method according to any of claims 14-16, wherein the first preliminary identification of the application is made (421-440) using first classification data obtained with the K-means clustering algorithm.
18. A method according to any of claims 14-16, wherein the first preliminary identification of the application is made using first classification data obtained with one of the following: the Gaussian Mixture Model, the spectral clustering, the AutoClass clustering algorithm, the density based spatial clustering of applications with noise (DBSCAN).
19. A method according to any of the claims 14-18, wherein the portion of the data traffic flow used for the second preliminary identification of the application comprises a first pre-determined number of data frames transferred after a second pre-determined number of earlier transferred data frames.
20. A method according to any of the claims 14-19, wherein the second preliminary identification of the application is made on the basis of at least one of the following statistical properties related to the portion of the data traffic flow transferred after the transferring of the first data frames: average frame size, minimum frame size, maximum frame size, standard deviation of frame size, average inter-arrival time, standard deviation of inter-arrival time.
21. A method according to any of the claims 14-20, wherein the second preliminary identification of the application is made (421-440) using second classification data obtained with the K-means clustering algorithm.
22. A method according to any of the claims 14-20, wherein the second preliminary identification of the application is made using second classification data obtained with one of the following: the Gaussian Mixture Model, the spectral clustering, the AutoClass clustering algorithm, the density based spatial clustering of applications with noise (DBSCAN).
23. A method according to any of the claims 14-22, wherein accuracy indicators are calculated for the first preliminary identification of the application and for the second preliminary identification of the application, and the application that has a better accuracy is selected from among the one or two applications according to the first and second preliminary identifications, the selected application representing the identified application.
24. A method according to claim 23, wherein the K-means clustering algorithm is used for obtaining first classification data for the first preliminary identification of the application and for obtaining second classification data for the second preliminary identification of the application, and each accuracy indicator is calculated as DCC/DDM, the DCC being a distance between a feature vector of the data traffic flow and a centroid of a selected cluster of one or more applications and the DDM being a standard deviation of distances from applications within the selected cluster of one or more applications to the centroid of the selected cluster of one or more applications.
25. A method according to claim 23, wherein the K-means clustering algorithm is used for obtaining first classification data for the first preliminary identification of the application and for obtaining second classification data for the second preliminary identification of the application, and an occurrence probability of the application according to the first preliminary identification is used as an accuracy indicator for the application according to the first preliminary identification and an occurrence probability of the application according to the second preliminary identification is used as an accuracy indicator for the application according to the second preliminary identification, an occurrence probability of an application being a probability of occurrence of the application within all applications of a corresponding cluster of one or more applications.
26. A method according to claim 14, wherein the method comprises using (304) the K-means clustering algorithm and first pre-determined training data flows for obtaining first classification data for the first preliminary identification of the application and using (305) the K-means clustering algorithm and second predetermined training data flows for obtaining second classification data for the second preliminary identification of the application.
27. A network element (600) arranged to receive a data traffic flow generated by an application, the network element comprising a processing system (601) arranged to make a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow, the one or more first data frames being transferred at the beginning of the data traffic flow, characterized in that the processing system is further arranged to:
- make a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames, and
- identify the application at least partly on the basis of the first preliminary identification and the second preliminary identification.
28. A network element according to claim 27, wherein the network element comprises at least one of the following: an IP-router (Internet Protocol), Ethernet switch, ATM-switch (Asynchronous Transfer Mode), base station of a mobile communications network, MPLS-switch (Multiprotocol Label Switching), a WLAN-AP (Wireless Local Area Network - Access Point).
29. A network element according to claim 27, wherein the network element is a user terminal device and comprises at least one of the following: a mobile phone, a palmtop computer, a personal digital assistant.
30. A computer program for identifying an application generating a data traffic flow, the computer program comprising computer executable instructions for controlling a programmable processor to make a first preliminary identification of the application on the basis of properties of one or more first data frames of the data traffic flow, the one or more first data frames being transferred at the beginning of the data traffic flow, characterized in that the computer program further comprises computer executable instructions for controlling the programmable processor to:
- make a second preliminary identification of the application on the basis of statistical properties associated with a portion of the data traffic flow transferred after the transferring of the first data frames, and
- identify the application at least partly on the basis of the first preliminary identification and the second preliminary identification.
31. A computer readable medium, characterized in that the computer readable medium is encoded with a computer program according to claim 30.
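
The statistical properties of claim 20, the DCC/DDM accuracy indicator of claim 24 and the selection step of claim 23 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function names, the input layout, the use of Euclidean distance and of the population standard deviation, and the rule that a lower DCC/DDM value means better accuracy are all assumptions made for illustration; the claims do not state them.

```python
import math
import statistics


def flow_features(frame_sizes, arrival_times):
    """Build a feature vector from the six statistical properties of claim 20.

    frame_sizes: sizes of the frames in the observed portion of the flow.
    arrival_times: arrival timestamps of the same frames, in ascending order.
    (Both argument names are hypothetical.)
    """
    # Inter-arrival times are the gaps between consecutive timestamps.
    gaps = [later - earlier
            for earlier, later in zip(arrival_times, arrival_times[1:])]
    return [
        statistics.fmean(frame_sizes),   # average frame size
        min(frame_sizes),                # minimum frame size
        max(frame_sizes),                # maximum frame size
        statistics.pstdev(frame_sizes),  # standard deviation of frame size
        statistics.fmean(gaps),          # average inter-arrival time
        statistics.pstdev(gaps),         # standard deviation of inter-arrival time
    ]


def accuracy_indicator(feature_vector, centroid, member_vectors):
    """Compute DCC/DDM as described in claim 24.

    DCC: distance between the flow's feature vector and the cluster centroid.
    DDM: standard deviation of the distances from the cluster members to the
    centroid. Euclidean distance and population standard deviation are
    assumptions of this sketch.
    """
    dcc = math.dist(feature_vector, centroid)
    member_distances = [math.dist(m, centroid) for m in member_vectors]
    ddm = statistics.pstdev(member_distances)
    if ddm == 0:
        raise ValueError("DDM is zero; the indicator is undefined")
    return dcc / ddm


def select_application(first, second):
    """Pick one of two preliminary identifications (claim 23).

    first, second: (application_name, indicator) pairs. This sketch assumes
    that a lower DCC/DDM value indicates better accuracy; on a tie the first
    identification is kept.
    """
    return min(first, second, key=lambda pair: pair[1])[0]
```

With the occurrence probability of claim 25 as the accuracy indicator, a higher value would presumably be the better one, so the selection would use `max` instead of `min`.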
PCT/FI2010/050275 2009-04-09 2010-04-08 Method and device for identifying applications which generate data traffic flows WO2010116036A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20095393A FI20095393A0 (en) 2009-04-09 2009-04-09 Method and apparatus for identifying applications that generate data traffic flows
FI20095393 2009-04-09

Publications (1)

Publication Number Publication Date
WO2010116036A1 (en) 2010-10-14

Family

ID=40590275

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2010/050275 WO2010116036A1 (en) 2009-04-09 2010-04-08 Method and device for identifying applications which generate data traffic flows

Country Status (2)

Country Link
FI (1) FI20095393A0 (en)
WO (1) WO2010116036A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140219101A1 (en) * 2013-02-04 2014-08-07 Huawei Technologies Co., Ltd. Feature Extraction Apparatus, and Network Traffic Identification Method, Apparatus, and System
CN110222782A (en) * 2019-06-13 2019-09-10 齐鲁工业大学 There are supervision two-category data analysis method and system based on Density Clustering
US10796243B2 (en) 2014-04-28 2020-10-06 Hewlett Packard Enterprise Development Lp Network flow classification
CN112291089A (en) * 2020-10-23 2021-01-29 全知科技(杭州)有限责任公司 Application system identification and definition method based on flow
CN114513473A (en) * 2022-03-24 2022-05-17 新华三人工智能科技有限公司 Traffic class detection method, device and equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060277288A1 (en) * 2005-01-19 2006-12-07 Facetime Communications, Inc. Categorizing, classifying, and identifying network flows using network and host components

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Proceeding of the 2nd Conference on Future Networking Technologies, CoNEXT'06, 04-07 December 2006, Lisboa, Portugal", article BERNAILLE, L. ET AL.: "Early Application Identification" *
"Proceedings of the 31st IEEE Conference on Local Computer Networks, Tampa, Florida, USA, 14-16 November 2006", article NGUYEN, T.T.T. ET AL.: "Training on Multiple Sub-Flows to Optimize The Use of Machine Learning Classifiers in Real-World IP Networks", pages: 369 - 376 *
NGUYEN, T.T.T. ET AL.: "A Survey of Techniques for Internet Traffic Classification using Machine Learning", IEEE COMMUNICATIONS SURVEYS & TUTORIALS, vol. 10, no. 4, 1 October 2008 (2008-10-01), NEW YORK, NY, US, pages 56 - 76 *

Also Published As

Publication number Publication date
FI20095393A0 (en) 2009-04-09

Similar Documents

Publication Publication Date Title
CN113261244B (en) Network node combining MEC host and UPF selection
Hamid et al. Energy and eigenvalue based combined fully blind self adapted spectrum sensing algorithm
Bütün et al. Impact of mobility prediction on the performance of cognitive radio networks
CN108989880B (en) Code rate self-adaptive switching method and system
WO2010116036A1 (en) Method and device for identifying applications which generate data traffic flows
Deka et al. Optimization of spectrum sensing in cognitive radio using genetic algorithm
Lin et al. A neural-network-based context-aware handoff algorithm for multimedia computing
Schmid et al. A survey on client throughput prediction algorithms in wired and wireless networks
US11558769B2 (en) Estimating apparatus, system, method, and computer-readable medium, and learning apparatus, method, and computer-readable medium
Krishnakumar et al. Machine learning based spectrum sensing and distribution in a cognitive radio network
JP2007036839A (en) Apparatus, system, and method for dividing quality deterioration in packet exchange network
Ali et al. Network selection in heterogeneous access networks simultaneously satisfying user profile and QoS
Long et al. An estimation algorithm of channel state transition probabilities for cognitive radio systems
CN114302428B (en) MEC node determination method and device
Carvalho et al. Performance analysis of multi-service wireless network: An approach integrating CAC, scheduling, and buffer management
Tang et al. An analytical performance model considering access strategy of an opportunistic spectrum sharing system
Shadad et al. Efficient and Reliable Management of 5G Network Slicing based on Deep Learning
Wu et al. A wireless channel model for support of quality of service
Chousainov et al. An analytical framework of a C-RAN supporting bursty traffic
Vieira et al. Estimation of backlog and delay in OFDM/TDMA systems with traffic policing using Network Calculus
Lee et al. Bi‐LSTM model with time distribution for bandwidth prediction in mobile networks
CN112437469A (en) Service quality assurance method, apparatus and computer readable storage medium
Xu et al. Towards smart networking through context aware traffic identification kit (trick) in 5G
Perera et al. Primary user activity modeling using multi-term parameter estimation in cognitive radio systems
García et al. Automatic UMTS system resource dimensioning based on service traffic analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10761233

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10761233

Country of ref document: EP

Kind code of ref document: A1