WO2020228527A1 - Data stream classification method and packet forwarding device - Google Patents

Data stream classification method and packet forwarding device

Info

Publication number
WO2020228527A1
Authority
WO
WIPO (PCT)
Prior art keywords
service
forwarding device
application
address
data flow
Prior art date
Application number
PCT/CN2020/087363
Other languages
English (en)
French (fr)
Inventor
邱亚平
罗奇
华卓隽
王璐
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP20805486.6A priority Critical patent/EP3905597B1/en
Publication of WO2020228527A1 publication Critical patent/WO2020228527A1/zh
Priority to US17/468,250 priority patent/US20210409334A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22Parsing or analysis of headers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5041Network service management, e.g. ensuring proper service fulfilment according to agreements characterised by the time relationship between creation and deployment of a service
    • H04L41/5054Automatic deployment of services triggered by the service manager, e.g. service implementation by automatic configuration of network components
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/02Capturing of monitoring data
    • H04L43/026Capturing of monitoring data using flow identification
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/74Address processing for routing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2475Traffic characterised by specific attributes, e.g. priority or QoS for supporting traffic characterised by the type of applications
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/40Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using virtualisation of network functions or resources, e.g. SDN or NFV entities
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00Network arrangements, protocols or services for addressing or naming
    • H04L61/45Network directories; Name-to-address mapping
    • H04L61/4505Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]

Definitions

  • This application relates to the field of communication technologies, and in particular to a data stream classification method and packet forwarding device.
  • Software-defined wide area network (SD-WAN)
  • Deep packet inspection (DPI) technology is applied to classify data streams.
  • the DPI device extracts traffic characteristics based on the byte information in the data stream, and then matches the extracted traffic characteristics with preset identification rules to obtain the classification result.
  • the embodiment of the present application provides a data stream classification method, which is applied to a message forwarding device between an internal network and the Internet, which can reduce the workload of technicians and avoid the problem of unidentified data streams caused by application updates.
  • the first aspect of the embodiments of the present application provides a data stream classification method, which is applied to a packet forwarding device between an internal network and the Internet, including: the packet forwarding device obtains multiple data streams and extracts the address information and time information of each of the multiple data streams.
  • the multiple data streams are data streams generated by multiple client devices respectively accessing multiple services.
  • the service is used to implement sub-functions of the application.
  • the address information includes the source IP address.
  • the packet forwarding device filters out, from the multiple data streams and according to the source IP address of each data stream, the data stream set generated by the first client device accessing multiple services; the first client device is the client device, among the multiple client devices, that is assigned the first IP address.
  • the packet forwarding device determines the service set accessed by the first client device according to the destination IP address and destination port number of each data stream in the data stream set.
  • the service set includes the first service and the second service; the combination of destination IP address and destination port number corresponding to the first service differs from the combination corresponding to the second service.
  • the packet forwarding device determines the correlation between the services in the service set according to the time information of each data stream in the data stream set; according to the correlation, the packet forwarding device determines that the first service and the second service are used to implement the first application; the packet forwarding device then determines that the data streams corresponding to the first service and the second service are data streams of the first application.
  • An application consists of a set of services, which are used to implement sub-functions of the application.
  • When a client device accesses an application, it establishes multiple data streams with the provider's servers. These data streams implement multiple services belonging to the application, so their time information is strongly correlated.
  • When multiple client devices respectively access one or more applications, multiple data streams are established.
  • the message forwarding device obtains multiple data streams, and the multiple data streams are data streams generated by multiple client devices respectively accessing multiple services.
  • the packet forwarding device extracts the address information and time information of each of the multiple data streams, and can filter out, from the multiple data streams and according to the source IP address of each data stream, the data stream set generated by the first client device.
  • the packet forwarding device can determine the service set accessed by the first client device according to the destination IP address and destination port number of each data stream in the data stream set.
  • according to the time information of each data stream in the data stream set, the packet forwarding device determines the correlation between the services in the service set; this correlation is the degree to which the services are related in terms of time information. Based on the correlation, the packet forwarding device can determine that the first service and the second service are used to implement the first application, and can therefore determine that the data streams corresponding to the first service and the second service are data streams of the first application, which classifies the multiple data streams.
  • the data stream classification method provided by the embodiments of this application classifies data streams by the internal correlation of their time information and does not need to match the byte information in the data stream against identification rules, which can reduce the workload of technicians and avoid the problem of unidentified data streams caused by application updates.
  • the time information includes: the start time and/or the end time of the data stream.
  • the data stream classification method provided in the embodiments of this application provides several specific forms of time information: the start time of the data stream, the end time of the data stream, or both the start time and the end time, which increases the flexibility of the solution when data stream classification is implemented.
  • the classification of data streams based on the start time and the end time can also increase the accuracy of the classification.
  • the packet forwarding device determining, according to the correlation, that the first service and the second service are used to implement the first application includes: the packet forwarding device performs clustering through an unsupervised algorithm according to the correlation, and determines that the first service and the second service are used to implement the first application.
  • the data stream classification method provided in the embodiments of this application uses an unsupervised machine learning algorithm to classify traffic.
  • labeled samples are not used for classification; they are used only to verify the effectiveness of the algorithm, which simplifies the classification process and reduces the workload of technicians.
  • the clustering method includes: a spectral clustering algorithm, a K-Means clustering algorithm, or a DBSCAN density clustering algorithm.
  • the data stream classification method provided in the embodiment of the present application provides multiple possible clustering methods, and improves the flexibility of solution implementation.
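  • As an illustration of one of the named clustering methods, the following is a minimal sketch of DBSCAN over a precomputed distance matrix between services. It is not taken from the application: the function name `dbscan_precomputed`, the parameters `eps` and `min_pts`, and the choice of distance (for example, 1 minus a similarity value) are all assumptions made for the example.

```python
from typing import List

NOISE = -1       # label for points that belong to no cluster
UNVISITED = -2   # internal marker, never returned


def dbscan_precomputed(dist: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Minimal DBSCAN over a symmetric, precomputed distance matrix.

    Returns one cluster label per point (service); NOISE marks outliers.
    """
    n = len(dist)
    labels = [UNVISITED] * n

    def neighbours(i: int) -> List[int]:
        # A point counts as its own neighbour, as in the usual definition.
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
        cluster += 1
    return labels
```

  • In this sketch, services that receive the same label would be treated as implementing the same application, and a service labelled `NOISE` would be left unclassified.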
  • the packet forwarding device determining the correlation between services in the service set according to the time information of each data stream in the data stream set includes: the packet forwarding device determines the first co-occurring service set according to the time information of each data stream in the data stream set, where the first service and the second service belong to the first co-occurring service set, the first co-occurring service set includes at least two services, and the interval between the time information of the data streams generated by accessing the at least two services is less than or equal to the preset duration; the packet forwarding device determines the correlation between the first service and the second service according to the first co-occurring service set.
  • the data stream classification method provided by the embodiments of the present application can select, from the multiple services accessed by a single client device, the services whose time information is separated by an interval less than or equal to the preset duration, obtain the co-occurring service set, and then determine the correlation between the services, which enhances the feasibility of the solution.
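  • The construction of co-occurring service sets can be sketched as follows. This is an illustrative reading, not the application's stated procedure: it assumes that time information is a start time in seconds, and that a set collects every service whose access starts within `preset` seconds of the first access in the set. The names `co_occurring_sets`, `accesses`, and `preset` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Service = Tuple[str, int]  # (destination IP address, destination port number)


def co_occurring_sets(
    accesses: List[Tuple[Service, float]], preset: float
) -> List[Set[Service]]:
    """Group one client's service accesses into co-occurring service sets.

    `accesses` holds (service, start_time) pairs for a single client device.
    Only sets containing at least two distinct services are returned.
    """
    groups: List[Set[Service]] = []
    current: Set[Service] = set()
    window_start: Optional[float] = None
    for service, start in sorted(accesses, key=lambda a: a[1]):
        if window_start is None or start - window_start > preset:
            # The access falls outside the current window: close it, open a new one.
            if len(current) >= 2:
                groups.append(current)
            current, window_start = set(), start
        current.add(service)
    if len(current) >= 2:
        groups.append(current)
    return groups
```

  • Other readings of the interval condition are possible, for example requiring every pair of accesses in a set to lie within the preset duration; the windowing rule above is only one choice.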
  • the method further includes: the packet forwarding device determines the similarity between the first service and the second service according to the first co-occurring service set, to obtain a similarity matrix; the packet forwarding device determining, according to the correlation, that the first service and the second service are used to implement the first application includes: the packet forwarding device determines, according to the similarity matrix, that the first service and the second service are used to implement the first application.
  • the similarity between services can be determined from the co-occurring service sets to obtain a similarity matrix, and whether services are used to implement the same application is then determined according to the similarity matrix. This solution can improve classification accuracy.
  • the packet forwarding device determining the similarity between the first service and the second service according to the first co-occurring service set includes: the packet forwarding device determines the similarity between the first service and the second service according to the cosine similarity calculation method, the intersection-over-union calculation method, or the Euclidean distance calculation method.
  • the data stream classification method provided by the embodiment of the present application provides several specific calculation methods for calculating the similarity between services, which improves the feasibility and flexibility of the solution.
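  • A similarity matrix of this kind can be sketched as follows, under the assumption that each service is represented by the set of co-occurring service sets in which it appears (equivalently, a binary occurrence vector). The application does not specify this representation; the names `occurrence_index`, `cosine`, `iou`, and `similarity_matrix` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Set, Tuple


def occurrence_index(groups: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map each service to the indices of the co-occurring sets that contain it."""
    index: Dict[str, Set[int]] = {}
    for i, group in enumerate(groups):
        for service in group:
            index.setdefault(service, set()).add(i)
    return index


def cosine(a: Set[int], b: Set[int]) -> float:
    # For binary vectors the dot product is |a ∩ b| and each norm is sqrt(|a|).
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def iou(a: Set[int], b: Set[int]) -> float:
    # Intersection over union of the two occurrence sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(
    groups: List[Set[str]],
    measure: Callable[[Set[int], Set[int]], float] = cosine,
) -> Tuple[List[str], List[List[float]]]:
    """Return the sorted service names and their pairwise similarity matrix."""
    index = occurrence_index(groups)
    services = sorted(index)
    matrix = [[measure(index[s], index[t]) for t in services] for s in services]
    return services, matrix
```

  • Services are named by plain strings here for brevity; in practice a service would be identified by its combination of destination IP address and destination port number.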
  • the method further includes: the packet forwarding device extracts the first feature vector of the first service and the second feature vector of the second service from the similarity matrix through graph embedding technology; the packet forwarding device determining, according to the correlation, that the first service and the second service are used to implement the first application includes: the packet forwarding device determines, according to the first feature vector and the second feature vector, that the first service and the second service are used to implement the first application.
  • the feature vector of the service can be further extracted by the graph embedding technology, and then the service used to implement the same application can be determined according to the feature vector, which can improve the accuracy of classification.
  • the method further includes: the packet forwarding device extracts DNS features of the multiple data streams, the DNS features including the correspondence between the combination of destination IP address and destination port number and the domain name; the label of the first application is determined according to the destination address information of the data streams of the first application and the DNS features, and the label is used to identify the first application.
  • the packet forwarding device can also obtain the DNS characteristics of the data flow, and identify the application corresponding to the classified data flow through the DNS characteristic, which can facilitate the user to have an intuitive understanding of the application type.
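  • A minimal sketch of labelling a classified application from DNS features follows. The application does not say how the label is derived from the domain names, so the rule used here (take the most common two-label domain suffix among the application's services) is an assumption, as are the names `label_application` and `dns_features`. The suffix rule is deliberately naive and does not handle multi-part public suffixes.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Service = Tuple[str, int]  # (destination IP address, destination port number)


def label_application(
    services: Iterable[Service], dns_features: Dict[Service, str]
) -> Optional[str]:
    """Pick a label for an application from the domain names of its services.

    `dns_features` maps a (destination IP, destination port) combination to
    the domain name observed for it. Returns None when no service has one.
    """
    suffixes: Counter = Counter()
    for service in services:
        domain = dns_features.get(service)
        if domain:
            # Naive assumption: the last two labels identify the provider.
            suffix = ".".join(domain.rstrip(".").split(".")[-2:])
            suffixes[suffix] += 1
    if not suffixes:
        return None
    return suffixes.most_common(1)[0][0]
```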
  • the second aspect of the embodiments of the present application provides a packet forwarding device, which is applied between an internal network and the Internet, and includes: an acquisition unit, configured to acquire multiple data streams and extract the address information and time information of each of the multiple data streams.
  • the multiple data streams are data streams generated by multiple client devices respectively accessing multiple services.
  • the service is used to implement sub-functions of the application.
  • the address information includes the source IP address, source port number, destination IP address, and destination port number; a selection unit, configured to filter out, from the multiple data streams and according to the source IP address of each data stream, the data stream set generated by the first client device accessing multiple services.
  • the first client device is the client device, among the multiple client devices, that is assigned the first IP address; a determining unit, configured to determine, according to the destination IP address and destination port number of each data stream in the data stream set, the service set accessed by the first client device.
  • the service set includes a first service and a second service; the combination of destination IP address and destination port number corresponding to the first service differs from the combination corresponding to the second service.
  • the determining unit is further configured to determine the correlation between the services in the service set according to the time information of each data stream in the data stream set; the determining unit is further configured to determine, according to the correlation, that the first service and the second service are used to implement the first application; the determining unit is further configured to determine that the data streams corresponding to the first service and the second service are data streams of the first application.
  • the determining unit is specifically configured to perform clustering through an unsupervised algorithm according to the correlation, and determine that the first service and the second service are used to implement the first application.
  • the determining unit is specifically configured to determine the first co-occurring service set according to the time information of each data stream in the data stream set.
  • the first service and the second service belong to the first co-occurring service set; the first co-occurring service set includes at least two services, and the interval between the time information of the data streams generated by accessing the at least two services is less than or equal to the preset duration.
  • the determining unit determines the correlation between the first service and the second service according to the first co-occurring service set.
  • the determining unit is further configured to: determine the similarity between the first service and the second service according to the first co-occurring service set to obtain a similarity matrix;
  • the determining unit is specifically configured to determine, according to the similarity matrix, that the first service and the second service are used to implement the first application.
  • the determining unit is specifically configured to: determine the similarity between the first service and the second service according to the cosine similarity calculation method, the intersection-over-union calculation method, or the Euclidean distance calculation method.
  • the device further includes: an extracting unit, configured to extract the first feature vector of the first service and the second feature vector of the second service from the similarity matrix through graph embedding technology; the determining unit is specifically configured to: determine, according to the first feature vector and the second feature vector, that the first service and the second service are used to implement the first application.
  • the extraction unit is further configured to: extract DNS features of the multiple data streams, the DNS features including the correspondence between the combination of the destination IP address and the destination port number and the domain name;
  • the determining unit is further configured to determine a label of the first application according to the destination address information of the data flow of the first application and the DNS feature, where the label is used to identify the first application.
  • the third aspect of the embodiments of the present application provides a message forwarding device, which is applied between an internal network and the Internet, and includes: a processor and a network interface; the network interface is used to send and receive data; and the processor is used to execute the above-mentioned first aspect And the methods in its various implementations.
  • the fourth aspect of the embodiments of the present application provides a computer program product.
  • the computer program product includes instructions that, when the instructions run on a computer, cause the computer to execute the methods in the first aspect and its implementations.
  • the fifth aspect of the embodiments of the present application provides a computer-readable storage medium that stores instructions.
  • when the instructions are run on a computer, the methods in the first aspect of the foregoing embodiments of the present application and its various implementations are executed.
  • a sixth aspect of the embodiments of the present application provides a communication system, including the packet forwarding device of the foregoing second aspect.
  • the embodiment of the present application provides a data stream classification method, which is applied to a packet forwarding device between an internal network and the Internet.
  • An application usually consists of a set of services. A service implements sub-functions of the application.
  • When multiple client devices access applications, that is, access multiple services, a large number of data streams are generated.
  • the packet forwarding device extracts source address information, destination address information, and time information of multiple data streams.
  • the multiple client devices include the first client device; the packet forwarding device can filter out, based on the source address information, the data stream set generated by the first client device accessing multiple services.
  • the packet forwarding device can determine the service set accessed by the first client device according to the destination address information, and the service set includes the first service and the second service.
  • the message forwarding device determines the correlation between services in the service set according to the time information of the data flow set; the message forwarding device determines the first service and the second service according to the correlation Used to implement the first application; the packet forwarding device determines that the data flow corresponding to the first service and the second service is the data flow of the first application.
  • the packet forwarding device can classify applications according to the source address information, destination address information, and time information of the data streams. Compared with relying on identification rules compiled by technicians, this reduces the technicians' workload and completes application classification conveniently and quickly.
  • Figure 1 is an SD-WAN network architecture diagram in an embodiment of the application
  • FIG. 2 is a schematic diagram of an embodiment of a data stream classification method in an embodiment of this application
  • FIG. 3 is a schematic diagram of another embodiment of a data stream classification method in an embodiment of this application.
  • FIG. 4 is a schematic diagram of an embodiment of a packet forwarding device in an embodiment of the application.
  • FIG. 5 is a schematic diagram of another embodiment of a packet forwarding device in an embodiment of this application.
  • Differentiating or identifying data flows belonging to different applications of aggregated traffic can facilitate network management, such as providing different service guarantees for the traffic of different applications.
  • DPI technology is used to classify data streams.
  • the DPI device extracts traffic characteristics based on the byte information in the data stream, and then matches the extracted traffic characteristics with preset identification rules to obtain the classification result. Because the identification rules that the DPI device uses to classify traffic must be summarized by technicians, the process is time-consuming and laborious, and it is difficult to avoid the problem of unidentified data streams caused by application updates. In addition, because traffic characteristics are extracted from byte information in the data stream, it is difficult to extract the traffic characteristics of encrypted packets.
  • the embodiments of the present application provide a data stream classification method, which is used to classify data streams belonging to different applications, which can reduce the workload of technicians and avoid the problem of unrecognized data streams caused by application updates.
  • the data stream classification method provided in the embodiments of the present application can be applied to various internal networks, such as enterprise networks or campus networks, and the present application does not limit specific application scenarios.
  • the following uses SD-WAN as an example for description.
  • FIG. 1 is a diagram of the SD-WAN network architecture in the embodiment of this application.
  • SD-WAN is a service formed by applying software-defined networking (SDN) technology to a wide area network scenario.
  • SDN uses virtualization technology to simplify management and operation and maintenance.
  • QoS quality of service
  • the network equipment at the location of the enterprise branch constitutes the branch site (for example, site1, siteN in the figure), and the network equipment at the location of the company headquarters or data center (enterprise HQ/DC) constitutes the headquarters site.
  • the interconnection between branch sites and headquarters sites, or the interconnection between branch sites is achieved by creating dynamic smart virtual private network (DSVPN) tunnels.
  • the logical link type corresponding to the DSVPN tunnel may be Internet, multiprotocol label switching (MPLS), or long-term evolution (LTE), etc.
  • the node that realizes flow identification is a packet forwarding device between an internal network and the Internet, such as a router.
  • FIG. 2 is a schematic diagram of an embodiment of a data stream classification method in an embodiment of this application.
  • An application consists of a set of services, which are used to implement sub-functions of the application.
  • When a client device accesses an application, it establishes multiple data streams with multiple services belonging to the application.
  • the time information of the multiple data streams will have a strong correlation.
  • a message forwarding device obtains multiple data streams, and extracts address information and time information of each data stream in the multiple data streams.
  • When multiple client devices respectively access one or more applications, multiple data streams are established.
  • One application includes multiple services.
  • the message forwarding device acquires multiple data streams, and the multiple data streams are data streams generated by multiple client devices respectively accessing multiple services.
  • the data streams forwarded by the packet forwarding device within a time period can be obtained.
  • the length of the time period may be 12 hours or 24 hours, etc. The specific length is not limited here.
  • the message forwarding device extracts the address information and time information of each data stream in the multiple data streams.
  • the address information includes a source Internet Protocol (IP) address, a source port number, a destination IP address, and a destination port number; the combination of the destination IP address and the destination port number can be used to identify the service accessed by the data stream.
  • the address information is a flow 5-tuple, that is, source IP address, source port number, destination IP address, destination port number, and transport protocol.
  • Time information refers to the start time and/or end time of the data stream.
  • the start time may be the SYN packet sending time
  • the end time may be the FIN packet sending time.
  • the start time of the data stream is the time when a UDP packet is received; after receiving the UDP packet, the packet forwarding device establishes a forwarding table entry.
  • the aging time of the forwarding table is usually 120 seconds, that is, if no data stream hits the forwarding table entry within 120 seconds, the entry is aged out and deleted; the end time of the data stream is the time of the last packet hit during the entire existence of the forwarding table entry, so the end time can be obtained by subtracting 120 seconds from the time at which the entry was aged out and deleted.
  • 120 seconds is only an example of the aging time of the forwarding table.
  • the aging time of the forwarding table of the message forwarding device can be configured by the administrator according to specific network scenarios, or can be updated in a self-learning manner.
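  • The flow record and the end-time calculation for a UDP data stream can be sketched as follows. The record layout, the field names, and the function `udp_end_time` are hypothetical; only the 5-tuple fields, the start and end times, and the subtraction of the aging time come from the description above.

```python
from dataclasses import dataclass

DEFAULT_AGING_SECONDS = 120.0  # example aging time; configurable in practice


@dataclass(frozen=True)
class FlowRecord:
    """One data stream: the flow 5-tuple plus its time information (seconds)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    start_time: float
    end_time: float


def udp_end_time(aged_out_at: float, aging_seconds: float = DEFAULT_AGING_SECONDS) -> float:
    """Recover the last packet hit time from the moment the entry was aged out.

    The entry is deleted once no packet has hit it for `aging_seconds`, so the
    last hit happened that long before the deletion.
    """
    return aged_out_at - aging_seconds


# Example: a UDP stream first seen at t=700 whose entry was aged out at t=1000.
flow = FlowRecord("10.0.0.5", 50000, "203.0.113.10", 443, "UDP",
                  start_time=700.0, end_time=udp_end_time(1000.0))
```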
  • the packet forwarding device filters out the data stream sets generated by the first client device accessing the multiple services from the multiple data streams according to the source IP address of each data stream.
  • After the packet forwarding device extracts the address information of each of the multiple data streams, it can filter out the data streams generated by the first client device according to the source IP address in the address information of each data stream.
  • the first client device is a client device assigned to use a first IP address among the plurality of client devices.
  • the packet forwarding device filters data streams whose source IP address is the first IP address to obtain a data stream set generated by the first client device accessing multiple services.
  • the packet forwarding device may determine the data stream set corresponding to each of the multiple client devices according to the source IP addresses of the multiple data streams obtained in step 201, which is not specifically limited here.
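  • The filtering step can be sketched as a grouping of flows by source IP address. This is an illustrative sketch only: flows are modeled as plain dictionaries with assumed keys (`src_ip`, `dst_ip`, `dst_port`, `start`), and `group_flows_by_client` is a hypothetical helper name.

```python
from collections import defaultdict


def group_flows_by_client(flows):
    """Map each source IP address to the list of flows that it originated."""
    per_client = defaultdict(list)
    for flow in flows:
        per_client[flow["src_ip"]].append(flow)
    return dict(per_client)


# Assumed example flows from two client devices.
flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "dst_port": 443, "start": 0.0},
    {"src_ip": "10.0.0.6", "dst_ip": "203.0.113.7", "dst_port": 443, "start": 1.0},
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.9", "dst_port": 8080, "start": 2.0},
]
flow_sets = group_flows_by_client(flows)
first_client_flows = flow_sets["10.0.0.5"]  # data stream set of the first client device
```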
  • the packet forwarding device determines the service set accessed by the first client device according to the destination IP address and destination port number of each data flow in the data flow set.
  • the combination of the destination IP address and the destination port number can be used to identify the service accessed by the data flow, and the packet forwarding device determines the service accessed by each data flow according to the destination IP address and the destination port number in the address information of the data flow.
  • the message forwarding device determines a service set accessed by the first client device; the service set includes a first service and a second service, and the combination of the destination IP address and the destination port number corresponding to the first service is different from the combination of the destination IP address and the destination port number corresponding to the second service.
  • the packet forwarding device may determine the service set corresponding to the data flow set of each client device in the multiple client devices, which is not specifically limited here.
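  • As an illustration of how the service set follows from the destination address information, the sketch below collects the distinct (destination IP address, destination port number) pairs of one client's data flow set. The dictionary keys and the helper name `service_set` are assumptions made for this example.

```python
def service_set(flow_set):
    """Return the distinct (dst_ip, dst_port) pairs accessed in a data flow set."""
    return {(flow["dst_ip"], flow["dst_port"]) for flow in flow_set}


# Assumed example: two flows to the same service and one flow to a second
# service on the same server but a different port.
client_flows = [
    {"dst_ip": "203.0.113.7", "dst_port": 443},
    {"dst_ip": "203.0.113.7", "dst_port": 443},
    {"dst_ip": "203.0.113.7", "dst_port": 8443},
]
services = service_set(client_flows)
```

  • Two flows that share a destination IP address but differ in destination port number therefore count as accessing two different services.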
  • the packet forwarding device determines the correlation between services in the service set according to the time information of each data flow in the data flow set.
  • the packet forwarding device obtains the data stream set established by the first client device and the service set accessed by the first client device. According to the time information of the data stream, the packet forwarding device can obtain the time information for the first client device to access each service in the service set.
  • the correlation between services in the service set refers to the degree of correlation of each service at the time information level.
  • the start times of the data streams established with a group of services that implement the same application are correlated; for example, the interval between the start times of the data streams corresponding to the services in the service set is less than a preset first duration threshold.
  • the first duration threshold is an empirical value determined according to network conditions in actual applications, and may be 30s or 25s, etc., which is not specifically limited here.
  • a second duration threshold also appears below; it is likewise an empirical value determined according to network conditions. "First" and "second" are used only to distinguish the two thresholds; in practical applications their values can be the same or different. According to the start time of each data flow in the data flow set, if the packet forwarding device determines that the interval between the start times of the data flows of the first service and the second service in the service set is less than the first duration threshold, the first service and the second service are related; otherwise they are unrelated.
  • the end times of the data streams established with such a group of services should also be correlated; for example, the interval between the end times of the data streams corresponding to the services in the service set should be less than a preset second duration threshold. According to the end time of each data flow in the data flow set, if the packet forwarding device determines that the interval between the end times of the data flows of the first service and the second service in the service set is less than the second duration threshold, the first service and the second service are related; otherwise they are unrelated.
  • the time information may be both the start time and the end time of the data stream. If the interval between the start times of the data streams corresponding to two services in the service set is less than the preset first duration threshold, and the interval between their end times is less than the preset second duration threshold, it can be determined more accurately that the first service is related to the second service.
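  • The threshold test can be sketched as follows. This is a minimal illustration, assuming each service is represented by the (start time, end time) pair of its data flow in seconds; `services_related` and its parameter names are hypothetical, and the 30-second default is only the example value mentioned for the first duration threshold.

```python
def services_related(flow_a, flow_b, first_threshold=30.0, second_threshold=None):
    """Decide whether two services are related from their flows' time information.

    flow_a and flow_b are (start_time, end_time) tuples in seconds.
    The start-time interval must be less than first_threshold; when
    second_threshold is given, the end-time interval must also be less than it.
    """
    start_gap = abs(flow_a[0] - flow_b[0])
    if start_gap >= first_threshold:
        return False
    if second_threshold is not None:
        end_gap = abs(flow_a[1] - flow_b[1])
        if end_gap >= second_threshold:
            return False
    return True


# Assumed example values.
related_by_start = services_related((0.0, 100.0), (10.0, 105.0))
unrelated_by_start = services_related((0.0, 100.0), (45.0, 105.0))
unrelated_by_end = services_related((0.0, 100.0), (10.0, 200.0), second_threshold=30.0)
```

  • Passing `second_threshold=None` corresponds to using only the start time; supplying it corresponds to the stricter variant that uses both start and end times.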
  • the packet forwarding device determines, according to the correlation, that the first service and the second service are used to implement the first application.
  • the packet forwarding device may determine that the first service and the second service that are relevant in the service set are services that implement the first application.
  • a third service that is not related to the first service or the second service is a service that implements a second application; optionally, if the first service, the second service, and a fourth service are related, it can be determined that the first service, the second service, and the fourth service are services for implementing the first application.
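  • One simple way to turn pairwise relatedness into application groups is to merge related services with a union-find structure. This is an assumption made for illustration: it treats relatedness as transitive, which the text does not state, and the helper name `group_into_applications` is hypothetical.

```python
def group_into_applications(services, related_pairs):
    """Group services so that every related pair ends up in the same group."""
    parent = {service: service for service in services}

    def find(service):
        # Follow parent links to the group representative, halving the path.
        while parent[service] != service:
            parent[service] = parent[parent[service]]
            service = parent[service]
        return service

    for a, b in related_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for service in services:
        groups.setdefault(find(service), set()).add(service)
    # Sort for a stable order of the returned groups.
    return sorted(groups.values(), key=lambda group: sorted(group))


# Assumed example: S1, S2 and S4 are related; S3 is related to none of them.
applications = group_into_applications(
    ["S1", "S2", "S3", "S4"], [("S1", "S2"), ("S2", "S4")]
)
```

  • In this example the first group plays the role of the first application and the unrelated service S3 forms a separate group.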
  • the packet forwarding device determines that the data flow corresponding to the first service and the second service is the data flow of the first application.
  • after the packet forwarding device determines that the first service and the second service are used to implement the first application, it may determine that the data flows corresponding to the first service and the second service are data flows of the first application.
  • the packet forwarding device obtains multiple data streams and extracts the address information and time information of each data stream; it then determines the data stream set generated by the first client device accessing the service set, determines from the correlation of the services in the service set that the first service and the second service belong to the first application, and determines that the data flows accessing the first service and the second service are data flows of the first application, thereby classifying the data flows.
  • the data stream classification method provided by the embodiments of this application classifies data flows by the internal correlation of their time information and does not need to identify the byte information in the data stream according to identification rules, which can reduce the workload of technicians and avoid the problem that data flows cannot be identified after an application is updated.
  • FIG. 3 is a schematic diagram of another embodiment of a data stream classification method in an embodiment of this application.
  • the packet forwarding device obtains multiple data streams, and extracts address information and time information of each data stream in the multiple data streams.
  • when multiple client devices respectively access one or more applications, multiple data streams are established.
  • One application includes multiple services.
  • the message forwarding device acquires multiple data streams, and the multiple data streams are data streams generated by multiple client devices respectively accessing multiple services.
  • the data stream forwarded by the message forwarding device within a time period can be obtained.
  • the length of the time period may be 12 hours, 24 hours, or the like; the specific length is not limited here.
  • the message forwarding device extracts the address information and time information of each data stream in the multiple data streams.
  • the address information includes source IP address, source port number, destination IP address, and destination port number; optionally, the address information is a stream quintuple, that is, source IP address, source port number, destination IP address, destination port number, and transmission protocol.
  • Time information refers to the start time and/or end time of the data stream.
  • for a data stream sent over TCP, the start time of the data stream can be the time at which the SYN packet is sent, and the end time of the data stream can be the time at which the FIN packet is sent; for a data stream sent over UDP, the start time of the data stream is the moment when the UDP message is received.
  • the message forwarding device establishes a forwarding table entry after receiving the UDP message.
  • the aging time of the forwarding table is usually 120 seconds, that is, if no data stream hits a forwarding table entry within 120 seconds, the entry is aged out and deleted; the end time of the data flow is the time of the last packet hit during the lifetime of the entry, so the end time can be the time at which the entry was aged out and deleted minus 120 seconds.
  • 120 seconds is only an example of the aging time of the forwarding table.
  • the aging time of the forwarding table of the message forwarding device can be configured by the administrator according to specific network scenarios, or can be updated in a self-learning manner.
  • the packet forwarding device filters out the first data flow set generated by the first client device accessing the multiple services from the multiple data flows.
  • after the packet forwarding device extracts the address information of each data stream in the multiple data streams, it can filter out the data streams generated by the first client device according to the source IP address in the address information of each data stream.
  • the first client device is a client device assigned to use a first IP address among the plurality of client devices.
  • the packet forwarding device filters the data flow whose source IP address is the first IP address to obtain the first data flow set generated by the first client device accessing multiple services.
  • Each row in the table represents the address information and time information of a data stream.
  • the first column is the address information of the client device, namely the source IP address (srcIP) and source port number (srcPORT).
  • srcIP1 represents the first IP address, which corresponds to the first client device.
  • the source port numbers of different data streams may be different and are distinguished in the table as srcPORT 1 to srcPORT n; the second column is the destination IP address and destination port number of the data stream.
  • the combination of destination IP address (dstIP) and destination port number (dstPORT) can be used to identify a service.
  • the services accessed by different data streams may be different and are distinguished in the table as S 1 to S n; the third column is the start time of the data stream. Different data streams correspond to different start times, distinguished in the table as T1 to Tn.
  • the data flow table of the first client device may be arranged according to the time sequence of the start time of the data flow.
  • the packet forwarding device may use the multiple data streams obtained in step 201 to determine, according to the different source IP addresses, the set of data streams generated by each of the multiple client devices accessing services, which is not limited here. For example, the packet forwarding device determines a second data stream set generated by the second client device accessing multiple services.
  • the packet forwarding device determines the service set accessed by the first client device according to the destination IP address and destination port number of each data flow in the data flow set.
  • the combination of the destination IP address and the destination port number can be used to identify the service accessed by the data flow, and the packet forwarding device determines the service accessed by each data flow according to the destination IP address and the destination port number in the address information of the data flow.
  • the packet forwarding device determines a first service set accessed by the first client device; the first service set includes a first service and a second service, and the combination of the destination IP address and the destination port number corresponding to the first service is different from the combination of the destination IP address and destination port number corresponding to the second service.
  • the first service set accessed by the first client device is (S 1 , S 2 ,..., S n ).
  • the packet forwarding device may determine the service set corresponding to the data flow set of each client device in the multiple client devices, which is not specifically limited here.
  • the packet forwarding device determines the second service set accessed by the second client device.
  • the packet forwarding device determines the first co-occurrence service set according to the time information of each data flow in the data flow set.
  • the message forwarding device may obtain time information for the first client device to access each service in the service set, and determine the first co-occurring service set according to the time information of each service.
  • the first co-occurrence service set includes at least two services, among them the first service and the second service, and the interval between the time information of the data streams generated by accessing the at least two services is less than or equal to the preset duration.
  • the packet forwarding device may obtain the data stream whose interval duration of the time information is less than or equal to the preset duration according to the sliding method of the time window.
  • taking Table 1 as an example, if the first data stream set is arranged in order of data stream start time, then starting from T1 a time window of duration w is taken; w is an empirical value determined according to network conditions in actual applications and can be 30s, 25s, or the like, which is not specifically limited here.
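  • The text does not fully specify how the window slides, so the sketch below makes one assumption explicit: a window of duration w is anchored at the start time of each flow in turn, and every service whose flow starts within that window joins the co-occurrence set. The helper name `co_occurrence_sets` and the (service, start time) tuple layout are hypothetical.

```python
def co_occurrence_sets(flow_table, window=30.0):
    """Collect co-occurring service sets from a flow table.

    flow_table is a list of (service, start_time) tuples sorted by start_time.
    A window of length `window` seconds is anchored at each flow in turn;
    only sets containing at least two services are kept.
    """
    result = []
    for index, (_, anchor_time) in enumerate(flow_table):
        members = set()
        for service, start_time in flow_table[index:]:
            if start_time - anchor_time > window:
                break  # the table is sorted, so later flows are outside the window
            members.add(service)
        if len(members) >= 2:
            result.append(members)
    return result


# Assumed example table, ordered by start time in seconds.
table_1 = [("S1", 0.0), ("S2", 5.0), ("S3", 100.0), ("S4", 110.0)]
co_sets = co_occurrence_sets(table_1, window=30.0)
```

  • With overlapping windows this approach can emit a set that is a subset of an earlier one; whether such sets should be counted separately is a design choice that the text leaves open.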
  • the following example introduces the maintenance method of the service co-occurrence frequency table.
  • suppose the acquired first co-occurrence service set includes S 1 and S 2. Since S 1 and S 2 each appear once, 1 is recorded at (S 1 , S 1 ) in the table and, similarly, 1 is recorded at (S 2 , S 2 ). Since S 1 and S 2 appear in the first co-occurrence service set at the same time, 1 is also recorded at (S 1 , S 2 ) and at (S 2 , S 1 ).
  • for the data stream sets of other client devices, the same time window size is used for similar analysis, that is, the above steps 302 to 304 are repeated, and the service co-occurrence frequency table shown in Table 2 and the independent appearance frequency of each service are updated. In other words, if the message forwarding device obtains the second co-occurrence service set of the second client device, it can also accumulate and update the data in the service co-occurrence frequency table according to the above method.
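  • The table maintenance described above can be sketched with a dictionary keyed by service pairs: the diagonal cell (S, S) counts how often a service appears, and the symmetric cells (A, B) and (B, A) count co-occurrences. The representation and the helper name `update_cooccurrence_table` are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def update_cooccurrence_table(table, co_set):
    """Accumulate one co-occurrence service set into the frequency table."""
    for service in co_set:
        table[(service, service)] += 1  # appearance count on the diagonal
    for a, b in combinations(sorted(co_set), 2):
        table[(a, b)] += 1              # the table is kept symmetric
        table[(b, a)] += 1
    return table


# Assumed example: one set from a first client, one from a second client.
table_2 = defaultdict(int)
update_cooccurrence_table(table_2, {"S1", "S2"})
update_cooccurrence_table(table_2, {"S1", "S3"})
```

  • Calling the function once per co-occurrence set, across all client devices, accumulates the counts in the same table.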
  • the packet forwarding device determines the similarity between the first service and the second service according to the first co-occurrence service set, to obtain a similarity matrix.
  • the message forwarding device determines the similarity between the first service and the second service according to the first co-occurrence service set to obtain a similarity matrix, and then determines the correlation between the first service and the second service according to the similarity matrix.
  • the message forwarding device calculates the similarity between services according to the service co-occurrence frequency table obtained in step 304.
  • Table 2 can be read as a service co-occurrence matrix of size Ms × Ms, where Ms is the total number of services appearing in the data stream set; the entry in the i-th row and j-th column is the number of times the i-th service and the j-th service appear at the same time, and the similarity between the two services is computed from these entries.
  • the message forwarding device can determine the similarity between services based on the cosine similarity calculation method, the intersection-over-union calculation method, or the Euclidean distance calculation method.
  • the packet forwarding device calculates the similarity between services according to the service co-occurrence frequency table obtained in step 304 and the number of independent appearances of the services.
  • a service appears independently, in fact, the number of times that the service appears simultaneously with other services is 0.
  • in the cosine similarity calculation method, the similarity between the i-th service and the j-th service is computed from the i-th row and the j-th row of the co-occurrence matrix, using the norm of each row (described in this text as the infinite norm) and the vector product of the two rows.
  • in the intersection-over-union calculation method, the similarity between the i-th service and the j-th service is computed from the number of co-occurrences of the i-th service and the j-th service, that is, the data in the i-th row and the j-th column of Table 2, and from the total number of occurrences Ni of the i-th service, that is, the data in the i-th row and the i-th column of Table 2.
  • in the Euclidean distance calculation method, the similarity between the i-th service and the j-th service is computed from the distance function dist applied to the i-th row and the j-th row of the co-occurrence matrix, whose entries are the numbers of co-occurrences between services shown in Table 2.
  • Ms is the total number of services appearing in the data stream set.
  • the message forwarding device can obtain the similarity matrix E according to the similarity between any two services in the first co-occurring service set.
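  • The formulas themselves are not reproduced in this text, so the sketch below uses the textbook forms of two of the named measures as stated assumptions: cosine similarity with the Euclidean (L2) norm of each row, and intersection over union computed as co-occurrences divided by (Ni + Nj - co-occurrences). The function names and the example counts are hypothetical.

```python
import math


def cosine_similarity(matrix, i, j):
    """Cosine of the angle between row i and row j (L2 norm, an assumption)."""
    row_i, row_j = matrix[i], matrix[j]
    dot = sum(a * b for a, b in zip(row_i, row_j))
    norm_i = math.sqrt(sum(a * a for a in row_i))
    norm_j = math.sqrt(sum(b * b for b in row_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def iou_similarity(matrix, i, j):
    """Co-occurrences of i and j over the occurrences of either service."""
    both = matrix[i][j]
    union = matrix[i][i] + matrix[j][j] - both
    return both / union if union else 0.0


def similarity_matrix(matrix, measure):
    """Apply a pairwise measure to every pair of services."""
    size = len(matrix)
    return [[measure(matrix, i, j) for j in range(size)] for i in range(size)]


# Assumed co-occurrence counts for three services; the diagonal holds Ni.
counts = [
    [2, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
]
E = similarity_matrix(counts, iou_similarity)
cos_01 = cosine_similarity(counts, 0, 1)
```

  • Either measure yields a symmetric Ms × Ms matrix that can serve as the similarity matrix E in the following steps.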
  • the message forwarding device extracts the first feature vector of the first service and the second feature vector of the second service from the similarity matrix by using graph embedding technology.
  • the similarity matrix E can be regarded as the adjacency matrix of the graph composed of services, the nodes in the graph are services, and the connection weight of the edges is the value of the adjacency matrix.
  • graph embedding technology represents each node in the graph as a dense vector based on the connection relationships between the nodes, that is, the adjacency matrix, and thereby extracts the feature vector of each service.
  • the message forwarding device extracts the first feature vector of the first service and the second feature vector of the second service from the similarity matrix through graph embedding technology.
  • step 306 is an optional step, which may or may not be performed, and is not limited here.
  • the packet forwarding device performs clustering through an unsupervised algorithm according to the first feature vector and the second feature vector, and determines that the first service and the second service are used to implement the first application.
  • the packet forwarding device performs application clustering according to the first feature vector and the second feature vector, and determines that the first service and the second service are used to implement the first application.
  • the method of performing application clustering is an unsupervised algorithm, such as a spectral clustering algorithm, a K-Means clustering algorithm, or a DBSCAN density clustering algorithm, etc., which are not specifically limited here.
  • the packet forwarding device may directly perform clustering through an unsupervised algorithm according to the similarity matrix, and determine that the first service and the second service are used to implement the first application.
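  • As an illustration of the unsupervised clustering step, the sketch below runs a plain K-Means on per-service feature vectors; here the rows of an assumed similarity matrix are used directly as the vectors, which corresponds to clustering without the optional graph embedding step. The farthest-point initialization, the fixed iteration count, and the example values are choices made for this example, not taken from the source.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, iterations=20):
    """Plain K-Means; returns one cluster index per input vector."""
    # Deterministic farthest-point initialization (an illustrative choice).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(squared_distance(v, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        for index, vector in enumerate(vectors):
            distances = [squared_distance(vector, c) for c in centroids]
            labels[index] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for cluster in range(k):
            members = [v for v, label in zip(vectors, labels) if label == cluster]
            if members:
                centroids[cluster] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Assumed similarity matrix rows for services S1..S4.
feature_vectors = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
]
cluster_labels = k_means(feature_vectors, k=2)
```

  • Services that receive the same cluster index are treated as implementing the same application. K-Means needs the number of clusters in advance; spectral clustering or DBSCAN, also named in the text, make different trade-offs.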
  • the packet forwarding device determines that the data flow corresponding to the first service and the second service is the data flow of the first application.
  • after the packet forwarding device determines that the first service and the second service are used to implement the first application, it may determine that the data flows corresponding to the first service and the second service are data flows of the first application.
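  • Once services have been assigned to applications, classifying a data flow reduces to a lookup on its destination IP address and destination port number. The sketch below assumes a dictionary from service key to application name; `classify_flows`, the dictionary keys, and the "unknown" fallback are hypothetical.

```python
def classify_flows(flows, service_to_app):
    """Group flows by the application that their accessed service implements."""
    classified = {}
    for flow in flows:
        service = (flow["dst_ip"], flow["dst_port"])
        application = service_to_app.get(service, "unknown")
        classified.setdefault(application, []).append(flow)
    return classified


# Assumed mapping: the first and second services implement the first application.
service_to_app = {
    ("203.0.113.7", 443): "first application",
    ("203.0.113.8", 8443): "first application",
    ("198.51.100.2", 80): "second application",
}
observed_flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "dst_port": 443},
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.8", "dst_port": 8443},
    {"src_ip": "10.0.0.6", "dst_ip": "198.51.100.2", "dst_port": 80},
]
flows_by_application = classify_flows(observed_flows, service_to_app)
```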
  • the packet forwarding device extracts the DNS features of the multiple data flows.
  • the packet forwarding device extracts the DNS features of the multiple data streams, and the DNS features include the combination of the destination IP address and the destination port number and the corresponding relationship between the domain name.
  • step 309 may be executed before any one of step 302 to step 308, and the details are not limited here.
  • the DNS feature of a network traffic data stream extracted by the packet forwarding device is: the DNS domain name corresponding to the first service is iLearning.huawei.com.
  • the message forwarding device extracts the DNS features of the multiple data streams, and the DNS features include the combination of the destination IP address and the destination port number and the corresponding relationship between the domain name. According to the destination address information of the data stream of the first application and the DNS feature, a label of the first application is obtained, and the label is used to identify the first application.
  • the packet forwarding device determines that the first service and the second service belong to the first application. According to the DNS feature of the data flow corresponding to the first service, the DNS domain name corresponding to the first service can be determined to be iLearning.huawei.com, and the packet forwarding device can use the relevant information in the DNS domain name, such as "iLearning", as the label of the first application. The DNS domain name corresponding to the second service can also be used to determine the label of the first application, which is not specifically limited here.
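  • The labeling step can be sketched as a lookup in the DNS feature followed by a simple rule for choosing the label. Taking the left-most part of the domain name is an assumption that matches the iLearning.huawei.com example but is not prescribed by the text; `label_for_application` and the IP addresses are hypothetical.

```python
def label_for_application(app_services, dns_feature):
    """Return a label for an application from the domain name of one of its services.

    app_services is an iterable of (dst_ip, dst_port) keys; dns_feature maps
    such keys to domain names. The first service with a known domain name is used.
    """
    for service in app_services:
        domain = dns_feature.get(service)
        if domain:
            return domain.split(".")[0]  # left-most DNS label (illustrative rule)
    return None


# Assumed DNS feature: only the first service has a recorded domain name.
dns_feature = {("203.0.113.7", 443): "iLearning.huawei.com"}
first_application = [("203.0.113.7", 443), ("203.0.113.8", 8443)]
label = label_for_application(first_application, dns_feature)
```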
  • FIG. 4 is a schematic diagram of an embodiment of the message forwarding device in the embodiment of the application.
  • the message forwarding device provided by the embodiment of the application is applied between the internal network and the Internet, and includes:
  • the obtaining unit 401 is configured to obtain multiple data streams and extract the address information and time information of each data stream in the multiple data streams; the multiple data streams are data streams generated by multiple client devices respectively accessing multiple services, a service is used to implement a sub-function of an application, and the address information includes the source IP address, source port number, destination IP address, and destination port number.
  • the selection unit 402 is configured to filter out, from the multiple data streams according to the source IP address of each data stream, a set of data streams generated by a first client device accessing multiple services; the first client device is a client device assigned to use a first IP address among the multiple client devices.
  • the determining unit 403 is configured to determine the service set accessed by the first client device according to the destination IP address and destination port number of each data flow in the data flow set; the service set includes the first service and the second service, and the combination of the destination IP address and the destination port number corresponding to the first service is different from the combination of the destination IP address and destination port number corresponding to the second service.
  • the determining unit 403 is further configured to determine the correlation between the services in the service set according to the time information of each data stream in the data stream set; to determine, according to the correlation, that the first service and the second service are used to implement the first application; and to determine that the data streams corresponding to the first service and the second service are data streams of the first application.
  • the determining unit 403 is specifically configured to: perform clustering through an unsupervised algorithm according to the correlation, and determine that the first service and the second service are used to implement the first application.
  • the determining unit 403 is specifically configured to determine a first co-occurrence service set according to the time information of each data flow in the data flow set, where the first service and the second service belong to the first co-occurrence service set, the first co-occurrence service set includes at least two services, and the interval between the time information of the data streams generated by accessing the at least two services is less than or equal to the preset duration; and to determine the correlation between the first service and the second service according to the first co-occurrence service set.
  • the determining unit 403 is further configured to: determine the similarity between the first service and the second service according to the first co-occurrence service set to obtain a similarity matrix; the determining unit 403 is specifically configured to: according to the similarity matrix It is determined that the first service and the second service are used to implement the first application.
  • the determining unit 403 is specifically configured to determine the similarity between the first service and the second service according to the cosine similarity calculation method, the intersection-over-union calculation method, or the Euclidean distance calculation method.
  • the device further includes: an extracting unit 404, configured to extract the first feature vector of the first service and the second feature vector of the second service from the similarity matrix through graph embedding technology; the determining unit 403 is specifically configured to: According to the first feature vector and the second feature vector, it is determined that the first service and the second service are used to implement the first application.
  • the extracting unit 404 is further configured to: extract the DNS features of the multiple data streams, the DNS features including the combination of the destination IP address and the destination port number and the corresponding relationship between the domain name; the determining unit 403 is also used to extract the DNS features according to the first application The destination address information of the data stream and the DNS feature determine the label of the first application, and the label is used to identify the first application.
  • the functional units in the embodiment shown in FIG. 4 may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • Each of the above-mentioned units can be implemented in the form of hardware or software functional units; some of the units can also be implemented in the form of hardware, and the remaining units can be implemented in the form of software functional units.
  • if the units are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method described in each embodiment of this application.
  • the aforementioned storage media include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.
  • FIG. 5 is a schematic diagram of another embodiment of a packet forwarding device in an embodiment of this application.
  • the message forwarding device provided in this embodiment is applied between an internal network and the Internet.
  • the message forwarding device may be a router or a gateway, etc.
  • the specific device form is not limited in this embodiment of the application.
  • the message forwarding device 500 may have relatively large differences due to different configurations or performances, and may include one or more processors 501 and a memory 505, and the memory 505 stores programs or data.
  • the memory 505 may be volatile storage or non-volatile storage.
  • the processor 501 is one or more central processing units (CPU, Central Processing Unit).
  • the CPU may be a single-core CPU or a multi-core CPU.
  • the processor 501 may communicate with the memory 505 and execute, on the message forwarding device 500, a series of instructions in the memory 505.
  • the processor 501 may also be an application specific integrated circuit (ASIC, Application Specific Integrated Circuit) or a field programmable gate array (FPGA, Field-Programmable Gate Array). It is understandable that if the processor 501 is an ASIC chip or the like that can store instructions, the memory 505 may not exist.
  • the message forwarding device 500 further includes one or more power sources 502; one or more wired or wireless network interfaces 503, such as Ethernet interfaces; and one or more input and output interfaces 504, which can be used to connect a display, a mouse, a keyboard, a touch screen device, a sensor device, or the like. The input and output interface 504 is an optional component, which may or may not exist, and is not limited here.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

Abstract

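Embodiments of this application disclose a data flow classification method applied to a packet forwarding device located between an internal network and the Internet. The method includes: the packet forwarding device obtains a plurality of data flows and extracts address information and time information of each of the data flows; based on the source IP address of each data flow, selects from the plurality of data flows a data flow set generated when a first client device accesses a plurality of services; based on the destination IP address and destination port number of each data flow in the data flow set, determines a service set accessed by the first client device, the service set including a first service and a second service; based on the time information of each data flow in the data flow set, determines the correlation between the services in the service set; and then determines that the first service and the second service are used to implement a first application. The packet forwarding device thereby determines that the data flows corresponding to the first service and the second service are data flows of the first application.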

Description

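Data flow classification method and packet forwarding device
This application claims priority to Chinese Patent Application No. 201910399861.4, filed with the China National Intellectual Property Administration on May 14, 2019 and entitled "Data flow classification method and packet forwarding device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communications technologies, and in particular to a data flow classification method and a packet forwarding device.
Background
A software-defined wide area network (SD-WAN) scenario involves many kinds of private enterprise applications. Identifying which applications' data is carried in network traffic is important for network management.
Deep packet inspection (DPI) is used to classify data flows by application. During traffic forwarding, a DPI device extracts traffic features from the byte information in a data flow and then matches the extracted features against preset identification rules to obtain a classification result.
In the prior art, the identification rules that a DPI device uses to classify traffic must be summarized by technical personnel, which is time-consuming and laborious.
Summary
Embodiments of this application provide a data flow classification method applied to a packet forwarding device located between an internal network and the Internet. The method can reduce the workload of technical personnel and avoid the problem that data flows can no longer be identified after an application is updated.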
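A first aspect of the embodiments of this application provides a data flow classification method applied to a packet forwarding device located between an internal network and the Internet. The method includes: the packet forwarding device obtains a plurality of data flows and extracts address information and time information of each of the data flows, where the plurality of data flows are generated when a plurality of client devices separately access a plurality of services, a service is used to implement a sub-function of an application, and the address information includes a source IP address, a source port number, a destination IP address, and a destination port number; the packet forwarding device selects, from the plurality of data flows based on the source IP address of each data flow, a data flow set generated when a first client device accesses a plurality of services, where the first client device is the client device, among the plurality of client devices, that is assigned a first IP address; the packet forwarding device determines, based on the destination IP address and destination port number of each data flow in the data flow set, a service set accessed by the first client device, where the service set includes a first service and a second service, and the combination of destination IP address and destination port number corresponding to the first service differs from that corresponding to the second service; the packet forwarding device determines, based on the time information of each data flow in the data flow set, the correlation between the services in the service set; the packet forwarding device determines, based on the correlation, that the first service and the second service are used to implement a first application; and the packet forwarding device determines that the data flows corresponding to the first service and the second service are data flows of the first application.
An application consists of a group of services, and each service implements a sub-function of the application. When a client device accesses an application, it establishes multiple data flows with the application provider's servers. These data flows carry the multiple services that belong to the application, so their time information is strongly correlated. When multiple client devices separately access one or more applications, multiple data flows are established. The packet forwarding device obtains a plurality of data flows generated when a plurality of client devices separately access a plurality of services. The packet forwarding device extracts the address information and time information of each of the data flows and, based on the source IP address of each data flow, can select from the plurality of data flows the first data flow set generated by the first client device. Based on the destination IP address and destination port number of each data flow in the data flow set, the packet forwarding device can determine the service set accessed by the first client device. Based on the time information of each data flow in the data flow set, the packet forwarding device determines the correlation between the services in the service set, where the correlation is the degree to which the services are related in terms of time information. Based on the correlation, the packet forwarding device can determine the first service and the second service that are used to implement the first application, and can therefore determine that the data flows corresponding to the first service and the second service are data flows of the first application, which classifies the plurality of data flows. The data flow classification method provided in the embodiments of this application classifies flows by the inherent association in their time information and does not need to identify byte information in the data flows against identification rules. It can therefore reduce the workload of technical personnel and avoid the problem that data flows can no longer be identified after an application is updated.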
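In a possible implementation of the first aspect, the time information includes the start time and/or the end time of a data flow.
The data flow classification method provided in the embodiments of this application offers several concrete forms of time information: the start time of a data flow, the end time, or both. This increases the flexibility of implementing the solution. In addition, classifying data flows based on both the start time and the end time can improve classification accuracy.
In a possible implementation of the first aspect, that the packet forwarding device determines, based on the correlation, that the first service and the second service are used to implement the first application includes: the packet forwarding device performs clustering by using an unsupervised algorithm based on the correlation, and determines that the first service and the second service are used to implement the first application.
The data flow classification method provided in the embodiments of this application uses an unsupervised machine learning algorithm to classify traffic. Labeled samples are not needed to develop or train the algorithm; they are used only to verify its effectiveness. This simplifies the classification process and reduces the workload of technical personnel.
In a possible implementation of the first aspect, the clustering method includes a spectral clustering algorithm, a K-Means clustering algorithm, or the DBSCAN density-based clustering algorithm.
The data flow classification method provided in the embodiments of this application offers several possible clustering methods, which increases the flexibility of implementing the solution.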
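In a possible implementation of the first aspect, that the packet forwarding device determines, based on the time information of each data flow in the data flow set, the correlation between the services in the service set includes: the packet forwarding device determines a first co-occurrence service set based on the time information of each data flow in the data flow set, where the first service and the second service belong to the first co-occurrence service set, the first co-occurrence service set includes at least two services, and the interval between the time information of the data flows generated by accessing the at least two services is less than or equal to a preset duration; and the packet forwarding device determines the correlation between the first service and the second service based on the first co-occurrence service set.
With the data flow classification method provided in the embodiments of this application, the services whose time information is separated by an interval less than or equal to the preset duration can be selected from the multiple services accessed by a single client device to obtain a co-occurrence service set, and the correlation between services is then determined. This makes the solution easier to implement.
In a possible implementation of the first aspect, the method further includes: the packet forwarding device determines the similarity between the first service and the second service based on the first co-occurrence service set to obtain a similarity matrix. That the packet forwarding device determines, based on the correlation, that the first service and the second service are used to implement the first application includes: the packet forwarding device determines, based on the similarity matrix, that the first service and the second service are used to implement the first application.
With the data flow classification method provided in the embodiments of this application, the similarity between services can be determined from the co-occurrence service set to obtain a similarity matrix, and the similarity matrix is then used to determine whether services implement the same application. This can improve classification accuracy.
In a possible implementation of the first aspect, that the packet forwarding device determines the similarity between the first service and the second service based on the first co-occurrence service set includes: the packet forwarding device determines the similarity between the first service and the second service by using a cosine similarity method, an intersection over union (IoU) method, or a Euclidean distance method.
The data flow classification method provided in the embodiments of this application offers several concrete methods for calculating the similarity between services, which improves the feasibility and flexibility of the solution.
In a possible implementation of the first aspect, the method further includes: the packet forwarding device extracts a first feature vector of the first service and a second feature vector of the second service from the similarity matrix by using a graph embedding technique. That the packet forwarding device determines, based on the correlation, that the first service and the second service are used to implement the first application includes: the packet forwarding device determines, based on the first feature vector and the second feature vector, that the first service and the second service are used to implement the first application.
With the data flow classification method provided in the embodiments of this application, after the similarity matrix is obtained, feature vectors of the services can be further extracted by using a graph embedding technique, and the services that implement the same application are then determined from the feature vectors. This can improve classification accuracy.
In a possible implementation of the first aspect, the method further includes: the packet forwarding device extracts DNS features of the plurality of data flows, where the DNS features include the correspondence between a combination of destination IP address and destination port number and a domain name; and determines a label of the first application based on the destination address information of the data flows of the first application and the DNS features, where the label is used to identify the first application.
With the data flow classification method provided in the embodiments of this application, the packet forwarding device can also obtain the DNS features of data flows and use them to label the application to which the classified data flows belong, which gives users an intuitive view of the application type.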
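A second aspect of the embodiments of this application provides a packet forwarding device applied between an internal network and the Internet. The device includes: an obtaining unit, configured to obtain a plurality of data flows and extract address information and time information of each of the data flows, where the plurality of data flows are generated when a plurality of client devices separately access a plurality of services, a service is used to implement a sub-function of an application, and the address information includes a source IP address, a source port number, a destination IP address, and a destination port number; a selection unit, configured to select, from the plurality of data flows based on the source IP address of each data flow, a data flow set generated when a first client device accesses a plurality of services, where the first client device is the client device, among the plurality of client devices, that is assigned a first IP address; and a determining unit, configured to determine, based on the destination IP address and destination port number of each data flow in the data flow set, a service set accessed by the first client device, where the service set includes a first service and a second service, and the combination of destination IP address and destination port number corresponding to the first service differs from that corresponding to the second service. The determining unit is further configured to determine, based on the time information of each data flow in the data flow set, the correlation between the services in the service set; to determine, based on the correlation, that the first service and the second service are used to implement a first application; and to determine that the data flows corresponding to the first service and the second service are data flows of the first application.
In a possible implementation of the second aspect, the determining unit is specifically configured to perform clustering by using an unsupervised algorithm based on the correlation, and determine that the first service and the second service are used to implement the first application.
In a possible implementation of the second aspect, the determining unit is specifically configured to: determine a first co-occurrence service set based on the time information of each data flow in the data flow set, where the first service and the second service belong to the first co-occurrence service set, the first co-occurrence service set includes at least two services, and the interval between the time information of the data flows generated by accessing the at least two services is less than or equal to a preset duration; and determine the correlation between the first service and the second service based on the first co-occurrence service set.
In a possible implementation of the second aspect, the determining unit is further configured to determine the similarity between the first service and the second service based on the first co-occurrence service set to obtain a similarity matrix, and is specifically configured to determine, based on the similarity matrix, that the first service and the second service are used to implement the first application.
In a possible implementation of the second aspect, the determining unit is specifically configured to determine the similarity between the first service and the second service by using a cosine similarity method, an intersection over union (IoU) method, or a Euclidean distance method.
In a possible implementation of the second aspect, the device further includes an extraction unit, configured to extract a first feature vector of the first service and a second feature vector of the second service from the similarity matrix by using a graph embedding technique. The determining unit is specifically configured to determine, based on the first feature vector and the second feature vector, that the first service and the second service are used to implement the first application.
In a possible implementation of the second aspect, the extraction unit is further configured to extract DNS features of the plurality of data flows, where the DNS features include the correspondence between a combination of destination IP address and destination port number and a domain name. The determining unit is further configured to determine a label of the first application based on the destination address information of the data flows of the first application and the DNS features, where the label is used to identify the first application.
A third aspect of the embodiments of this application provides a packet forwarding device applied between an internal network and the Internet, including a processor and a network interface. The network interface is configured to send and receive data, and the processor is configured to perform the method in the first aspect and its implementations.
A fourth aspect of the embodiments of this application provides a computer program product including instructions that, when run on a computer, cause the computer to perform the method in the first aspect and its implementations.
A fifth aspect of the embodiments of this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method in the first aspect of the embodiments of this application and its implementations.
A sixth aspect of the embodiments of this application provides a communications system including the packet forwarding device of the second aspect.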
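It can be seen from the foregoing technical solutions that the embodiments of this application have the following advantages:
Embodiments of this application provide a data flow classification method applied to a packet forwarding device located between an internal network and the Internet. An application usually consists of a group of services, each service implements a sub-function of the application, and a large number of data flows are generated when multiple client devices access applications, that is, access multiple services. First, the packet forwarding device extracts the source address information, destination address information, and time information of a plurality of data flows, where the plurality of client devices include a first client device. Based on the source address information, the packet forwarding device can select the data flow set generated when the first client device accesses a plurality of services. Based on the destination address information, the packet forwarding device can determine the service set accessed by the first client device, where the service set includes a first service and a second service. Based on the time information of the data flow set, the packet forwarding device determines the correlation between the services in the service set. Based on the correlation, the packet forwarding device determines that the first service and the second service are used to implement a first application, and determines that the data flows corresponding to the first service and the second service are data flows of the first application. The packet forwarding device can thus classify data flows by application based on their source address information, destination address information, and time information. Compared with having technical personnel summarize identification rules, this reduces the workload of technical personnel and completes application classification quickly and conveniently.
Brief Description of Drawings
FIG. 1 is an SD-WAN network architecture diagram according to an embodiment of this application;
FIG. 2 is a schematic diagram of an embodiment of a data flow classification method according to the embodiments of this application;
FIG. 3 is a schematic diagram of another embodiment of a data flow classification method according to the embodiments of this application;
FIG. 4 is a schematic diagram of an embodiment of a packet forwarding device according to the embodiments of this application;
FIG. 5 is a schematic diagram of another embodiment of a packet forwarding device according to the embodiments of this application.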
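Detailed Description
Distinguishing or identifying the data flows of different applications within aggregated traffic facilitates network management, for example by providing different service guarantees for the traffic of different applications.
DPI is commonly used to classify data flows by application. During traffic forwarding, a DPI device extracts traffic features from the byte information in a data flow and then matches the extracted features against preset identification rules to obtain a classification result. The identification rules that a DPI device uses must be summarized by technical personnel, which is time-consuming and laborious, and the rules struggle to keep up with application updates, after which data flows can no longer be identified. In addition, because traffic features are extracted from the byte information in a data flow, they are difficult to extract from encrypted packets.
For this reason, embodiments of this application provide a data flow classification method that classifies the data flows belonging to different applications. The method can reduce the workload of technical personnel and avoid the problem that data flows can no longer be identified after an application is updated.
The data flow classification method provided in the embodiments of this application can be applied to many kinds of internal networks, such as an enterprise network or a campus network. This application does not limit the specific application scenario. The following uses SD-WAN as an example.
Refer to FIG. 1, which is an SD-WAN network architecture diagram according to an embodiment of this application. SD-WAN is a service formed by applying software-defined networking (SDN) technology to wide area network scenarios. SDN uses virtualization technology and can simplify management and operations and maintenance. An SD-WAN scenario involves many private enterprise applications, and customers want the traffic of such applications to receive priority quality of service (QoS) assurance and traffic visibility to facilitate network management.
In an SD-WAN solution, the network devices at an enterprise branch form a branch site (for example, site1 and siteN in the figure), and the network devices at the company headquarters or data center (enterprise HQ/DC) form a headquarters site. Branch sites are interconnected with the headquarters site, and with each other, through dynamic smart virtual private network (DSVPN) tunnels. The logical link type underlying a DSVPN tunnel may be the Internet, multiprotocol label switching (MPLS), long-term evolution (LTE), or the like.
In the data flow classification method provided in the embodiments of this application, the node that performs traffic identification is a packet forwarding device located between the internal network and the Internet, such as a router.
Based on the architecture shown in FIG. 1, refer to FIG. 2, which is a schematic diagram of an embodiment of a data flow classification method according to the embodiments of this application.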
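An application consists of a group of services, and each service implements a sub-function of the application. When a client device accesses an application, it establishes multiple data flows with the multiple services that belong to the application, and the time information of these data flows is strongly correlated. When the application access behavior of multiple client devices is considered, the data flows whose time information is correlated can be determined more accurately at the statistical level.
201. The packet forwarding device obtains a plurality of data flows and extracts address information and time information of each of the data flows.
When multiple client devices separately access one or more applications, multiple data flows are established. An application includes multiple services. In this embodiment, the packet forwarding device obtains a plurality of data flows generated when a plurality of client devices separately access a plurality of services. In the embodiments of this application, the data flows forwarded by the packet forwarding device within a period of time may be obtained. The period may be 12 hours, 24 hours, or the like; the specific length is not limited here.
The packet forwarding device extracts the address information and time information of each of the data flows.
The address information includes a source Internet Protocol (IP) address, a source port number, a destination IP address, and a destination port number. The combination of destination IP address and destination port number can be used to identify the service accessed by the data flow. Optionally, the address information is the flow 5-tuple, that is, the source IP address, source port number, destination IP address, destination port number, and transport protocol.
The time information is the start time and/or the end time of a data flow.
Optionally, for a data flow sent over the Transmission Control Protocol (TCP), the start time may be the time at which the SYN packet is sent, and the end time may be the time at which the FIN packet is sent. For a data flow sent over the User Datagram Protocol (UDP), the start time is the time at which the UDP packet is received. After receiving the UDP packet, the packet forwarding device creates a forwarding table entry. The aging time of the entry is typically 120 seconds; that is, if no data flow hits the entry within 120 seconds, the entry ages out and is deleted. The end time of the data flow is the time of the last packet hit during the lifetime of the entry, so the end time may be the time at which the entry is aged out and deleted minus 120 seconds. The 120-second value is only an example. The aging time of the forwarding table of the packet forwarding device may be configured by an administrator for the specific network scenario or updated through self-learning.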
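The timing rules above can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the `FlowTimes` type and both function names are hypothetical, timestamps are assumed to be in seconds, and the 120-second default is the example aging time from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowTimes:
    start: float            # flow start time, in seconds
    end: Optional[float]    # flow end time, or None if not yet known


def tcp_flow_times(syn_time: float, fin_time: Optional[float] = None) -> FlowTimes:
    """TCP: start at the SYN packet, end at the FIN packet (if one was seen)."""
    return FlowTimes(start=syn_time, end=fin_time)


def udp_flow_times(first_packet_time: float, entry_deleted_time: float,
                   aging_seconds: float = 120.0) -> FlowTimes:
    """UDP: start at the first packet; the last hit happened one aging
    interval before the forwarding entry was aged out and deleted."""
    return FlowTimes(start=first_packet_time,
                     end=entry_deleted_time - aging_seconds)
```

If the administrator configures a different aging time, the same value would be passed as `aging_seconds`.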
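202. The packet forwarding device selects, from the plurality of data flows based on the source IP address of each data flow, a data flow set generated when a first client device accesses a plurality of services.
After extracting the address information of each of the data flows, the packet forwarding device can use the source IP address in the address information to select, from the plurality of data flows, the data flow set generated when the first client device accesses a plurality of services. The first client device is the client device, among the plurality of client devices, that is assigned a first IP address. The packet forwarding device selects the data flows whose source IP address is the first IP address to obtain the data flow set generated when the first client device accesses a plurality of services.
Optionally, the packet forwarding device may partition the data flows obtained in step 201 by source IP address to determine the data flow set corresponding to each of the plurality of client devices. This is not limited here.
203. The packet forwarding device determines, based on the destination IP address and destination port number of each data flow in the data flow set, a service set accessed by the first client device.
The combination of destination IP address and destination port number can be used to identify the service accessed by a data flow. The packet forwarding device determines the service accessed by each data flow from the destination IP address and destination port number in the flow's address information. The packet forwarding device determines the service set accessed by the first client device. The service set includes a first service and a second service, and the combination of destination IP address and destination port number corresponding to the first service differs from that corresponding to the second service.
Optionally, the packet forwarding device may determine the service set corresponding to the data flow set of each of the plurality of client devices. This is not limited here.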
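Steps 202 and 203 amount to grouping flows by source IP address and then collecting the distinct (destination IP address, destination port number) pairs per client. A minimal sketch, assuming flow records are already available in memory; the `Flow` record and the function names are hypothetical and the addresses are documentation examples:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    start: float  # flow start time, in seconds


Service = Tuple[str, int]  # (destination IP address, destination port number)


def flows_by_client(flows: List[Flow]) -> Dict[str, List[Flow]]:
    """Step 202: one data flow set per client, keyed by source IP address."""
    by_client = defaultdict(list)
    for flow in flows:
        by_client[flow.src_ip].append(flow)
    return dict(by_client)


def service_set(flows: List[Flow]) -> Set[Service]:
    """Step 203: the services accessed by the flows of one client."""
    return {(flow.dst_ip, flow.dst_port) for flow in flows}
```

Two flows from the same client to the same destination IP address but different ports count as two services under this definition.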
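204. The packet forwarding device determines, based on the time information of each data flow in the data flow set, the correlation between the services in the service set.
Through steps 202 and 203, the packet forwarding device has obtained the data flow set established by the first client device and the service set accessed by the first client device. From the time information of the data flows, the packet forwarding device can obtain the time information of the first client device's access to each service in the service set.
The correlation between the services in the service set is the degree to which the services are related in terms of time information.
Optionally, because the first client device establishes data flows with a group of services when it accesses an application, the start times of those data flows are correlated. For example, the interval between the start times of the data flows corresponding to the service set is less than a preset first duration threshold. The first duration threshold is an empirical value determined by network conditions in practice, for example 30 s or 25 s; the specific value is not limited here. Similarly, a second duration threshold appears in the following embodiments and is also an empirical value determined by network conditions. "First" and "second" are used only to distinguish the two; in practice their values may be the same or different. Based on the start time of each data flow in the data flow set, if the packet forwarding device determines that the interval between the start times of the first service and the second service in the service set is less than the first duration threshold, the first service is correlated with the second service; otherwise, they are not correlated.
Optionally, when the first client device stops accessing the application, the end times of the data flows established with this group of services should also be correlated. For example, the interval between the end times of the data flows corresponding to the service set should also be less than a preset second duration threshold. Based on the end time of each data flow in the data flow set, if the packet forwarding device determines that the interval between the end times of the first service and the second service in the service set is less than the second duration threshold, the first service is correlated with the second service; otherwise, they are not correlated.
Optionally, the time information may be both the start time and the end time of a data flow. It can be understood that if the interval between the start times of the data flows corresponding to two services in the service set is less than the preset first duration threshold and the interval between their end times is less than the preset second duration threshold, the first service can be judged to be correlated with the second service more accurately.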
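The threshold tests in step 204 can be sketched as a single predicate. This is an illustration under stated assumptions: each service is represented by a hypothetical `(start, end)` pair in seconds, the function name is made up, and the 30-second default is one of the example values from the text. Passing `end_threshold=None` checks start times only.

```python
from typing import Optional, Tuple

Times = Tuple[float, float]  # (start time, end time) of the flow to a service


def services_correlated(a: Times, b: Times,
                        start_threshold: float = 30.0,
                        end_threshold: Optional[float] = None) -> bool:
    """Return True if the two services are correlated in time."""
    # Start times must be closer together than the first duration threshold.
    if abs(a[0] - b[0]) >= start_threshold:
        return False
    # Optionally, end times must also be closer than the second threshold.
    if end_threshold is not None and abs(a[1] - b[1]) >= end_threshold:
        return False
    return True
```

Requiring both conditions corresponds to the stricter variant described in the last paragraph of the step.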
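205. The packet forwarding device determines, based on the correlation, that the first service and the second service are used to implement a first application.
Based on the correlation results for the services in the service set obtained in step 204, the packet forwarding device can determine that the correlated first service and second service in the service set are services that implement the first application. Optionally, a third service that is not correlated with the first service or the second service is a service that implements a second application. Optionally, if the first service, the second service, and a fourth service are correlated with one another, the first service, the second service, and the fourth service can be determined to be services that implement the first application.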
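One simple way to turn pairwise correlation results into per-application groups is to take connected components with a union-find structure. The sketch below is an assumption for illustration only: the text does not say that correlation is treated as transitive, and `group_services` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set, Tuple


def group_services(services: Iterable[str],
                   correlated_pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group services so that correlated services land in the same set."""
    parent: Dict[str, str] = {s: s for s in services}

    def find(s: str) -> str:
        # Follow parent links to the root, halving the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in correlated_pairs:
        parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for s in parent:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())
```

With services S1 to S4 and correlated pairs (S1, S2) and (S2, S4), this yields one group containing S1, S2, and S4 and a separate group containing S3, matching the first-application and second-application example above.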
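206. The packet forwarding device determines that the data flows corresponding to the first service and the second service are data flows of the first application.
After determining that the first service and the second service are used to implement the first application, the packet forwarding device can determine that the data flows corresponding to the first service and the second service are data flows of the first application.
In this way, the packet forwarding device obtains a plurality of data flows, extracts the address information and time information of each of the data flows, determines the data flow set generated when the first client device accesses the service set, uses the correlation between the services in the service set to determine the first service and the second service that belong to the first application, and then determines that the data flows accessing the first service and the second service are data flows of the first application, which classifies the data flows. The data flow classification method provided in the embodiments of this application classifies flows by the inherent association in their time information and does not need to identify byte information in the data flows against identification rules. It can therefore reduce the workload of technical personnel and avoid the problem that data flows can no longer be identified after an application is updated.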
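Based on the architecture shown in FIG. 1, refer to FIG. 3, which is a schematic diagram of another embodiment of a data flow classification method according to the embodiments of this application.
301. The packet forwarding device obtains a plurality of data flows and extracts address information and time information of each of the data flows.
When multiple client devices separately access one or more applications, multiple data flows are established. An application includes multiple services. In this embodiment, the packet forwarding device obtains a plurality of data flows generated when a plurality of client devices separately access a plurality of services. In the embodiments of this application, the data flows forwarded by the packet forwarding device within a period of time may be obtained. The period may be 12 hours, 24 hours, or the like; the specific length is not limited here.
The packet forwarding device extracts the address information and time information of each of the data flows.
The address information includes a source IP address, a source port number, a destination IP address, and a destination port number. Optionally, the address information is the flow 5-tuple, that is, the source IP address, source port number, destination IP address, destination port number, and transport protocol.
The time information is the start time and/or the end time of a data flow.
Optionally, for a data flow sent over TCP, the start time may be the time at which the SYN packet is sent, and the end time may be the time at which the FIN packet is sent. For a data flow sent over UDP, the start time is the time at which the UDP packet is received. After receiving the UDP packet, the packet forwarding device creates a forwarding table entry. The aging time of the entry is typically 120 seconds; that is, if no data flow hits the entry within 120 seconds, the entry ages out and is deleted. The end time of the data flow is the time of the last packet hit during the lifetime of the entry, so the end time may be the time at which the entry is aged out and deleted minus 120 seconds. The 120-second value is only an example. The aging time of the forwarding table of the packet forwarding device may be configured by an administrator for the specific network scenario or updated through self-learning.
302. The packet forwarding device selects, from the plurality of data flows based on the source IP address of each data flow, a first data flow set generated when a first client device accesses a plurality of services.
After extracting the address information of each of the data flows, the packet forwarding device can use the source IP address in the address information to select, from the plurality of data flows, the first data flow set generated when the first client device accesses a plurality of services. The first client device is the client device, among the plurality of client devices, that is assigned a first IP address. The packet forwarding device selects the data flows whose source IP address is the first IP address to obtain the first data flow set generated when the first client device accesses a plurality of services.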
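For example, the following table is the data flow table of the first client device:
Table 1
Figure PCTCN2020087363-appb-000001
Figure PCTCN2020087363-appb-000002
Each row of the table represents the address information and time information of one data flow. The first column is the address information of the client device, that is, the source IP address (srcIP) and source port number (srcPORT). srcIP1 represents the first IP address and corresponds to the first client device. Different data flows may have different source port numbers, which the table distinguishes as srcPORT1 to srcPORTn. The second column is the destination IP address and destination port number of the data flow. The combination of destination IP address (dstIP) and destination port number (dstPORT) can be used to identify a service. Different data flows may access different services, which the table distinguishes as S1 to Sn. The third column is the start time of the data flow. Different data flows have different start times, which the table distinguishes as T1 to Tn. Optionally, the data flow table of the first client device may be sorted by the start time of the data flows.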
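Optionally, the packet forwarding device may partition the data flows obtained in step 301 by source IP address to determine the data flow set generated when each of the plurality of client devices accesses services. This is not limited here. For example, the packet forwarding device determines a second data flow set generated when a second client device accesses a plurality of services.
303. The packet forwarding device determines, based on the destination IP address and destination port number of each data flow in the data flow set, a service set accessed by the first client device.
The combination of destination IP address and destination port number can be used to identify the service accessed by a data flow. The packet forwarding device determines the service accessed by each data flow from the destination IP address and destination port number in the flow's address information. The packet forwarding device determines a first service set accessed by the first client device. The first service set includes a first service and a second service, and the combination of destination IP address and destination port number corresponding to the first service differs from that corresponding to the second service.
For example, the first service set accessed by the first client device is (S1, S2, ..., Sn).
Optionally, the packet forwarding device may determine the service set corresponding to the data flow set of each of the plurality of client devices. This is not limited here. For example, the packet forwarding device determines a second service set accessed by the second client device.
304. The packet forwarding device determines a first co-occurrence service set based on the time information of each data flow in the data flow set.
The packet forwarding device can obtain the time information of the first client device's access to each service in the service set, and determine the first co-occurrence service set from the time information of the services. The first co-occurrence service set includes the first service and the second service. The first co-occurrence service set includes at least two services, and the interval between the time information of the data flows generated by accessing the at least two services is less than or equal to a preset duration.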
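Optionally, the packet forwarding device may use a sliding time window to obtain the data flows whose time information is separated by an interval less than or equal to the preset duration. For example, referring to Table 1, suppose the first data flow set is sorted by the start time of the data flows. Starting from T1, take a time window of length w, where w is an empirical value determined by network conditions in practice, for example 30 s or 25 s; the specific value is not limited here. Determine whether any data flow falls within the time window, that is, whether any data flow appears in the period from T1 to T1+w. If so, slide the window back by w, that is, determine whether any data flow appears in the period from T1+w to T1+2w, and if so, repeat the step. If no data flow appears, record the end of the time window as T1+λ1·w, where λ1 is an integer and T1+λ1·w is less than or equal to Tn. In this case, the service identified by the data flow corresponding to T1 in fact occurs independently. Take the set of services corresponding to the data flows whose start times fall within the period from T1 to T1+λ1·w and denote it as the first co-occurrence service set S. Suppose S contains the first service S1 and the second service S2; it can be understood that the interval between the start times of the data flows generated by S1 and S2 is less than or equal to w. Optionally, to describe the correlation between multiple services more clearly, the correlation may be maintained in a data structure such as a two-dimensional table, called the service co-occurrence count table; see Table 2.
Table 2
Figure PCTCN2020087363-appb-000003
Then, starting from the end of the previous time window (for example, T1+λ1·w), find the flow start time closest to that end time and use it as the start of a new time window. Repeat the sliding-window analysis described above, updating the service co-occurrence count table and the number of independent occurrences of each service, until the data flow table of the first client shown in Table 1 has been fully analyzed.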
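The sliding-window procedure can be sketched as follows. This is one reading of the description, offered as an assumption: a window opens at the first unprocessed flow, is extended by w for as long as the most recent w-long segment contains at least one further flow, and closes at the first empty segment. The function name and the `(start_time, service)` input format are hypothetical.

```python
from typing import List, Set, Tuple


def cooccurrence_sets(flows: List[Tuple[float, str]], w: float) -> List[Set[str]]:
    """Split one client's flows into co-occurrence service sets.

    flows: (start time in seconds, service identifier) pairs.
    w:     window length in seconds.
    """
    flows = sorted(flows, key=lambda f: f[0])
    groups: List[Set[str]] = []
    i, n = 0, len(flows)
    while i < n:
        t0 = flows[i][0]           # window opens at this flow's start time
        group = {flows[i][1]}
        j = i + 1
        k = 1                      # number of w-long segments examined so far
        while True:
            segment_end = t0 + k * w
            found = False
            while j < n and flows[j][0] < segment_end:
                group.add(flows[j][1])
                j += 1
                found = True
            if not found:          # empty segment: the window closes here
                break
            k += 1
        groups.append(group)
        i = j                      # next window opens at the next unprocessed flow
    return groups
```

A group containing a single service corresponds to an independent occurrence of that service.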
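The following example describes how the service co-occurrence count table is maintained. The obtained first co-occurrence service set S contains S1 and S2. Because S1 and S2 each occur once, 1 is recorded at cell (S1, S1) of the table and, similarly, 1 is recorded at (S2, S2). Because S1 and S2 both appear in the first co-occurrence service set, 1 is also recorded at (S1, S2) and at (S2, S1).
Optionally, the same analysis with the same time window size is applied to clients other than the first client, such as a second client; that is, steps 302 to 304 are repeated, and the service co-occurrence count table shown in Table 2 and the number of independent occurrences of each service are updated. In other words, if the packet forwarding device obtains a second co-occurrence service set of the second client device, it can likewise accumulate the counts into the service co-occurrence count table and update the data.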
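The table maintenance described above can be sketched with two counters. The names `table`, `solo`, and `update_cooccurrence_table` are hypothetical, and a dictionary keyed by service pairs stands in for the two-dimensional table.

```python
from collections import defaultdict
from itertools import combinations
from typing import DefaultDict, Set, Tuple


def update_cooccurrence_table(table: DefaultDict[Tuple[str, str], int],
                              solo: DefaultDict[str, int],
                              group: Set[str]) -> None:
    """Accumulate one co-occurrence service set into the count table."""
    for s in group:
        table[(s, s)] += 1             # diagonal: total occurrences of s
    if len(group) == 1:
        (only,) = group
        solo[only] += 1                # the service occurred independently
    for a, b in combinations(sorted(group), 2):
        table[(a, b)] += 1             # keep the table symmetric
        table[(b, a)] += 1
```

Calling the function once per co-occurrence set, for every client, accumulates all clients into the same table, as the last paragraph describes.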
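305. The packet forwarding device determines the similarity between the first service and the second service based on the first co-occurrence service set to obtain a similarity matrix.
The packet forwarding device determines the similarity between the first service and the second service based on the first co-occurrence service set to obtain a similarity matrix, and determines the correlation between the first service and the second service based on the similarity matrix.
Optionally, the packet forwarding device calculates the similarity between services from the service co-occurrence count table obtained in step 304. Table 2 is represented as a matrix Γ of size Ms×Ms, where Ms is the total number of services that appear in the data flow set. The entry Γ_ij in row i, column j is the number of times the i-th service and the j-th service occur together, and the similarity between the two is denoted ε_ij. The similarity can be calculated in several ways. Optionally, the packet forwarding device may determine the similarity between services by using a cosine similarity method, an intersection over union (IoU) method, or a Euclidean distance method.
Optionally, the packet forwarding device calculates the similarity between services from the service co-occurrence count table obtained in step 304 and the number of independent occurrences of each service. A service occurs independently when the number of times it occurs together with other services is 0.
The methods are described one by one below.
The similarity between the i-th service and the j-th service is calculated from the cosine similarity as shown in formula (1):
Figure PCTCN2020087363-appb-000004
where ε_ij is the similarity between the i-th service and the j-th service, Γ_i is the i-th row of matrix Γ, Γ_j is the j-th row of matrix Γ, |·| denotes the infinity norm, and · denotes the vector inner product.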
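Formula (1) is an image that is not reproduced here, so the sketch below uses the conventional cosine similarity, the inner product of two rows divided by the product of their Euclidean norms. That choice of norm is an assumption and differs from the infinity norm mentioned in the text; `cosine_similarity` is a hypothetical name.

```python
import math
from typing import Sequence


def cosine_similarity(row_i: Sequence[float], row_j: Sequence[float]) -> float:
    """Conventional cosine similarity between two rows of the count matrix."""
    dot = sum(a * b for a, b in zip(row_i, row_j))
    norm_i = math.sqrt(sum(a * a for a in row_i))
    norm_j = math.sqrt(sum(b * b for b in row_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # a service with an all-zero row is similar to nothing
    return dot / (norm_i * norm_j)
```

Rows with the same co-occurrence pattern score close to 1, and rows with no co-occurring services in common score 0.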
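The calculation based on intersection over union is shown in formula (2):
Figure PCTCN2020087363-appb-000005
where ε_ij is the similarity between the i-th service and the j-th service; Γ_ij is the number of times the i-th service and the j-th service occur together, that is, the entry in row i, column j of Table 2; and N_i is the total number of occurrences of the i-th service, that is, the entry in row i, column i of Table 2.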
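Formula (2) is likewise an image. A minimal sketch, assuming the standard intersection-over-union form Γ_ij / (N_i + N_j − Γ_ij) with N_i read from the diagonal as the text describes; the function name is hypothetical.

```python
from typing import Sequence


def iou_similarity(gamma: Sequence[Sequence[int]], i: int, j: int) -> float:
    """Intersection over union of the occurrences of services i and j."""
    together = gamma[i][j]                        # co-occurrence count
    union = gamma[i][i] + gamma[j][j] - together  # occurrences of either one
    if union == 0:
        return 0.0
    return together / union
```

For a count matrix [[3, 2], [2, 4]], the two services occur together twice out of five windows in which either appears, giving 0.4.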
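The calculation based on the Euclidean distance is shown in formula (3):
Figure PCTCN2020087363-appb-000006
where ε_ij is the similarity between the i-th service and the j-th service; dist(Γ_i, Γ_j) denotes the Euclidean distance between rows Γ_i and Γ_j; Γ_ij is the number of times the i-th service and the j-th service occur together, that is, the entry in row i, column j of Table 2; and Ms is the total number of services that appear in the data flow set.
From the similarity between any two services in the first co-occurrence service set, the packet forwarding device can obtain the similarity matrix E.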
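306. The packet forwarding device extracts a first feature vector of the first service and a second feature vector of the second service from the similarity matrix by using a graph embedding technique.
The similarity matrix E can be regarded as the adjacency matrix of a graph formed by the services. The nodes of the graph are the services, and the edge weights are the values of the adjacency matrix. A graph embedding technique represents each node of the graph as a dense vector based on the connections between nodes, that is, the adjacency matrix, and thereby extracts a feature vector for each service. The packet forwarding device extracts the first feature vector of the first service and the second feature vector of the second service from the similarity matrix by using a graph embedding technique.
It should be noted that step 306 is optional and may or may not be performed. This is not limited here.
307. The packet forwarding device performs clustering by using an unsupervised algorithm based on the first feature vector and the second feature vector, and determines that the first service and the second service are used to implement a first application.
The packet forwarding device clusters services into applications based on the first feature vector and the second feature vector, and determines that the first service and the second service are used to implement the first application. Optionally, the clustering method is an unsupervised algorithm, for example a spectral clustering algorithm, a K-Means clustering algorithm, or the DBSCAN density-based clustering algorithm. This is not limited here.
It should be noted that if step 306 is not performed, the packet forwarding device can perform clustering by using an unsupervised algorithm directly on the similarity matrix and determine that the first service and the second service are used to implement the first application.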
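As an illustration of clustering directly on the similarity matrix, the sketch below runs a small DBSCAN with the distance between two services taken as 1 minus their similarity. That distance mapping, the function name, and the parameter values in the test are assumptions; a production system would more likely use a library implementation.

```python
from typing import List, Optional, Sequence


def dbscan_from_similarity(sim: Sequence[Sequence[float]],
                           eps: float, min_pts: int) -> List[int]:
    """Cluster services with DBSCAN; returns one label per service (-1 = noise)."""
    n = len(sim)

    def dist(i: int, j: int) -> float:
        return 0.0 if i == j else 1.0 - sim[i][j]

    # Each point's eps-neighbourhood, including the point itself.
    neighbors = [[j for j in range(n) if dist(i, j) <= eps] for i in range(n)]
    labels: List[Optional[int]] = [None] * n   # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1                     # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster            # noise becomes a border point
            if labels[j] is not None:
                continue                       # already assigned
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])     # core point: keep expanding
    return [label if label is not None else -1 for label in labels]
```

Services that share a label would be treated as implementing the same application. Unlike K-Means, this approach does not need the number of applications in advance.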
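308. The packet forwarding device determines that the data flows corresponding to the first service and the second service are data flows of the first application.
After determining that the first service and the second service are used to implement the first application, the packet forwarding device can determine that the data flows corresponding to the first service and the second service are data flows of the first application.
309. The packet forwarding device extracts DNS features of the plurality of data flows.
The packet forwarding device extracts the DNS features of the plurality of data flows, where the DNS features include the correspondence between a combination of destination IP address and destination port number and a domain name.
It should be noted that step 309 may be performed before any one of steps 302 to 308. This is not limited here.
For example, the DNS feature that the packet forwarding device extracts for one data flow is: the DNS domain name corresponding to the first service is iLearning.huawei.com.
310. A label of the first application is obtained based on the destination address information of the data flows of the first application and the DNS features.
The packet forwarding device extracts the DNS features of the plurality of data flows, where the DNS features include the correspondence between a combination of destination IP address and destination port number and a domain name. A label of the first application is obtained based on the destination address information of the data flows of the first application and the DNS features, where the label is used to identify the first application.
For example, the packet forwarding device determines that the first service and the second service belong to the first application. From the DNS feature of the data flow corresponding to the first service, it can determine that the DNS domain name corresponding to the first service is iLearning.huawei.com. The packet forwarding device can use relevant information in the DNS domain name as the label of the first application, for example "iLearning". It can be understood that the DNS domain name corresponding to the second service can also be used to determine the label of the first application. This is not specifically limited here.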
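The labeling step can be sketched as a lookup followed by a vote. This is an illustration under stated assumptions: the text only says that relevant information in the domain name is used, so taking the leftmost label of the domain name and picking the most common one across the application's services are both choices made here; `application_label` and the example addresses are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Service = Tuple[str, int]  # (destination IP address, destination port number)


def application_label(app_services: Iterable[Service],
                      dns_map: Dict[Service, str]) -> Optional[str]:
    """Pick a label for an application from the domain names of its services."""
    names = [dns_map[s] for s in app_services if s in dns_map]
    if not names:
        return None  # no DNS feature was observed for any service
    # Assumption: the leftmost label of the domain name describes the application.
    candidates = Counter(name.split(".")[0] for name in names)
    return candidates.most_common(1)[0][0]
```

Services without a DNS feature are ignored, so one resolved service is enough to label the whole application.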
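The data flow classification method has been described above. The following describes the packet forwarding device that implements the method. Refer to FIG. 4, which is a schematic diagram of an embodiment of a packet forwarding device according to the embodiments of this application.
The packet forwarding device provided in the embodiments of this application is applied between an internal network and the Internet and includes:
an obtaining unit 401, configured to obtain a plurality of data flows and extract address information and time information of each of the data flows, where the plurality of data flows are generated when a plurality of client devices separately access a plurality of services, a service is used to implement a sub-function of an application, and the address information includes a source IP address, a source port number, a destination IP address, and a destination port number; a selection unit 402, configured to select, from the plurality of data flows based on the source IP address of each data flow, a data flow set generated when a first client device accesses a plurality of services, where the first client device is the client device, among the plurality of client devices, that is assigned a first IP address; and a determining unit 403, configured to determine, based on the destination IP address and destination port number of each data flow in the data flow set, a service set accessed by the first client device, where the service set includes a first service and a second service, and the combination of destination IP address and destination port number corresponding to the first service differs from that corresponding to the second service. The determining unit 403 is further configured to determine, based on the time information of each data flow in the data flow set, the correlation between the services in the service set; to determine, based on the correlation, that the first service and the second service are used to implement a first application; and to determine that the data flows corresponding to the first service and the second service are data flows of the first application.
The determining unit 403 is specifically configured to perform clustering by using an unsupervised algorithm based on the correlation, and determine that the first service and the second service are used to implement the first application.
The determining unit 403 is specifically configured to: determine a first co-occurrence service set based on the time information of each data flow in the data flow set, where the first service and the second service belong to the first co-occurrence service set, the first co-occurrence service set includes at least two services, and the interval between the time information of the data flows generated by accessing the at least two services is less than or equal to a preset duration; and determine the correlation between the first service and the second service based on the first co-occurrence service set.
The determining unit 403 is further configured to determine the similarity between the first service and the second service based on the first co-occurrence service set to obtain a similarity matrix, and is specifically configured to determine, based on the similarity matrix, that the first service and the second service are used to implement the first application.
The determining unit 403 is specifically configured to determine the similarity between the first service and the second service by using a cosine similarity method, an intersection over union (IoU) method, or a Euclidean distance method.
The device further includes an extraction unit 404, configured to extract a first feature vector of the first service and a second feature vector of the second service from the similarity matrix by using a graph embedding technique. The determining unit 403 is specifically configured to determine, based on the first feature vector and the second feature vector, that the first service and the second service are used to implement the first application.
The extraction unit 404 is further configured to extract DNS features of the plurality of data flows, where the DNS features include the correspondence between a combination of destination IP address and destination port number and a domain name. The determining unit 403 is further configured to determine a label of the first application based on the destination address information of the data flows of the first application and the DNS features, where the label is used to identify the first application.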
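In addition, the functional units in the embodiment shown in FIG. 4 may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The units may be implemented in hardware or as software functional units, or some units may be implemented in hardware and the remaining units as software functional units.
If the units are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
For more details of how the packet forwarding device shown in FIG. 4 classifies data flows, refer to the descriptions in the method embodiments related to FIG. 2 and FIG. 3. Details are not repeated here.
Refer to FIG. 5, which is a schematic diagram of another embodiment of a packet forwarding device according to the embodiments of this application. The packet forwarding device provided in this embodiment is applied between an internal network and the Internet. The packet forwarding device may be a router, a gateway, or the like. The embodiments of this application do not limit its specific device form.
The packet forwarding device 500 may vary considerably with configuration or performance. It may include one or more processors 501 and a memory 505, and the memory 505 stores programs or data.
The memory 505 may be volatile or non-volatile storage. Optionally, the processor 501 is one or more central processing units (CPUs), each of which may be a single-core or multi-core CPU. The processor 501 may communicate with the memory 505 and execute, on the packet forwarding device 500, a series of instructions stored in the memory 505.
Alternatively, the processor 501 may be an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). It can be understood that if the processor 501 is an ASIC chip or the like that can store instructions, the memory 505 may be absent.
Optionally, the packet forwarding device 500 further includes one or more power supplies 502, one or more wired or wireless network interfaces 503 such as Ethernet interfaces, and one or more input/output interfaces 504. The input/output interface 504 may be used to connect a display, a mouse, a keyboard, a touchscreen device, a sensor device, or the like. The input/output interface 504 is an optional component and may or may not be present. This is not limited here.
For the procedure performed by the processor 501 of the packet forwarding device 500 in this embodiment, refer to the method procedures described in the foregoing method embodiments. Details are not repeated here.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments. Details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In conclusion, the foregoing embodiments are intended only to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.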

Claims (19)

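  1. A data flow classification method, applied to a packet forwarding device located between an internal network and the Internet, the method comprising:
    obtaining, by the packet forwarding device, a plurality of data flows, and extracting address information and time information of each of the plurality of data flows, wherein the plurality of data flows are generated when a plurality of client devices separately access a plurality of services, a service is used to implement a sub-function of an application, and the address information comprises a source IP address, a source port number, a destination IP address, and a destination port number;
    selecting, by the packet forwarding device from the plurality of data flows based on the source IP address of each data flow, a data flow set generated when a first client device accesses a plurality of services, wherein the first client device is the client device, among the plurality of client devices, that is assigned a first IP address;
    determining, by the packet forwarding device based on the destination IP address and destination port number of each data flow in the data flow set, a service set accessed by the first client device, wherein the service set comprises a first service and a second service, and the combination of destination IP address and destination port number corresponding to the first service differs from the combination of destination IP address and destination port number corresponding to the second service;
    determining, by the packet forwarding device based on the time information of each data flow in the data flow set, the correlation between the services in the service set;
    determining, by the packet forwarding device based on the correlation, that the first service and the second service are used to implement a first application; and
    determining, by the packet forwarding device, that the data flows corresponding to the first service and the second service are data flows of the first application.
  2. The method according to claim 1, wherein the time information comprises:
    a start time and/or an end time of a data flow.
  3. The method according to claim 1 or 2, wherein the determining, by the packet forwarding device based on the correlation, that the first service and the second service are used to implement a first application comprises:
    performing, by the packet forwarding device, clustering by using an unsupervised algorithm based on the correlation, and determining that the first service and the second service are used to implement the first application.
  4. The method according to claim 3, wherein the clustering method comprises a spectral clustering algorithm, a K-Means clustering algorithm, or a DBSCAN density-based clustering algorithm.
  5. The method according to any one of claims 1 to 4, wherein the determining, by the packet forwarding device based on the time information of each data flow in the data flow set, the correlation between the services in the service set comprises:
    determining, by the packet forwarding device, a first co-occurrence service set based on the time information of each data flow in the data flow set, wherein the first service and the second service belong to the first co-occurrence service set, the first co-occurrence service set comprises at least two services, and the interval between the time information of the data flows generated by accessing the at least two services is less than or equal to a preset duration; and
    determining, by the packet forwarding device, the correlation between the first service and the second service based on the first co-occurrence service set.
  6. The method according to claim 5, wherein the method further comprises:
    determining, by the packet forwarding device, the similarity between the first service and the second service based on the first co-occurrence service set to obtain a similarity matrix; and
    the determining, by the packet forwarding device based on the correlation, that the first service and the second service are used to implement a first application comprises:
    determining, by the packet forwarding device based on the similarity matrix, that the first service and the second service are used to implement the first application.
  7. The method according to claim 6, wherein the determining, by the packet forwarding device, the similarity between the first service and the second service based on the first co-occurrence service set comprises:
    determining, by the packet forwarding device, the similarity between the first service and the second service by using a cosine similarity method, an intersection over union method, or a Euclidean distance method.
  8. The method according to claim 6 or 7, wherein the method further comprises:
    extracting, by the packet forwarding device, a first feature vector of the first service and a second feature vector of the second service from the similarity matrix by using a graph embedding technique; and
    the determining, by the packet forwarding device based on the correlation, that the first service and the second service are used to implement a first application comprises:
    determining, by the packet forwarding device based on the first feature vector and the second feature vector, that the first service and the second service are used to implement the first application.
  9. The method according to any one of claims 1 to 8, wherein the method further comprises:
    extracting, by the packet forwarding device, DNS features of the plurality of data flows, wherein the DNS features comprise the correspondence between a combination of destination IP address and destination port number and a domain name; and
    determining a label of the first application based on the destination address information of the data flows of the first application and the DNS features, wherein the label is used to identify the first application.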
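  10. A packet forwarding device, applied between an internal network and the Internet, comprising:
    an obtaining unit, configured to obtain a plurality of data flows and extract address information and time information of each of the plurality of data flows, wherein the plurality of data flows are generated when a plurality of client devices separately access a plurality of services, a service is used to implement a sub-function of an application, and the address information comprises a source IP address, a source port number, a destination IP address, and a destination port number;
    a selection unit, configured to select, from the plurality of data flows based on the source IP address of each data flow, a data flow set generated when a first client device accesses a plurality of services, wherein the first client device is the client device, among the plurality of client devices, that is assigned a first IP address; and
    a determining unit, configured to determine, based on the destination IP address and destination port number of each data flow in the data flow set, a service set accessed by the first client device, wherein the service set comprises a first service and a second service, and the combination of destination IP address and destination port number corresponding to the first service differs from the combination of destination IP address and destination port number corresponding to the second service; wherein
    the determining unit is further configured to determine, based on the time information of each data flow in the data flow set, the correlation between the services in the service set;
    the determining unit is further configured to determine, based on the correlation, that the first service and the second service are used to implement a first application; and
    the determining unit is further configured to determine that the data flows corresponding to the first service and the second service are data flows of the first application.
  11. The device according to claim 10, wherein the determining unit is specifically configured to:
    perform clustering by using an unsupervised algorithm based on the correlation, and determine that the first service and the second service are used to implement the first application.
  12. The device according to claim 10 or 11, wherein the determining unit is specifically configured to:
    determine a first co-occurrence service set based on the time information of each data flow in the data flow set, wherein the first service and the second service belong to the first co-occurrence service set, the first co-occurrence service set comprises at least two services, and the interval between the time information of the data flows generated by accessing the at least two services is less than or equal to a preset duration; and
    determine the correlation between the first service and the second service based on the first co-occurrence service set.
  13. The device according to claim 12, wherein the determining unit is further configured to:
    determine the similarity between the first service and the second service based on the first co-occurrence service set to obtain a similarity matrix; and
    the determining unit is specifically configured to:
    determine, based on the similarity matrix, that the first service and the second service are used to implement the first application.
  14. The device according to claim 13, wherein the determining unit is specifically configured to:
    determine the similarity between the first service and the second service by using a cosine similarity method, an intersection over union method, or a Euclidean distance method.
  15. The device according to claim 13 or 14, wherein the device further comprises:
    an extraction unit, configured to extract a first feature vector of the first service and a second feature vector of the second service from the similarity matrix by using a graph embedding technique; and
    the determining unit is specifically configured to:
    determine, based on the first feature vector and the second feature vector, that the first service and the second service are used to implement the first application.
  16. The device according to any one of claims 10 to 15, wherein the extraction unit is further configured to:
    extract DNS features of the plurality of data flows, wherein the DNS features comprise the correspondence between a combination of destination IP address and destination port number and a domain name; and
    the determining unit is further configured to determine a label of the first application based on the destination address information of the data flows of the first application and the DNS features, wherein the label is used to identify the first application.
  17. A packet forwarding device, comprising:
    a processor and a network interface, wherein
    the network interface is configured to send and receive data; and
    the processor is configured to perform the method according to any one of claims 1 to 9.
  18. A computer program product comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 9.
  19. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 9.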
PCT/CN2020/087363 2019-05-14 2020-04-28 Data flow classification method and packet forwarding device WO2020228527A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20805486.6A EP3905597B1 (en) 2019-05-14 2020-04-28 Data stream classification method and message forwarding device
US17/468,250 US20210409334A1 (en) 2019-05-14 2021-09-07 Data Flow Classification Method and Packet Forwarding Device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910399861.4A CN111953552B (zh) 2019-05-14 2019-05-14 Data flow classification method and packet forwarding device
CN201910399861.4 2019-05-14

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/468,250 Continuation US20210409334A1 (en) 2019-05-14 2021-09-07 Data Flow Classification Method and Packet Forwarding Device

Publications (1)

Publication Number Publication Date
WO2020228527A1 true WO2020228527A1 (zh) 2020-11-19

Family

ID=73290134

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087363 WO2020228527A1 (zh) 2019-05-14 2020-04-28 Data flow classification method and packet forwarding device

Country Status (4)

Country Link
US (1) US20210409334A1 (zh)
EP (1) EP3905597B1 (zh)
CN (1) CN111953552B (zh)
WO (1) WO2020228527A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114697271A (zh) * 2020-12-31 2022-07-01 华为技术有限公司 确定数据流标签的方法、装置以及相关设备
CN114710388B (zh) * 2022-03-25 2024-01-23 江苏科技大学 一种校园网安全系统及网络监护系统
US11675873B1 (en) * 2022-06-28 2023-06-13 Lemon Inc. Website similarity determination
CN115333955B (zh) * 2022-08-11 2023-06-02 武汉烽火技术服务有限公司 一种多层端口缓存的管理方法和装置
CN116668405B (zh) * 2023-07-25 2023-09-29 明阳时创(北京)科技有限公司 一种多服务消息通知机制实现方法、系统、介质及设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103051725A (zh) * 2012-12-31 2013-04-17 华为技术有限公司 应用识别方法、数据挖掘方法、装置及系统
CN107864168A (zh) * 2016-09-22 2018-03-30 华为技术有限公司 一种网络数据流分类的方法及系统
US10013291B1 (en) * 2011-11-14 2018-07-03 Ca, Inc. Enhanced software application platform
CN109450740A (zh) * 2018-12-21 2019-03-08 青岛理工大学 一种基于dpi和机器学习算法进行流量分类的sdn控制器
CN109726735A (zh) * 2018-11-27 2019-05-07 南京邮电大学 一种基于K-means聚类和随机森林算法的移动应用程序识别方法

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2004201907B2 (en) * 1999-06-30 2007-04-26 Apptitude Acquisition Corporation A searching apparatus and method
US7133365B2 (en) * 2001-11-02 2006-11-07 Internap Network Services Corporation System and method to provide routing control of information over networks
US8111707B2 (en) * 2007-12-20 2012-02-07 Packeteer, Inc. Compression mechanisms for control plane—data plane processing architectures
US20140321290A1 (en) * 2013-04-30 2014-10-30 Hewlett-Packard Development Company, L.P. Management of classification frameworks to identify applications
CN103297270A (zh) * 2013-05-24 2013-09-11 华为技术有限公司 应用类型识别方法及网络设备
FR3016108B1 (fr) * 2013-12-30 2019-06-28 Taiwan Semiconductor Manufacturing Company, Ltd. Gestion de la qualite des applications dans un systeme de communication cooperatif
KR102171348B1 (ko) * 2014-01-08 2020-10-29 삼성전자주식회사 어플리케이션 검출 방법 및 장치
WO2015167421A1 (en) * 2014-04-28 2015-11-05 Hewlett-Packard Development Company, L.P. Network flow classification
CN103916294B (zh) * 2014-04-29 2018-05-04 华为技术有限公司 协议类型的识别方法和装置
TW201728124A (zh) * 2014-09-16 2017-08-01 科勞簡尼克斯股份有限公司 以彈性地定義之通信網路控制器為基礎之網路控制、操作及管理
CN107646187A (zh) * 2015-06-12 2018-01-30 慧与发展有限责任合伙企业 应用标识高速缓存
CN107181724B (zh) * 2016-03-11 2021-02-12 华为技术有限公司 一种协同流的识别方法、系统以及使用该方法的服务器
CN107547437B (zh) * 2017-05-11 2020-09-08 新华三信息安全技术有限公司 应用识别方法及装置
CN109063777B (zh) * 2018-08-07 2019-12-03 北京邮电大学 网络流量分类方法、装置及实现装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10013291B1 (en) * 2011-11-14 2018-07-03 Ca, Inc. Enhanced software application platform
CN103051725A (zh) * 2012-12-31 2013-04-17 华为技术有限公司 应用识别方法、数据挖掘方法、装置及系统
CN107864168A (zh) * 2016-09-22 2018-03-30 华为技术有限公司 一种网络数据流分类的方法及系统
CN109726735A (zh) * 2018-11-27 2019-05-07 南京邮电大学 一种基于K-means聚类和随机森林算法的移动应用程序识别方法
CN109450740A (zh) * 2018-12-21 2019-03-08 青岛理工大学 一种基于dpi和机器学习算法进行流量分类的sdn控制器

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3905597A4

Also Published As

Publication number Publication date
EP3905597B1 (en) 2024-02-14
CN111953552A (zh) 2020-11-17
EP3905597A1 (en) 2021-11-03
US20210409334A1 (en) 2021-12-30
EP3905597A4 (en) 2022-03-30
CN111953552B (zh) 2022-12-13

Similar Documents

Publication Publication Date Title
WO2020228527A1 (zh) Data flow classification method and packet forwarding device
US10951495B2 (en) Application signature generation and distribution
US11601351B2 (en) Aggregation of select network traffic statistics
US11159386B2 (en) Enriched flow data for network analytics
CN102739457B (zh) 一种基于dpi和svm技术的网络流量识别方法
JP4774357B2 (ja) 統計情報収集システム及び統計情報収集装置
US9787581B2 (en) Secure data flow open information analytics
CN108270699B (zh) 报文处理方法、分流交换机及聚合网络
US10050892B2 (en) Method and apparatus for packet classification
CN110177123B (zh) 基于dns映射关联图的僵尸网络检测方法
US11650994B2 (en) Monitoring network traffic to determine similar content
US20230164043A1 (en) Service application detection
JP6662812B2 (ja) 計算装置及び計算方法
JP7435744B2 (ja) 識別方法、識別装置及び識別プログラム
WO2023144946A1 (ja) 分析装置、分析方法及び分析プログラム
Shaman User profiling based on network application traffic monitoring
KR101605187B1 (ko) 응용 트래픽 분석을 위한 미지 트래픽 플로우 수집 장치 및 수집 방법
Lucente Pmacct: Steps forward interface counters
JP5300642B2 (ja) 通信網における頻出フロー検出方法と装置およびプログラム
Grémillet et al. Traffic classification techniques supporting semantic networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20805486

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020805486

Country of ref document: EP

Effective date: 20210728

NENP Non-entry into the national phase

Ref country code: DE