CN112560878A - Service classification method and device and Internet system - Google Patents

Service classification method and device and Internet system

Info

Publication number
CN112560878A
Authority
CN
China
Prior art keywords
service
services
determining
data stream
pair
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910853330.8A
Other languages
Chinese (zh)
Inventor
华卓隽
罗奇
王璐
黄林杰
邱亚平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201910853330.8A (CN112560878A)
Priority to PCT/CN2020/112312 (WO2021047401A1)
Publication of CN112560878A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/51 Discovery or management thereof, e.g. service location protocol [SLP] or web services

Abstract

The application discloses a service classification method and device and an Internet system, and belongs to the technical field of the Internet. The service classification method comprises the following steps: determining n data stream groups in a plurality of data streams; determining the service pairs corresponding to each of the n data stream groups to obtain a plurality of service pairs; determining an interval corresponding to each of the plurality of service pairs, the interval corresponding to each service pair comprising: the interval between the start times of the data streams that correspond to the two services of the service pair in the same data stream group; determining, based on the interval corresponding to each service pair, whether the services in the service pair are associated, to obtain the association relationship of the services in each service pair; and classifying the plurality of services in the plurality of service pairs based on the association relationship of the services in each of the plurality of service pairs. The method and the device can classify a plurality of services, so that the Internet system can be managed based on the classification results.

Description

Service classification method and device and Internet system
Technical Field
The present application relates to the field of internet technologies, and in particular, to a service classification method and apparatus, and an internet system.
Background
An Internet system comprises a forwarding device, and a client and a server that are connected to the forwarding device. The server provides services for the client through the forwarding device.
When the server provides a service for the client, the data streams corresponding to the service are transmitted between the server and the client through the forwarding device.
Currently, there are many types of services provided by a server to a client, and therefore, a method for classifying services is urgently needed to manage an internet system.
Disclosure of Invention
The application provides a service classification method and device and an Internet system, which can classify services. The technical solutions are as follows:
In a first aspect, a service classification method is provided, and the method includes: determining n data stream groups in a plurality of data streams, wherein n is greater than or equal to 1, each of the n data stream groups corresponds to one client and at least two services, and different data stream groups correspond to different clients; determining the service pairs corresponding to each of the n data stream groups to obtain a plurality of service pairs, wherein each service pair corresponding to a data stream group comprises: any two services corresponding to that data stream group; determining an interval corresponding to each of the plurality of service pairs, the interval corresponding to each service pair comprising: the interval between the start times of the data streams that correspond to the two services of the service pair in the same data stream group, wherein the start time of a data stream is the time when the forwarding device receives the data stream; determining, based on the interval corresponding to each service pair, whether the services in the service pair are associated, to obtain the association relationship of the services in each service pair; and classifying the plurality of services in the plurality of service pairs based on the association relationship of the services in each of the plurality of service pairs.
In the service classification method provided by the embodiment of the application, the service classification device may determine whether the services in each service pair are associated based on the interval corresponding to each service pair, so as to obtain the association relationship of the services in each service pair; and then classifying the plurality of services in the plurality of service pairs based on the association relationship of the services in each service pair. Therefore, classification of the services is realized, and management of the Internet system can be realized based on the classification result of the services.
Optionally, the determining whether the services in each service pair are associated based on the interval corresponding to each service pair includes: determining, among the intervals corresponding to the service pair, a group of intervals that are smaller than a first time threshold; determining the frequency of each interval in the group of intervals; determining q intervals in the group of intervals whose frequency exceeds a frequency threshold, wherein q is greater than or equal to 1; determining a maximum interval and a minimum interval of the q intervals; and determining that the two services in the service pair are associated when the difference between the maximum interval and the minimum interval is less than a second time threshold.
It can be seen that two services are associated only if the start times of their data streams are relatively close. In the embodiment of the application, whether two services are associated is judged through the first time threshold, the second time threshold and the frequency threshold. Alternatively, whether two services are associated may be determined in other manners, for example, by determining that the two services in a service pair are associated when the number of intervals smaller than the first time threshold among the intervals corresponding to the service pair is greater than a number threshold.
Optionally, the frequency threshold is $Y = E + k \times a$, where E denotes the median of the frequencies of the group of intervals, a denotes the median absolute deviation of the frequencies of the group of intervals, and k denotes a constant. Of course, the frequency threshold may take other values, such as a value entered into the service classification device by an operator.
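The threshold described above can be sketched as follows. This is a minimal illustration, assuming the frequencies are available as a plain list; the function name `frequency_threshold` and the default value of `k` are hypothetical and do not come from the application.

```python
from statistics import median


def frequency_threshold(frequencies, k=3.0):
    """Return Y = E + k * a for a list of interval frequencies.

    E is the median of the frequencies and a is their median absolute
    deviation. The default k is an arbitrary illustrative choice.
    """
    e = median(frequencies)
    # Median of the absolute deviations from the median.
    a = median(abs(f - e) for f in frequencies)
    return e + k * a


# Example with five hypothetical frequencies and k = 1.
threshold = frequency_threshold([1, 10, 100, 100, 1], k=1)
```

Because both E and a are medians, a few very frequent intervals do not pull the threshold upward as strongly as they would with a mean and standard deviation.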
Optionally, the classifying the plurality of services in the plurality of service pairs based on the association relationship of the services in each of the plurality of service pairs includes: determining p service groups in the plurality of services based on the association relationship of the services in each of the plurality of service pairs, wherein each service group comprises at least two services of the plurality of services that are pairwise associated, and p is greater than or equal to 1; determining, based on the p service groups, an associated service set for each service of the plurality of services, the associated service set of a service including: the services associated with that service in the service group where the service is located; and classifying the plurality of services based on the similarity of the associated service sets of the plurality of services.
Optionally, the determining p service groups in the plurality of services based on the association relationship of the services in each of the plurality of service pairs includes: determining at least one associated service pair in the plurality of service pairs based on the association relationship of the services in each of the plurality of service pairs, wherein the two services in an associated service pair are associated; generating an undirected graph based on the plurality of services and the at least one associated service pair, wherein the undirected graph comprises nodes and edges, each node represents a corresponding service, and each edge connects the two nodes corresponding to an associated service pair; determining p maximal cliques in the undirected graph, wherein a maximal clique comprises a plurality of nodes in which every two nodes are connected by an edge; and determining the p service groups in one-to-one correspondence with the p maximal cliques, wherein the nodes in each maximal clique are in one-to-one correspondence with the services in the corresponding service group.
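One possible way to enumerate the maximal cliques is the basic Bron-Kerbosch recursion, sketched below. The application does not name a specific clique algorithm, so this choice is an assumption, as are the function name `maximal_cliques` and the service labels used in the example.

```python
def maximal_cliques(nodes, edges):
    """Enumerate maximal cliques with at least two nodes.

    nodes: iterable of service identifiers.
    edges: iterable of (u, v) associated service pairs.
    Uses the basic Bron-Kerbosch recursion without pivoting.
    """
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            # A service group needs at least two services.
            if len(current) >= 2:
                cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v},
                   candidates & adjacency[v],
                   excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return cliques


# Hypothetical graph: A, B, C are pairwise associated; C and D are associated.
groups = maximal_cliques("ABCDE", [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
```

In this example service C belongs to two service groups, and the isolated service E belongs to none.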
Optionally, before the classifying the plurality of services based on the similarity of the associated service sets of the plurality of services, the method further comprises: determining the similarity of the two associated service sets of any two services in the plurality of services based on the intersection over union (IoU) of the two associated service sets. In the embodiment of the present application, determining the similarity based on the intersection over union is taken as an example; of course, the similarity may also be determined in other ways, which is not limited in the embodiment of the present application.
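The intersection over union of two associated service sets can be computed as in the sketch below. The function name `iou` is hypothetical, and returning 0.0 when both sets are empty is an assumption, since the text does not cover that case.

```python
def iou(set_a, set_b):
    """Intersection over union of two associated service sets."""
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are treated as not similar.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical associated service sets sharing one of three services.
similarity = iou({"s2", "s3"}, {"s3", "s4"})
```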
Optionally, the classifying the plurality of services based on the similarity of the associated service sets of the plurality of services includes: generating a similarity matrix of the plurality of services based on the similarity of the associated service sets of the plurality of services, wherein the similarity matrix comprises m rows and m columns of elements, and the element in the i-th row and the j-th column represents the similarity between the associated service set of the i-th service and the associated service set of the j-th service in the plurality of services, where 1 ≤ i ≤ m and 1 ≤ j ≤ m; and clustering the similarity matrix to obtain the classification results of the plurality of services. Optionally, the service classification device may cluster the similarity matrix using a spectral clustering algorithm (or another algorithm that clusters based on a similarity matrix, such as K-Means or DBSCAN).
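A minimal sketch of building the m × m similarity matrix is shown below, assuming the associated service sets are held in a dictionary keyed by service identifier and that similarity is the intersection over union. The function name `similarity_matrix` and the example data are hypothetical, and the clustering step itself is left out.

```python
def similarity_matrix(associated_sets):
    """Build the m x m similarity matrix for a dict {service: associated set}.

    Rows and columns follow the sorted order of the service identifiers.
    """
    services = sorted(associated_sets)

    def iou(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    matrix = [
        [iou(associated_sets[si], associated_sets[sj]) for sj in services]
        for si in services
    ]
    return services, matrix


# Hypothetical associated service sets for three services.
services, matrix = similarity_matrix({
    "s1": {"s2"},
    "s2": {"s1"},
    "s3": {"s2"},
})
```

Such a matrix could then be passed to a clustering routine that accepts precomputed affinities, for example scikit-learn's `SpectralClustering` with `affinity="precomputed"`; that library choice is not part of the application.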
Optionally, after the clustering the similarity matrix to obtain the classification results of the plurality of services, the method further includes: detecting whether an accuracy of classification results of the plurality of services is less than or equal to an accuracy threshold; and when the accuracy of the classification result is less than or equal to the accuracy threshold, repeatedly executing the process of clustering the similarity matrix to obtain the classification results of the plurality of services. Thus, through multiple verification, the accuracy of classification of multiple services can be improved.
Optionally, the determining an interval corresponding to each service pair in the plurality of service pairs includes: ordering the plurality of services; and determining the interval corresponding to each service pair in the plurality of service pairs based on the ordering of the plurality of services. After the plurality of services are ordered, the interval corresponding to each service pair is obtained by subtracting the start time of the data stream of the service ordered earlier from the start time of the data stream of the service ordered later. This avoids determining the same interval twice, once by subtracting the start time of the first service's data stream from that of the second service's data stream and once the other way round.
Optionally, the determining, based on the ranking of the plurality of services, an interval corresponding to each service pair in the plurality of service pairs includes: determining the interval $T_{pq} = T_p - T_q$ corresponding to each service pair, where each service pair comprises the p-th service and the q-th service of the plurality of services, $p > q \geq 1$, $T_p$ represents the start time of any data stream corresponding to the p-th service in a data stream group corresponding to the service pair, and $T_q$ represents the start time of any data stream corresponding to the q-th service in that data stream group.
Optionally, the method further comprises: determining a client set corresponding to each service in the plurality of services, wherein the client set corresponding to each service comprises: the clients corresponding to the data streams of that service; classifying the plurality of services based on the similarity of the client sets corresponding to the plurality of services; and, after classifying the plurality of services in the plurality of service pairs based on the association relationship of the services in each service pair, adjusting the result of that classification according to the result of classifying the plurality of services based on the similarity of the client sets.
For example, it is assumed that, in the result of classifying the services based on the similarity of the client set, two of the services are classified into the same class, and in the result of classifying the services based on the association relationship between the services in each of the service pairs, the two services are not classified into the same class. The service classification device may adjust the result of classifying the services based on the association relationship of the services in each of the plurality of service pairs so that the two services are classified into the same class.
Optionally, the method further comprises: acquiring a domain name corresponding to each service in the plurality of services; classifying the plurality of services based on the similarity of the domain names corresponding to the plurality of services; and, after classifying the plurality of services in the plurality of service pairs based on the association relationship of the services in each of the plurality of service pairs, adjusting the result of that classification according to the result of classifying the plurality of services based on the similarity of the domain names.
For example, it is assumed that, in the result of classifying services based on the similarity of domain names, two services are classified into the same class, and in the result of classifying services based on the association relationship between the services in each of the plurality of service pairs, the two services are not classified into the same class. The service classification device may adjust the result of classifying the services based on the association relationship of the services in each of the plurality of service pairs so that the two services are classified into the same class.
Therefore, the service classification device can classify the services based on a plurality of classification methods and mutually reference the classification results of the methods, so that the classification results of a plurality of services are more accurate.
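The adjustment described in the two examples above could be sketched as a merge of classes, assuming each classification result is a dictionary from service to class label. The function name `adjust_classes` and the labels are hypothetical, and merging whole classes with a union-find structure is only one possible reading of "adjust".

```python
def adjust_classes(primary, secondary):
    """Merge classes of `primary` for services that share a class in `secondary`.

    primary:   {service: label} from the association-based classification.
    secondary: {service: label} from another method (client sets or domain names).
    Returns a new {service: label} dict with the merged primary labels.
    """
    parent = {label: label for label in set(primary.values())}

    def find(label):
        # Follow parent links to the representative label, compressing the path.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    first_seen = {}  # secondary label -> first service seen with that label
    for service, sec_label in secondary.items():
        if service not in primary:
            continue
        if sec_label in first_seen:
            other = first_seen[sec_label]
            parent[find(primary[service])] = find(primary[other])
        else:
            first_seen[sec_label] = service

    return {service: find(label) for service, label in primary.items()}


# s1 and s3 share a class in the secondary result, so their primary classes merge.
adjusted = adjust_classes(
    {"s1": 0, "s2": 0, "s3": 1, "s4": 2},
    {"s1": "a", "s2": "b", "s3": "a", "s4": "c"},
)
```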
In a second aspect, a service classification apparatus is provided, which has a function of implementing the behavior of the service classification method in the first aspect. The service classification device comprises at least one module, and the at least one module is used for implementing the service classification method provided by the first aspect.
In a third aspect, a service classification apparatus is provided, which includes: at least one processor, at least one interface, a memory, and at least one communication bus, the processor being configured to execute a program stored in the memory to implement the service classification method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, implements the service classification method of the first aspect.
In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of service classification of the first aspect described above.
In a sixth aspect, an internet system is provided, which includes a service classification apparatus, a plurality of servers, and a plurality of clients; the service classification device is the service classification device of the second aspect or the third aspect.
Drawings
Fig. 1 is a schematic structural diagram of an internet system according to an embodiment of the present application;
Fig. 2 is a flowchart of a service classification method according to an embodiment of the present application;
Fig. 3 is a flowchart of a method for classifying a plurality of services in a plurality of service pairs according to an embodiment of the present application;
Fig. 4 is a schematic diagram of an undirected graph provided in an embodiment of the present application;
Fig. 5 is a block diagram of a service classification apparatus according to an embodiment of the present application;
Fig. 6 is a block diagram of a first classification module according to an embodiment of the present application;
Fig. 7 is a block diagram of another service classification apparatus provided in the embodiments of the present application;
Fig. 8 is a block diagram of a third determining module provided in an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a service classification apparatus according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of an internet system according to an embodiment of the present application. Referring to fig. 1, the internet system 10 includes a forwarding device 101, and a plurality of clients 102 (e.g., 3 clients are shown in fig. 1) and a plurality of servers 103 (e.g., 3 servers are shown in fig. 1) connected to the forwarding device 101. Each client 102 is communicatively coupled to the servers 103 via the forwarding device 101.
The server 103 can provide at least one service to the client 102 through the forwarding device 101, and each service provided by the server 103 can be distinguished by the Internet Protocol (IP) address of the server 103 and a service port of the server 103. An application running in the client 102 needs to implement its functions based on multiple services, which may be provided by one server 103 or by multiple servers 103; this is not limited in the embodiments of the present application. Optionally, the forwarding device 101 may be a device capable of forwarding data streams, such as a switch or a router. Illustratively, the forwarding device 101 may include a network processor, through which the forwarding device 101 implements the communication connection between the client 102 and the server 103.
When the server provides a service for an application running in the client, the data streams related to the service are transmitted between the service port and the client through the forwarding device. A data stream indicates the IP address of the client, the client port, the IP address of the server and the service port corresponding to the data stream, where the IP address of the server and the service port together correspond to one service. When the forwarding device 101 receives a data stream sent by the client 102, it forwards the data stream to the service port indicated by the data stream; when the forwarding device 101 receives a data stream sent by the server 103, it forwards the data stream to the client indicated by the data stream. One application may correspond to one or more services. To manage an internet system, Deep Packet Inspection (DPI) technology is used to identify the type of the application corresponding to a data stream. The forwarding device maintains an application feature database that stores the correspondence between application feature information and application types. When the type of an application is identified through DPI, the forwarding device searches the application feature database according to the application feature information carried in the received data stream, so as to identify the application corresponding to the data stream. However, because DPI depends on the application feature database, whenever a new type of application appears a developer has to find the application feature information of the new application and update the database. As a result, determining the type of the application corresponding to a data stream through DPI is complex and has poor real-time performance.
Moreover, DPI cannot classify the services provided by the server.
The embodiment of the application provides a method for classifying services. Fig. 2 is a flowchart of a service classification method provided in an embodiment of the present application, where the method may be applied to a service classification device, and in the embodiment of the present application, the service classification device is, for example, the forwarding device 101 shown in fig. 1. Referring to fig. 2, the method may include:
Step 201, receiving a plurality of data streams.
Each data stream may correspond to one client and one service: the data stream is either sent to the corresponding client or sent by the corresponding client, and it belongs to the corresponding service. The plurality of data streams may correspond to a plurality of clients and a plurality of services.
Step 202, determining a start time of each data stream in the plurality of data streams, where the start time of each data stream is a time when the forwarding device receives the data stream.
The forwarding device needs to determine the start times of these data streams separately. Illustratively, each time a data stream is received by a forwarding device, the forwarding device generates a start time for the data stream. And, the forwarding device stores a flow table, and the forwarding device can record the five-tuple carried by the data flow and the start time of the data flow in the flow table. When determining the start time of each of the multiple data flows, the forwarding device may directly query the flow table for the start time corresponding to the five-tuple of the data flow.
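The flow table lookup described above can be pictured with the following sketch. It is only an in-memory illustration, assuming the usual five-tuple fields (source IP, source port, destination IP, destination port, protocol); the class names `FiveTuple` and `FlowTable` and the example addresses are hypothetical, and a real forwarding device would keep this state in its own data plane.

```python
import time
from typing import Dict, NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


class FlowTable:
    """Maps each five-tuple to the time its data stream was first received."""

    def __init__(self) -> None:
        self._start_times: Dict[FiveTuple, float] = {}

    def record(self, key: FiveTuple, now: Optional[float] = None) -> float:
        # Only the first packet of a data stream sets the start time.
        if key not in self._start_times:
            self._start_times[key] = time.time() if now is None else now
        return self._start_times[key]

    def start_time(self, key: FiveTuple) -> Optional[float]:
        # Returns None when the data stream has not been seen.
        return self._start_times.get(key)


table = FlowTable()
flow = FiveTuple("10.0.0.1", 50000, "192.0.2.10", 443, "TCP")
table.record(flow, now=10.0)
table.record(flow, now=12.0)  # later packets do not change the start time
```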
For example, the start times of the multiple data streams determined by the forwarding device may be as shown in table 1. The forwarding device may indicate a data stream by the IP address of the client corresponding to the data stream and the identifier of the service, where the identifier of the service may include: the IP address of the server where the service is located, and the identifier of the service port through which the server provides the service. For example, the start time of the data stream corresponding to the client with the IP address Src_IP1 and the service Dst_IP1_port_1 is T1. Here, Dst_IP1 indicates the IP address of the server where the service is located, and port_1 indicates the service port of the service.
TABLE 1
IP address of client | Service        | Start time
Src_IP1              | Dst_IP1_port_1 | T1
Src_IP1              | Dst_IP1_port_2 | T2
Src_IP1              | Dst_IP1_port_3 | T3
Src_IP1              | Dst_IP1_port_n | Tn
It should be noted that the services corresponding to the multiple data streams may be located on the same server or on different servers, and the clients corresponding to the multiple data streams may be the same client or different clients. Table 1 only takes as an example the case where the services corresponding to the multiple data streams are located on the same server and the clients corresponding to the multiple data streams are the same client. In addition, the multiple data streams may include data streams whose corresponding client and service are both the same; table 1 only takes as an example the case where the services corresponding to the multiple data streams are all different.
Step 203, determining a service corresponding to each data flow in the plurality of data flows.
Step 204, determining a client corresponding to each data stream in the plurality of data streams.
For example, the forwarding device may query the flow tables to determine the service and client corresponding to each data flow.
Step 205, determining n data stream groups in the plurality of data streams, where n is greater than or equal to 1, each data stream group in the n data stream groups corresponds to one client and at least two services, and different data stream groups correspond to different clients.
After determining the service and the client corresponding to each data stream, the forwarding device may group the plurality of data streams by their corresponding clients to obtain one data stream group per client, where every data stream in a data stream group corresponds to that client, and different data stream groups correspond to different clients. For example, one data stream group may include the data streams shown in table 1, all of which correspond to the client with the IP address Src_IP1.
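The grouping step can be sketched as follows, assuming each data stream is represented as a `(client_ip, service, start_time)` tuple. The function name `group_by_client` and the numeric start times are hypothetical; clients with fewer than two distinct services are dropped because each data stream group corresponds to at least two services.

```python
from collections import defaultdict


def group_by_client(flows):
    """Group (client_ip, service, start_time) records by client.

    Returns {client_ip: [(service, start_time), ...]} and keeps only the
    clients whose data streams cover at least two distinct services.
    """
    groups = defaultdict(list)
    for client_ip, service, start_time in flows:
        groups[client_ip].append((service, start_time))
    return {
        client: records
        for client, records in groups.items()
        if len({service for service, _ in records}) >= 2
    }


# Hypothetical records: Src_IP2 is dropped because it uses a single service.
groups = group_by_client([
    ("Src_IP1", "Dst_IP1_port_1", 1.0),
    ("Src_IP1", "Dst_IP1_port_2", 2.0),
    ("Src_IP2", "Dst_IP1_port_1", 3.0),
])
```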
Step 206, determining a service pair corresponding to each data stream group in the n data stream groups to obtain a plurality of service pairs.
Each service pair corresponding to a data stream group comprises: any two services corresponding to that data stream group. After determining the n data stream groups, the forwarding device may, for each of the n data stream groups, form a service pair from any two of the services corresponding to that data stream group.
For example, in the data stream group shown in table 1, service Dst_IP1_port_1 and service Dst_IP1_port_2 form a service pair, service Dst_IP1_port_1 and service Dst_IP1_port_3 form a service pair, and service Dst_IP1_port_2 and service Dst_IP1_port_3 form a service pair.
After determining the service pairs corresponding to all the data stream groups, the forwarding device may obtain a plurality of service pairs. The plurality of service pairs comprises a union of service pairs corresponding to the n data stream groups.
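A minimal sketch of forming the service pairs and taking their union across groups is given below, assuming each data stream group is a list of `(service, start_time)` records keyed by client. The function name `service_pairs` and the start times are hypothetical.

```python
from itertools import combinations


def service_pairs(groups):
    """Return the union of service pairs over all data stream groups.

    groups: {client_ip: [(service, start_time), ...]}.
    Each pair is a tuple of two service identifiers in sorted order, so the
    same two services never appear as two different pairs.
    """
    pairs = set()
    for records in groups.values():
        services = sorted({service for service, _ in records})
        pairs.update(combinations(services, 2))
    return pairs


pairs = service_pairs({
    "Src_IP1": [
        ("Dst_IP1_port_1", 1.0),
        ("Dst_IP1_port_2", 2.0),
        ("Dst_IP1_port_3", 3.0),
    ],
})
```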
Step 207, determining the interval corresponding to each service pair in the plurality of service pairs.
After obtaining the plurality of service pairs, the forwarding device may determine the interval corresponding to each service pair in the plurality of service pairs. The interval corresponding to each service pair includes: the interval between the start times of the data streams that correspond to the two services of the service pair in the same data stream group.
For example, as shown in table 2, assume that the service Dst_IP1_port_1 and the service Dst_IP1_port_2 form a service pair, that the start time of the data stream corresponding to the service Dst_IP1_port_1 in a certain data stream group is T1, and that the start time of the data stream corresponding to the service Dst_IP1_port_2 in the same data stream group is T2. The interval corresponding to the service pair consisting of the service Dst_IP1_port_1 and the service Dst_IP1_port_2 then includes T2-T1. Moreover, for convenience of calculation, all the obtained intervals may be rounded to a minimum time unit (e.g., 1 second); in this case, the interval corresponding to this service pair may be represented as (T2-T1)//1, that is, (T2-T1) rounded to 1 second.
TABLE 2
Service        | Service        | Interval
Dst_IP1_port_1 | Dst_IP1_port_2 | (T2-T1)//1
Dst_IP1_port_1 | Dst_IP1_port_3 | (T3-T1)//1
Dst_IP1_port_2 | Dst_IP1_port_3 | (T3-T2)//1
Optionally, the forwarding device may also first sort the plurality of services in any order, for example, according to the identifiers of the services. The forwarding device may then determine the interval corresponding to each service pair based on the ordering of the plurality of services. Illustratively, the interval corresponding to a service pair is $T_{pq} = T_p - T_q$, where the service pair comprises the p-th service and the q-th service of the plurality of services, $p > q \geq 1$, $T_p$ represents the start time of any data stream corresponding to the p-th service in a data stream group corresponding to the service pair, and $T_q$ represents the start time of any data stream corresponding to the q-th service in that data stream group.
It can be seen that, after the plurality of services are ordered, the interval corresponding to each service pair is obtained by subtracting the start time of the data stream of the service ordered earlier from the start time of the data stream of the service ordered later. This avoids determining the same interval twice, once by subtracting the start time of the first service's data stream from that of the second service's data stream and once the other way round.
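The ordered interval computation can be sketched as below for a single data stream group. Ordering services by identifier follows the example in the text; pairing every data stream of one service with every data stream of the other is an assumption, and the function name `pair_intervals` and the start times are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product


def pair_intervals(group_records, unit=1):
    """Compute the rounded intervals T_p - T_q for one data stream group.

    group_records: [(service, start_time), ...] for a single client.
    Services are ordered by identifier; q is the service ordered earlier
    and p the service ordered later.
    """
    starts = defaultdict(list)
    for service, start_time in group_records:
        starts[service].append(start_time)

    intervals = defaultdict(list)
    for q, p in combinations(sorted(starts), 2):
        for t_q, t_p in product(starts[q], starts[p]):
            # Floor division rounds towards negative infinity.
            intervals[(q, p)].append((t_p - t_q) // unit)
    return dict(intervals)


intervals = pair_intervals([
    ("Dst_IP1_port_1", 10.25),
    ("Dst_IP1_port_2", 12.75),
    ("Dst_IP1_port_3", 11.0),
])
```

Each pair appears once, keyed as (earlier service, later service), so an interval can be negative when the later-ordered service started first.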
Step 208, determining whether the services in each service pair are associated based on the interval corresponding to each service pair, so as to obtain the association relationship of the services in each service pair.
The forwarding device, in determining whether two services in each service pair are associated, may first determine a set of intervals that are less than a first time threshold among the intervals to which the service pair corresponds. The forwarding device then also needs to determine the frequency of each interval in the set of intervals.
It should be noted that the data stream corresponding to each service may not be unique, so the interval between the start times of the data streams corresponding to the two services may not be unique, and some intervals may be the same. As shown in Table 3, for the service pair consisting of the service Dst_IP1_port_1 and the service Dst_IP1_port_2, the intervals between the start times of the data streams corresponding to the two services include five intervals: -4 seconds, -2 seconds, 0 seconds, 1 second and 2 seconds. The -4 second interval occurs 1 time (frequency 1), the -2 second interval occurs 10 times (frequency 10), the 0 second interval occurs 100 times (frequency 100), the 1 second interval occurs 100 times (frequency 100), and the 2 second interval occurs 1 time (frequency 1).
TABLE 3
Service pair Interval Frequency
<Dst_IP1_port_1,Dst_IP1_port_2> -4 1
<Dst_IP1_port_1,Dst_IP1_port_2> -2 10
<Dst_IP1_port_1,Dst_IP1_port_2> 0 100
<Dst_IP1_port_1,Dst_IP1_port_2> 1 100
<Dst_IP1_port_1,Dst_IP1_port_2> 2 1
After determining the frequency of each interval in the group of intervals, the forwarding device may find q intervals in the group of intervals, where the frequency exceeds a frequency threshold, and q is greater than or equal to 1. Then, the forwarding device further needs to determine a maximum interval and a minimum interval in the q intervals, and when a difference between the maximum interval and the minimum interval is smaller than a second time threshold, it indicates that the similarity between the start times of the two services in the service pair is high, and at this time, the forwarding device determines that the two services in the service pair are associated.
Illustratively, the first time threshold may be 30 seconds, 20 seconds, etc., and the second time threshold may be 5 seconds, 4 seconds, etc. Optionally, the frequency threshold may be set by a worker on the forwarding device, or may be calculated by the forwarding device itself. Optionally, when the frequency threshold is calculated by the forwarding device itself, the frequency threshold Y = E + k × a, where E represents the median of the frequencies of the group of intervals, a represents the median absolute deviation of those frequencies (the variance may be used instead; this embodiment takes the median absolute deviation as an example), and k represents a constant. Illustratively, k may be 3, 4, or 5, etc. The median absolute deviation is the median of the absolute differences between the frequency of each interval in the group of intervals and the median E.
Assuming that the first time threshold is 30 seconds, the second time threshold is 5 seconds, and k is 3, for the service pair consisting of the service Dst_IP1_port_1 and the service Dst_IP1_port_2 shown in Table 3, the intervals between the start times of the data streams corresponding to these two services include five intervals: -4 seconds, -2 seconds, 0 seconds, 1 second and 2 seconds. Each of these five intervals is less than the first time threshold of 30 seconds. Based on the frequencies of these five intervals, E is 10 and a is 13.34 (a value consistent with multiplying the median absolute deviation of the frequencies, 9, by the conventional scale constant 1.4826), so the frequency threshold is 10 + 3 × 13.34 ≈ 50.03. Among the five intervals, the intervals whose frequency exceeds the frequency threshold are 0 seconds and 1 second, so the maximum interval is 1 second and the minimum interval is 0 seconds. The difference between the maximum interval and the minimum interval is 1 second, which is less than the second time threshold of 5 seconds. Therefore, the forwarding device may determine that the service Dst_IP1_port_1 is associated with the service Dst_IP1_port_2.
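The decision procedure of step 208 can be sketched as below, using the Table 3 data. This is an illustrative sketch only: the function names are hypothetical, and the scale constant 1.4826 is an assumption chosen because it reproduces the value a = 13.34 used in the worked example; the text itself does not state it.

```python
from collections import Counter
from statistics import median

# Assumption: conventional consistency constant for the median absolute
# deviation; it reproduces a = 13.34 from the worked example.
MAD_SCALE = 1.4826


def frequency_threshold(frequencies, k=3):
    """Y = E + k * a, with E the median and a the (scaled) median absolute deviation."""
    e = median(frequencies)
    mad = median(abs(f - e) for f in frequencies)
    return e + k * MAD_SCALE * mad


def is_associated(intervals, first_threshold=30, second_threshold=5, k=3):
    """Return True when the two services of a service pair are judged associated."""
    # Keep only intervals below the first time threshold.
    kept = [t for t in intervals if t < first_threshold]
    if not kept:
        return False
    counts = Counter(kept)
    y = frequency_threshold(list(counts.values()), k)
    # The q intervals whose frequency exceeds the frequency threshold.
    frequent = [t for t, c in counts.items() if c > y]
    if not frequent:
        return False
    return max(frequent) - min(frequent) < second_threshold


# Table 3: interval (seconds) repeated by its frequency.
table3 = [-4] * 1 + [-2] * 10 + [0] * 100 + [1] * 100 + [2] * 1
threshold = round(frequency_threshold([1, 10, 100, 100, 1]), 2)
associated = is_associated(table3)
```

For the Table 3 data the sketch keeps the 0 second and 1 second intervals, whose spread of 1 second is below the second time threshold.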
It can be seen that there is an association between two services only if their start times are relatively similar. In the embodiment of the application, whether the two services are related or not is judged through the first time threshold, the second time threshold and the frequency threshold. Alternatively, it may also be determined whether the two services are associated in other manners, for example, when the number of intervals smaller than the first time threshold in the interval corresponding to the service pair is greater than the number threshold, determining that the two services in the service pair are associated.
The forwarding device may determine whether the services in each of the plurality of service pairs are associated using the method in step 208 to obtain an association relationship between the services in each service pair.
Step 209 classifies the plurality of services in the plurality of service pairs based on the association of the services in each of the plurality of service pairs.
After determining the association relationship of the services in each service pair, the forwarding device may classify the services in the plurality of service pairs according to the association relationship. Illustratively, as shown in fig. 3, step 209 may include:
Step 2091, determining p service groups in the plurality of services based on the association relationship of the services in each service pair in the plurality of service pairs, wherein each service group includes at least two of the plurality of services, every two of which are associated, and p is greater than or equal to 1.
For example, the forwarding device may first determine at least one associated service pair among the plurality of service pairs based on the association relationship of the services in each of the plurality of service pairs, the two services in each associated service pair being associated. Of course, the plurality of service pairs may also include service pairs other than the associated service pairs, and the two services in each of those other service pairs are not associated.
Thereafter, the forwarding device may generate an undirected graph based on the plurality of services and the at least one associated service pair. The undirected graph comprises nodes and edges: each node represents a corresponding service, and each edge connects the two nodes corresponding to a corresponding associated service pair. By way of example, assume that the plurality of services includes services 1-5, and that services 1 and 3 constitute a first associated service pair, services 2 and 3 constitute a second associated service pair, services 1 and 2 constitute a third associated service pair, and services 4 and 5 constitute a fourth associated service pair. In this case, the undirected graph generated based on the five services and the four associated service pairs may be as shown in fig. 4. Referring to FIG. 4, the undirected graph includes nodes 1.1-1.5 and edges 2.1-2.4. Node 1.x corresponds to service x and edge 2.y corresponds to the y-th associated service pair, where 1 ≤ x ≤ 5 and 1 ≤ y ≤ 4. Thus, node 1.1 and node 1.3 are connected by edge 2.1, node 1.2 and node 1.3 are connected by edge 2.2, node 1.1 and node 1.2 are connected by edge 2.3, and node 1.4 and node 1.5 are connected by edge 2.4.
After the undirected graph is obtained, the forwarding device can determine p maximal cliques in the undirected graph, where each maximal clique comprises a plurality of nodes in which every two nodes are connected by an edge. The intersection of the p maximal cliques is empty. The forwarding device may determine the maximal cliques in the undirected graph using any method, such as the Bron-Kerbosch algorithm (a method for finding maximal cliques). For example, the maximal cliques that the forwarding device determines in the undirected graph shown in fig. 4 may comprise maximal cliques 3.1 and 3.2. Maximal clique 3.1 comprises nodes 1.1-1.3, and maximal clique 3.2 comprises nodes 1.4-1.5.
After determining the p maximal cliques, the forwarding device may determine, based on the p maximal cliques, p service groups in one-to-one correspondence with the p maximal cliques, where the nodes in each maximal clique correspond one to one to the services in the corresponding service group.
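The clique search on the fig. 4 example can be sketched with the basic (non-pivoting) Bron-Kerbosch recursion. The function name and data layout are assumptions for illustration. The sketch drops single-node cliques because a service group contains at least two services, and it does not by itself enforce that the returned cliques are disjoint.

```python
def maximal_cliques(nodes, edges):
    """Enumerate maximal cliques (size >= 2) of an undirected graph."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    cliques = []

    def expand(current, candidates, excluded):
        # current: clique built so far; candidates: nodes that can extend it;
        # excluded: nodes already handled, used to detect non-maximal cliques.
        if not candidates and not excluded:
            if len(current) >= 2:
                cliques.append(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(nodes), set())
    return cliques


# Fig. 4 example: services 1-5 and the four associated service pairs.
services = [1, 2, 3, 4, 5]
associated_pairs = [(1, 3), (2, 3), (1, 2), (4, 5)]
groups = maximal_cliques(services, associated_pairs)
```

On this input the two cliques found correspond to maximal clique 3.1 (services 1-3) and maximal clique 3.2 (services 4-5).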
Step 2092, determining an associated service set for each of the plurality of services based on the p service groups, the associated service set of a service including: the services associated with that service in the service group in which that service is located.
After determining the p service groups, the forwarding device may determine an associated service set for each service in each service group based on the previously determined at least one associated service pair.
For example, in the service group (including services 1-3) corresponding to maximal clique 3.1 in fig. 4, the associated service set of service 1 is {service 2, service 3}, the associated service set of service 2 is {service 1, service 3}, and the associated service set of service 3 is {service 1, service 2}.
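A minimal sketch of step 2092 follows. It assumes that each service group is a clique, so every other member of the group is associated with a given service; the function name is hypothetical.

```python
def associated_service_sets(service_groups):
    """Map each service to the set of services associated with it in its group(s)."""
    result = {}
    for group in service_groups:
        for service in group:
            # In a clique, all other members are associated with this service.
            result.setdefault(service, set()).update(set(group) - {service})
    return result


sets_by_service = associated_service_sets([{1, 2, 3}, {4, 5}])
```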
Step 2093, determining the similarity between the two associated service sets of any two services of the plurality of services based on the intersection-over-union (IoU) of the two associated service sets.
After determining the associated service set of each service, the forwarding device needs to determine the similarity between any two associated service sets. In the embodiment of the present application, determining the similarity based on the intersection-over-union is taken as an example; of course, the similarity may also be determined in other ways, which is not limited in the embodiment of the present application.
The intersection-over-union of two associated service sets refers to the ratio of the number of elements in the intersection of the two sets to the number of elements in their union. Illustratively, the intersection of the associated service set m1 {c1, c2, c3, c4, c5, c6} and the associated service set m2 {c2, c3, c4, c5, c6, c7} is {c2, c3, c4, c5, c6}, and their union is {c1, c2, c3, c4, c5, c6, c7}. The number of intersection elements is 5 and the number of union elements is 7, so the intersection-over-union of the associated service set m1 and the associated service set m2 is 5/7 ≈ 71%.
Optionally, the forwarding device may directly use the intersection-over-union of the two associated service sets as their similarity; or it may further process the intersection-over-union and use the processed result as the similarity of the two associated service sets. For example, the processing may include multiplying the intersection-over-union by a preset coefficient, and the like, which is not limited in this embodiment of the application.
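The computation for the m1/m2 example can be sketched as follows. The function name `iou` and the optional `coefficient` parameter (standing in for the preset coefficient mentioned above) are illustrative assumptions.

```python
def iou(set_a, set_b, coefficient=1.0):
    """Intersection-over-union of two associated service sets, optionally scaled."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        # Assumption: two empty sets are given similarity 0.
        return 0.0
    return coefficient * len(a & b) / len(union)


m1 = {"c1", "c2", "c3", "c4", "c5", "c6"}
m2 = {"c2", "c3", "c4", "c5", "c6", "c7"}
similarity = iou(m1, m2)  # 5 shared elements out of 7 in the union
```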
Step 2094 classifies the plurality of services based on the similarity of the associated service sets of the plurality of services.
Optionally, a similarity threshold may be preset, and when the similarity of the associated service sets of the two services is greater than the similarity threshold, the forwarding device may classify the two services into the same class.
Alternatively, the forwarding device may cluster the services using a clustering algorithm, based on the similarity of the associated service sets of every two services in the service group. In this case, in step 2094, the forwarding device may first generate a similarity matrix of the multiple services based on the similarity of the associated service sets of the multiple services, and then cluster the similarity matrix to obtain a classification result of the multiple services. The similarity matrix comprises m rows and m columns of elements, and the element in the i-th row and j-th column represents the similarity between the associated service set of the i-th service and the associated service set of the j-th service in the plurality of services, where 1 ≤ i ≤ m and 1 ≤ j ≤ m.
By way of example, assume that the intersection-over-union of the associated service sets of any two services in the determined service group is as shown in Table 4. The service group comprises the service Dst_IP1_port_1, the service Dst_IP1_port_2, and the service Dst_IP1_port_3. The associated service set of the service Dst_IP1_port_1 is denoted Set_1, the associated service set of the service Dst_IP1_port_2 is denoted Set_2, and the associated service set of the service Dst_IP1_port_3 is denoted Set_3. The similarity (e.g., equal to the intersection-over-union) between the associated service sets of the service Dst_IP1_port_1 and the service Dst_IP1_port_2 is IoU(Set_1, Set_2), the similarity between the associated service sets of the service Dst_IP1_port_1 and the service Dst_IP1_port_3 is IoU(Set_1, Set_3), and the similarity between the associated service sets of the service Dst_IP1_port_2 and the service Dst_IP1_port_3 is IoU(Set_2, Set_3). Based on the similarities shown in Table 4, the similarity matrix may be:
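\[
\begin{pmatrix}
1 & \mathrm{IoU}(Set_1, Set_2) & \mathrm{IoU}(Set_1, Set_3) \\
\mathrm{IoU}(Set_1, Set_2) & 1 & \mathrm{IoU}(Set_2, Set_3) \\
\mathrm{IoU}(Set_1, Set_3) & \mathrm{IoU}(Set_2, Set_3) & 1
\end{pmatrix}
\]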
TABLE 4
Service          Dst_IP1_port_1      Dst_IP1_port_2      Dst_IP1_port_3
Dst_IP1_port_1   1                   IoU(Set_1, Set_2)   IoU(Set_1, Set_3)
Dst_IP1_port_2   IoU(Set_1, Set_2)   1                   IoU(Set_2, Set_3)
Dst_IP1_port_3   IoU(Set_1, Set_3)   IoU(Set_2, Set_3)   1
Optionally, the forwarding device may cluster the similarity matrix using a spectral clustering algorithm (or another algorithm that clusters based on the similarity matrix, such as K-Means or DBSCAN).
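The following sketch builds the m × m similarity matrix and then classifies services with the simpler similarity-threshold variant of step 2094, not with spectral clustering. The function names, the example sets, and the transitive merging of classes through a union-find structure are assumptions made for illustration.

```python
def similarity_matrix(associated_sets):
    """Build the m x m IoU matrix; diagonal entries are 1, as in Table 4."""
    def iou(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    m = len(associated_sets)
    return [
        [1.0 if i == j else iou(associated_sets[i], associated_sets[j]) for j in range(m)]
        for i in range(m)
    ]


def classify_by_threshold(matrix, threshold):
    """Put two services in the same class when their similarity exceeds the threshold."""
    m = len(matrix)
    parent = list(range(m))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(m):
        for j in range(i + 1, m):
            if matrix[i][j] > threshold:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive class indices 0, 1, 2, ...
    class_of_root = {}
    return [class_of_root.setdefault(find(i), len(class_of_root)) for i in range(m)]


example_sets = [{"c1", "c2", "c3"}, {"c1", "c2", "c3", "c4"}, {"c9"}]
matrix = similarity_matrix(example_sets)
labels = classify_by_threshold(matrix, threshold=0.5)
```

A clustering algorithm that accepts a precomputed similarity matrix could be substituted for `classify_by_threshold` without changing how the matrix is built.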
To sum up, in the service classification method provided in the embodiment of the present application, the service classification device may determine n data stream groups in the multiple data streams, and determine a service pair corresponding to each data stream group in the n data stream groups, to obtain multiple service pairs; and then, determining whether the services in each service pair are associated or not based on the interval corresponding to each service pair to obtain the association relationship of the services in each service pair, and classifying a plurality of services in a plurality of service pairs based on the association relationship of the services in each service pair. Therefore, classification of the services is realized, and management of the Internet system can be realized based on the classification result of the services.
Optionally, after classifying the services, the forwarding device may classify the data streams corresponding to the services based on the classification result of the services; data streams corresponding to services of the same class have the same type, and data streams corresponding to services of different classes have different types. For example, assume that service IP1 + port 1 belongs to the same class as service IP1 + port 3, and that service IP1 + port 2 belongs to a class of its own. The forwarding device may then classify the data stream corresponding to service IP1 + port 1 and the data stream corresponding to service IP1 + port 3 into the same class, and classify the data stream corresponding to service IP1 + port 2 into a separate class, thereby classifying the data streams corresponding to the services in the service group. After the data streams corresponding to the services in all the service groups are classified, all the data streams are classified.
It should be noted that the similarity of the services classified into one category is often high, and the services are often corresponding to the same application. Moreover, the start times of the data streams of the services based on the same application are often closer, and the similarity is higher. It can be seen that the process of classification is based on the similarity of the start times of the data streams corresponding to the services. Optionally, the manner of classifying based on the similarity of the start times of the data streams corresponding to the services may also be different from the manner provided in the embodiment of the present application, for example, the similarity of the start times of the data streams of the services may be directly calculated, and whether the two services are similar or not may be determined by using a similarity threshold, so as to classify the similar services into one class.
Optionally, it is assumed that in step 2094, the forwarding device clusters the similarity matrix to obtain a classification result of the multiple services. After classifying the plurality of services, the forwarding device may further verify the classification results of the plurality of services to determine the accuracy of the classification results of the plurality of services. For example, after the step 2094, the forwarding device may further detect whether the classification results of the multiple services satisfy the classification constraint condition, and repeatedly perform the process of clustering the similarity matrix in the step 2094 when the classification results of the multiple services do not satisfy the classification constraint condition; and when the classification results of the plurality of services meet the classification constraint conditions, ending the process of classifying the plurality of services. Thus, through multiple verification, the accuracy of classification of multiple services can be improved.
By way of example, the classification constraints may include: the accuracy of the classification results for the plurality of services is greater than an accuracy threshold. Optionally, when detecting whether the classification result satisfies the classification constraint condition, the forwarding device may evaluate the classification result using the silhouette coefficient to obtain an evaluation index of the classification result, where the evaluation index indicates the accuracy of the classification result. For example, when the silhouette coefficient is used to evaluate the classification result, the forwarding device may first generate a degree matrix according to the similarity matrix. The classification result and the degree matrix are then used to compute the silhouette coefficient, which outputs an evaluation index representing the quality of the classification result; the evaluation index is a value in the interval [-1, 1]. The larger the value of the evaluation index, the higher the accuracy of the classification result. When the value of the evaluation index is greater than the accuracy threshold, the accuracy of the classification result is high, and the forwarding device determines that the classification result satisfies the classification constraint condition; when the value of the evaluation index is not greater than the accuracy threshold, the accuracy of the classification result is low, and the forwarding device determines that the classification result does not satisfy the classification constraint condition.
For example, assuming that the accuracy threshold is 0.8, the forwarding device may generate a degree matrix according to the similarity matrix in step 2094, and compute the silhouette coefficient from the obtained classification result {0, 0, 0, 1, 0, 1, 2} and the degree matrix. When the output evaluation index is 0.5, the forwarding device determines that the classification result does not satisfy the classification constraint condition; when the output evaluation index is 0.98, the forwarding device determines that the classification result satisfies the classification constraint condition.
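The evaluation step can be sketched with a plain mean silhouette coefficient. This sketch makes an assumption that differs from the text: instead of a degree matrix it uses a distance matrix taken as 1 minus the similarity, which is the input the standard silhouette definition expects. All function names are hypothetical.

```python
def silhouette_score(distance, labels):
    """Mean silhouette coefficient from a precomputed distance matrix."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("at least two classes are required")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton class contributes a silhouette of 0
        # a: mean distance to the other members of the same class.
        a = sum(distance[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other class.
        b = min(
            sum(distance[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        total += (b - a) / denominator if denominator > 0 else 0.0
    return total / len(labels)


def satisfies_constraint(evaluation_index, accuracy_threshold=0.8):
    """The constraint holds only when the index is greater than the threshold."""
    return evaluation_index > accuracy_threshold


# Two tight classes: similarity 0.9 inside a class, 0.1 across classes.
similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
distance = [[1.0 - s for s in row] for row in similarity]
score = silhouette_score(distance, [0, 0, 1, 1])
```

When `satisfies_constraint` returns False, the clustering of step 2094 would be repeated, for instance with a different preset number of classes.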
For example, when the forwarding device needs to perform the process of clustering the similarity matrix in step 2094 again to obtain the classification results of the multiple services, the preset number of classes may be changed so that a classification result corresponding to a different preset number of classes is output. For example, assume that the classification result {0, 0, 0, 1, 0, 1, 2} obtained in step 2094 does not satisfy the classification constraint condition; the preset number of classes may then be changed from 3 to 2, and clustering is performed again by the spectral clustering algorithm.
The service classification method provided by the embodiment of the application can classify data streams without maintaining a database, which simplifies the service classification process and improves the real-time performance of service classification. At present, more and more applications are enterprise private applications. When a client running such a private application communicates with a server through the forwarding device, the data stream received by the forwarding device is usually encrypted, and only the IP address of the client, the IP address of the server and the port corresponding to the data stream can be obtained from the data stream. When the type of the application is identified by the DPI technology, the forwarding device needs to obtain application characteristic information from the data stream. For an encrypted data stream, the forwarding device cannot acquire the application characteristic information and therefore cannot identify the type of the application, so the internet system cannot be managed. Because the data streams can be classified using only the IP address of the client, the IP address of the server and the port, the service classification method provided by the embodiment of the application can accurately classify encrypted data streams, has a wide application range, and allows the internet system to be managed even when the data streams are encrypted.
In this embodiment of the present application, the forwarding device classifies a plurality of services through steps 201 to 209, and optionally, after classifying the plurality of services, the forwarding device may further update the plurality of data streams in step 201. Then, based on the updated data streams, steps 201 to 209 are repeatedly executed to obtain classification results of the services corresponding to the updated data streams. It should be noted that the plurality of updated data streams may include at least part of the data streams before updating, or may not include the data streams before updating, which is not limited in this embodiment of the application. In one embodiment, the updating of the plurality of data streams refers to updating information of the recorded plurality of data streams. For example, information of a data stream that has been already recorded is deleted, and information of a data stream that is newly received is recorded. The information of a data flow includes an identification of a corresponding client and an identification of a service for the data flow.
After classifying the multiple services, the forwarding device may further identify the types of the services corresponding to the subsequently received data streams to be forwarded based on the classification of the multiple services. Then, the forwarding device may manage the internet system based on the types of the services corresponding to the multiple data streams, for example, limit the speed of the data stream corresponding to a certain type of service, or increase the transmission bandwidth of the data stream corresponding to a certain type of service.
Optionally, the foregoing embodiment takes a forwarding device as an example of the service classification device that executes the service classification method. As an example, the service classification device may not be a forwarding device. In this case, in step 201, the service classification device does not receive the multiple data streams itself, but obtains the data streams to be forwarded by the forwarding device. As another example, the service classification device may also include multiple devices; for example, the service classification device may include the forwarding device and an auxiliary device, in which case a part of steps 201 to 209 may be performed by the forwarding device, and another part of the steps may be performed by the auxiliary device.
Optionally, the forwarding device may also classify the plurality of services based on other means. For example, the forwarding device may classify services based on similarity of the set of clients. For example, the forwarding device may determine a set of clients corresponding to each service in the plurality of services, where the set of clients corresponding to each service includes: a client corresponding to the data stream corresponding to each service; the forwarding device may then classify the plurality of services based on the similarity of the set of clients to which the plurality of services correspond. As another example, the forwarding device may classify the plurality of services based on similarity of domain names. For example, the forwarding device may obtain a domain name corresponding to each of the plurality of services, and then classify the plurality of services based on similarity of the domain names corresponding to the plurality of services.
After the forwarding device performs step 209, the forwarding device may also adjust the classification results of the multiple services in step 209 in combination with other classification results.
In one aspect, the forwarding device may adjust the classification result obtained in step 209 based on the result of classifying the plurality of services based on the similarity of the set of clients. For example, assume that two services are classified into the same class in the classification result of the plurality of services based on the similarity of the client set, and the two services are not classified into the same class in the classification result of step 209. The forwarding device may adjust the classification results of step 209 so that the two services are classified into the same class.
On the other hand, the forwarding device may adjust the classification result of step 209 according to the result of classifying the plurality of services based on the similarity of the domain names. For example, it is assumed that in the result of classifying the services based on the similarity of the domain names, two services are classified into the same class, and in the classification result of step 209, the two services are not classified into the same class. The forwarding device may adjust the classification results of step 209 so that the two services are classified into the same class.
In yet another aspect, the service classification device adjusts the classification result of step 209 based on the results of classifying the plurality of services based on the similarity of the set of clients and the results of classifying the plurality of services based on the similarity of the domain name. For example, assume that, in the result of classifying a plurality of services based on the similarity of the set of clients, some two services are classified into the same class; the two services are classified into one class in the classification result for the plurality of services based on the similarity of the domain names, and the two services are not classified into the same class in the classification result of step 209. The forwarding device may adjust the classification results of step 209 so that the two services are classified into the same class. For another example, it is assumed that two services are classified into the same class in the result of classifying the services based on the similarity of the client set, the two services are not classified into the same class in the result of classifying the services based on the similarity of the domain name, and the two services are not classified into the same class in the classification result of step 209. The forwarding device may not need to adjust the classification result of step 209.
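The adjustment rules described in the three aspects above can be sketched as one function that merges two classes of the step 209 result whenever every supplied auxiliary result puts the two services in the same class. The function name, the dictionary representation, and the choice to merge whole classes (rather than move a single service) are assumptions for illustration.

```python
def adjust_classification(step209_labels, *other_results):
    """Merge classes of the step 209 result when all other results agree.

    step209_labels and each entry of other_results map a service to a class label.
    """
    adjusted = dict(step209_labels)
    if not other_results:
        return adjusted
    services = sorted(adjusted)
    for i, s in enumerate(services):
        for t in services[i + 1:]:
            same_elsewhere = all(r[s] == r[t] for r in other_results)
            if adjusted[s] != adjusted[t] and same_elsewhere:
                old, new = adjusted[t], adjusted[s]
                # Relabel every service of the old class (assumed merge policy).
                for u in adjusted:
                    if adjusted[u] == old:
                        adjusted[u] = new
    return adjusted


step209 = {"A": 0, "B": 1, "C": 2}
by_clients = {"A": 0, "B": 0, "C": 1}
by_domains_agree = {"A": 5, "B": 5, "C": 6}
by_domains_differ = {"A": 5, "B": 6, "C": 7}

merged = adjust_classification(step209, by_clients, by_domains_agree)
unchanged = adjust_classification(step209, by_clients, by_domains_differ)
```

Passing only `by_clients` or only a domain-name result corresponds to the first and second aspects, respectively.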
The sequence of the method provided by the embodiment of the application can be properly adjusted, and the steps can be correspondingly increased or decreased based on the situation. Any method that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application is covered by the protection scope of the present application, and thus the detailed description thereof is omitted.
The service classification method provided by the embodiment of the present application is introduced above, and the service classification device provided by the embodiment of the present application is introduced below.
Fig. 5 is a block diagram of a service classification apparatus according to an embodiment of the present application, and referring to fig. 5, the service classification apparatus 500 includes:
a first determining module 501, configured to determine n data stream groups in a plurality of data streams, where n is greater than or equal to 1, where each data stream group in the n data stream groups corresponds to one client and at least two services, and different data stream groups correspond to different clients;
a second determining module 502, configured to determine a service pair corresponding to each data stream group in the n data stream groups, to obtain a plurality of service pairs, where each service pair corresponding to each data stream group includes: any two services corresponding to each data stream group;
a third determining module 503, configured to determine an interval corresponding to each service pair in the plurality of service pairs, where the interval corresponding to each service pair includes: the interval between the start times of the data streams corresponding to the two services of the service pair in the same data stream group, and the start time of a data stream is the time when the forwarding device receives the data stream;
a fourth determining module 504, configured to determine whether the services in each service pair are associated based on the interval corresponding to each service pair, so as to obtain an association relationship between the services in each service pair;
a first classification module 505, configured to classify a plurality of services in the plurality of service pairs based on the association relationship between the services in each service pair in the plurality of service pairs.
To sum up, in the service classification apparatus provided in this embodiment of the present application, the first determining module may determine n data stream groups in the multiple data streams, and the second determining module may determine a service pair corresponding to each data stream group in the n data stream groups to obtain multiple service pairs; then, the fourth determining module may determine whether the services in each service pair are associated based on the interval corresponding to each service pair to obtain an association relationship between the services in each service pair, and the first classifying module may classify a plurality of services in the plurality of service pairs based on the association relationship between the services in each service pair in the plurality of service pairs. Therefore, classification of the services is realized, and management of the Internet system can be realized based on the classification result of the services.
Optionally, the fourth determining module 504 is configured to: determining a set of intervals of the intervals corresponding to each service pair which are smaller than a first time threshold; determining a frequency of each interval in the set of intervals; determining q intervals of which the frequency exceeds a frequency threshold in the group of intervals, wherein q is more than or equal to 1; determining a maximum interval and a minimum interval of the q intervals; determining that two services in each service pair are associated when a difference between the maximum interval and the minimum interval is less than a second time threshold.
Optionally, the frequency threshold is $Y = E + k \times a$, where E denotes the median of the frequencies of the set of intervals, a denotes the median absolute deviation of the frequencies of the set of intervals, and k denotes a constant.
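As a worked illustration of this threshold, the sketch below computes the median E and the median absolute deviation a with the Python standard library. The value k = 3.0 and the sample frequencies are assumptions chosen only for the example.

```python
from statistics import median


def frequency_threshold(frequencies, k=3.0):
    """Return Y = E + k * a for a list of interval frequencies."""
    e = median(frequencies)                        # E: median of the frequencies
    a = median(abs(f - e) for f in frequencies)    # a: median absolute deviation
    return e + k * a


# Frequencies 1, 1, 2, 2, 10: E = 2, deviations 1, 1, 0, 0, 8, so a = 1.
threshold = frequency_threshold([1, 1, 2, 2, 10], k=3.0)
```

Because both E and a are medians, a single very frequent interval (10 in the example) does not inflate the threshold the way a mean and standard deviation would.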
Optionally, fig. 6 is a block diagram of a first classification module provided in an embodiment of the present application, and as shown in fig. 6, the first classification module 505 includes:
the first determining submodule 5051 is configured to determine p service groups in the multiple services based on an association relationship between services in each service pair in the multiple service pairs, where the service groups include at least two services associated with each other in the multiple services, and p is greater than or equal to 1;
a second determining submodule 5052, configured to determine, based on the p service groups, an associated service set for each service in the plurality of services, where the associated service set of a service includes: the services associated with that service in the service group in which that service is located;
the classification sub-module 5053 is configured to classify the plurality of services based on similarity of associated service sets of the plurality of services.
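The step performed by the second determining submodule can be sketched as below. The sketch assumes that a service may belong to several service groups and that its associated service set is the union of the other members of those groups, excluding the service itself; both points are interpretations, and the function name is hypothetical.

```python
def associated_service_sets(services, service_groups):
    """Map each service to the services it is grouped with."""
    associated = {s: set() for s in services}
    for group in service_groups:
        for s in group:
            # Every other member of the group is associated with s.
            associated.setdefault(s, set()).update(set(group) - {s})
    return associated


services = ["a", "b", "c", "d", "e"]
service_groups = [{"a", "b", "c"}, {"c", "d"}]
associated = associated_service_sets(services, service_groups)
```

A service that appears in no service group (such as "e" here) ends up with an empty associated service set.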
Optionally, the first determination submodule 5051 is configured to:
determining at least one associated service pair in the plurality of service pairs based on the association relationship of the services in each service pair in the plurality of service pairs, wherein two services in the associated service pair are associated;
generating an undirected graph based on the plurality of services and the at least one associated service pair, wherein the undirected graph comprises nodes and edges, each node represents a corresponding service, and each edge connects the two nodes corresponding to a corresponding associated service pair;
determining p maximal cliques in the undirected graph, wherein each maximal clique comprises a plurality of nodes, and every two nodes in the maximal clique are connected by an edge;
and determining the p service groups in one-to-one correspondence with the p maximal cliques, wherein the nodes in each maximal clique are in one-to-one correspondence with the services in the corresponding service group.
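The graph construction and maximal-clique search above can be sketched as follows. The text does not name a clique enumeration algorithm; the sketch uses the basic Bron-Kerbosch recursion (without pivoting) as one possible choice, and the function names are hypothetical.

```python
def build_graph(services, associated_pairs):
    """Build an adjacency map: one node per service, one edge per associated pair."""
    adjacency = {s: set() for s in services}
    for a, b in associated_pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def maximal_cliques(adjacency):
    """Yield every maximal clique with at least two nodes (Bron-Kerbosch)."""
    def expand(r, p, x):
        if not p and not x:
            # r cannot be extended, so it is maximal.
            if len(r) >= 2:  # a service group needs at least two services
                yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}   # v has been fully explored
            x = x | {v}   # exclude v from later cliques

    yield from expand(set(), set(adjacency), set())


services = ["a", "b", "c", "d", "e"]
associated_pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adjacency = build_graph(services, associated_pairs)
service_groups = set(maximal_cliques(adjacency))
```

Note that service "c" belongs to two maximal cliques, so a service can appear in more than one service group, while the isolated service "e" belongs to none.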
Optionally, referring to fig. 7, fig. 7 is a block diagram of another service classification apparatus provided in the embodiment of the present application, and on the basis of fig. 5, the service classification apparatus 500 further includes:
a fifth determining module 506, configured to determine the similarity between the two associated service sets of any two services in the plurality of services based on the intersection-over-union ratio of the two associated service sets.
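The intersection-over-union ratio of two sets is the size of their intersection divided by the size of their union. A minimal sketch follows; returning 0.0 when both sets are empty is an assumption, since the text does not address that case.

```python
def intersection_over_union(set_a, set_b):
    """Return |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Two shared services out of four distinct services in total.
similarity = intersection_over_union({"a", "b", "d"}, {"a", "b", "c"})
```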
Optionally, the classification sub-module 5053 is configured to:
generating a similarity matrix of the plurality of services based on the similarity of the associated service sets of the plurality of services, wherein the similarity matrix comprises m rows and m columns of elements, and the element in the ith row and the jth column represents the similarity between the associated service set of the ith service and the associated service set of the jth service in the plurality of services, where 1 ≤ i ≤ m and 1 ≤ j ≤ m;
and clustering the similarity matrix to obtain the classification results of the services.
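The two steps above can be sketched as follows. The matrix construction follows the description directly; the clustering step is only a stand-in, because this passage does not name a clustering algorithm. The sketch groups services transitively whenever their similarity reaches a hypothetical `threshold`, which is an assumption made for illustration.

```python
def jaccard(a, b):
    """Intersection-over-union of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(services, associated):
    """Element [i][j] is the similarity of the associated sets of services i and j."""
    return [[jaccard(associated[si], associated[sj]) for sj in services]
            for si in services]


def cluster_by_threshold(matrix, threshold):
    """Stand-in clustering: link i and j when matrix[i][j] >= threshold."""
    m = len(matrix)
    labels = [-1] * m
    next_label = 0
    for start in range(m):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(m):
                if labels[j] == -1 and matrix[i][j] >= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


services = ["a", "b", "c", "d"]
associated = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"z"}, "d": {"z", "w"}}
matrix = similarity_matrix(services, associated)
labels = cluster_by_threshold(matrix, threshold=0.5)
```

Any clustering method that accepts a precomputed similarity matrix could be substituted for `cluster_by_threshold`.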
Optionally, as shown in fig. 7, the service classification apparatus 500 further includes:
a detecting module 507, configured to detect whether an accuracy of the classification results of the plurality of services is less than or equal to an accuracy threshold;
a repeating module 508, configured to repeat, when the accuracy of the classification result is less than or equal to the accuracy threshold, the process of clustering the similarity matrix to obtain the classification results of the multiple services.
Optionally, fig. 8 is a block diagram of a third determining module provided in an embodiment of the present application, and as shown in fig. 8, the third determining module 503 includes:
a ranking submodule 5031 for ranking the plurality of services;
a third determining sub-module 5032 configured to determine an interval corresponding to each service pair in the plurality of service pairs based on the ranking of the plurality of services.
Optionally, the third determining sub-module 5032 is configured to: determine the interval $T_{pq} = T_p - T_q$ corresponding to each service pair, where each service pair includes the p-th service and the q-th service in the plurality of services, p > q ≥ 1, $T_p$ represents the start time of any data stream corresponding to the p-th service in a data stream group corresponding to the service pair, and $T_q$ represents the start time of any data stream corresponding to the q-th service in that data stream group.
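The ordering and the interval computation can be sketched as below. The sketch assumes the services are ranked by their position in a given list and that one start time per service is available for the data stream group; the function name and the sample times (in milliseconds) are hypothetical.

```python
def pair_intervals(services_in_order, start_times):
    """Return {(p_service, q_service): T_p - T_q} for every pair with rank p > q."""
    rank = {s: i + 1 for i, s in enumerate(services_in_order)}
    # Services present in this data stream group, sorted by rank.
    present = sorted(start_times, key=rank.__getitem__)
    intervals = {}
    for qi, sq in enumerate(present):
        for sp in present[qi + 1:]:
            intervals[(sp, sq)] = start_times[sp] - start_times[sq]
    return intervals


services_in_order = ["cache", "db", "web"]         # ranks 1, 2, 3
start_times = {"web": 0, "db": 200, "cache": 300}  # one data stream group
intervals = pair_intervals(services_in_order, start_times)
```

Fixing the order before subtracting means the interval of a given service pair always has the same sign convention across data stream groups, so intervals from different clients can be compared directly.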
Optionally, as shown in fig. 7, the service classification apparatus 500 further includes:
a sixth determining module 509, configured to determine a set of clients corresponding to each of the plurality of services, where the set of clients corresponding to each service includes: a client corresponding to the data stream corresponding to each service;
a second classification module 510, configured to classify the multiple services based on similarity of client sets corresponding to the multiple services;
a first adjusting module 511, configured to, after the plurality of services in the plurality of service pairs are classified based on the association relationship between the services in each of the plurality of service pairs, adjust the result of that classification according to the result of classifying the plurality of services based on the similarity of the client sets.
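Building the client set of each service and comparing two client sets can be sketched as follows. The text does not specify the similarity measure for client sets; the sketch assumes the intersection-over-union ratio, and the input format of (client, service) tuples is hypothetical.

```python
from collections import defaultdict


def client_sets(flows):
    """Map each service to the set of clients that have a data stream for it."""
    clients = defaultdict(set)
    for client, service in flows:
        clients[service].add(client)
    return dict(clients)


def client_similarity(clients, s1, s2):
    """Assumed measure: intersection-over-union of the two client sets."""
    union = clients[s1] | clients[s2]
    return len(clients[s1] & clients[s2]) / len(union) if union else 0.0


flows = [("c1", "web"), ("c2", "web"), ("c1", "db"), ("c2", "db"), ("c3", "mail")]
clients = client_sets(flows)
web_db = client_similarity(clients, "web", "db")
web_mail = client_similarity(clients, "web", "mail")
```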
Optionally, as shown in fig. 7, the service classification apparatus 500 further includes:
an obtaining module 512, configured to obtain a domain name corresponding to each of the multiple services;
a third classification module 513, configured to classify the multiple services based on similarity of domain names corresponding to the multiple services;
a second adjusting module 514, configured to, after the plurality of services in the plurality of service pairs are classified based on the association relationship between the services in each of the plurality of service pairs, adjust the result of that classification according to the result of classifying the plurality of services based on the similarity of the domain names corresponding to the plurality of services.
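One simple way to compare domain names is to count the labels they share from the right-hand end. This measure is an assumption made purely for illustration, since this passage does not define how domain-name similarity is computed, and the example domains are hypothetical.

```python
def shared_suffix_labels(domain_a, domain_b):
    """Count the trailing labels that two domain names have in common."""
    labels_a = domain_a.lower().rstrip(".").split(".")[::-1]
    labels_b = domain_b.lower().rstrip(".").split(".")[::-1]
    count = 0
    for la, lb in zip(labels_a, labels_b):
        if la != lb:
            break
        count += 1
    return count


same_site = shared_suffix_labels("api.example.com", "img.example.com")
different_site = shared_suffix_labels("api.example.com", "www.example.org")
```

A practical implementation would also need to treat public suffixes such as "com" specially, so that two unrelated domains under the same top-level domain are not considered similar.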
To sum up, in the service classification apparatus provided in this embodiment of the present application, the first determining module may determine n data stream groups in the multiple data streams, and the second determining module may determine a service pair corresponding to each data stream group in the n data stream groups to obtain multiple service pairs; then, the fourth determining module may determine whether the services in each service pair are associated based on the interval corresponding to each service pair to obtain an association relationship between the services in each service pair, and the first classifying module may classify a plurality of services in the plurality of service pairs based on the association relationship between the services in each service pair in the plurality of service pairs. Therefore, classification of the services is realized, and management of the Internet system can be realized based on the classification result of the services.
The service classification apparatus provided in the embodiments of the present application is described above, and possible product forms of the service classification apparatus are described below. It should be understood that any form of product having the features of the service classification apparatus described above with reference to fig. 5 or fig. 7 falls within the scope of the present application. It should also be understood that the following description is only exemplary and does not limit the product form of the service classification apparatus of the embodiments of the present application.
An embodiment of the present application provides a service classification apparatus, as shown in fig. 9, the service classification apparatus 600 includes: at least one processor 601 (one shown in fig. 9), at least one interface 602 (one shown in fig. 9), a memory 603, and at least one communication bus 604 (one shown in fig. 9). The processor 601 is configured to execute a program stored in the memory 603 to implement the service classification method according to the embodiment of the present application.
The processor 601 includes one or more processing cores, and the processor 601 executes various functional applications and data processing by running computer programs and units.
The memory 603 may be used for storing computer programs and units. In particular, the memory 603 may store an operating system and the application program units required for at least one function. The operating system may be an operating system such as Real Time eXecutive (RTX), LINUX, UNIX, WINDOWS, or OS X.
There may be multiple interfaces 602, and each interface 602 is used to communicate with other storage devices or network devices. For example, in the present embodiment, the interface 602 may be used for transmitting and receiving data streams.
The memory 603 and the interface 602 are connected to the processor 601 via a communication bus 604.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product comprising one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any usable medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium, or a semiconductor medium (e.g., solid state disk), among others.
The embodiment of the application provides an internet system which comprises a forwarding device, a plurality of clients and a plurality of servers, wherein the clients and the servers are connected with the forwarding device. The server can provide at least one service to the client through the forwarding device, and when the server provides the service to an application running in the client, data streams related to the service can be transmitted between the server and the client through the forwarding device. The forwarding device may be the service classification apparatus described in fig. 5, fig. 7, or fig. 9. For the internet system, reference may be made to the internet system shown in fig. 1, and details are not repeated here.
In this application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" means two or more unless expressly limited otherwise.
It should be noted that, the method embodiments and the apparatus embodiments provided in the embodiments of the present application can all be mutually referred to, and the embodiments of the present application do not limit this.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (27)

1. A method for classifying services, the method comprising:
determining n data stream groups in a plurality of data streams, wherein n is more than or equal to 1, each data stream group in the n data stream groups corresponds to one client and at least two services, and different data stream groups correspond to different clients;
determining a service pair corresponding to each data stream group in the n data stream groups to obtain a plurality of service pairs, wherein each service pair corresponding to each data stream group comprises: any two services corresponding to each data stream group;
determining an interval corresponding to each service pair of the plurality of service pairs, the interval corresponding to each service pair comprising: the interval between the start times of the data streams that correspond to the two services of the service pair and belong to the same data stream group, wherein the start time of a data stream is the time when the forwarding device receives the data stream;
determining whether the services in each service pair are associated or not based on the interval corresponding to each service pair to obtain the association relationship of the services in each service pair;
classifying the plurality of services in the plurality of service pairs based on the association relationship of the services in each of the plurality of service pairs.
2. The method of claim 1, wherein the determining whether the service in each service pair is associated based on the interval corresponding to each service pair comprises:
determining a set of intervals of the intervals corresponding to each service pair which are smaller than a first time threshold;
determining a frequency of each interval in the set of intervals;
determining q intervals of which the frequency exceeds a frequency threshold in the group of intervals, wherein q is more than or equal to 1;
determining a maximum interval and a minimum interval of the q intervals;
determining that two services in each service pair are associated when a difference between the maximum interval and the minimum interval is less than a second time threshold.
3. The method according to claim 2, wherein the frequency threshold is $Y = E + k \times a$, where E denotes the median of the frequencies of the set of intervals, a denotes the median absolute deviation of the frequencies of the set of intervals, and k denotes a constant.
4. The method according to any one of claims 1 to 3, wherein classifying the plurality of services in the plurality of service pairs based on the association relationship of the services in each of the plurality of service pairs comprises:
determining p service groups in the plurality of services based on the association relationship of the services in each service pair in the plurality of service pairs, wherein each service group comprises at least two services, among the plurality of services, that are associated with one another in pairs, and p ≥ 1;
determining, based on the p service groups, an associated service set for each service of the plurality of services, the associated service set of a service including: the services associated with that service in the service group in which that service is located;
classifying the plurality of services based on similarity of associated service sets of the plurality of services.
5. The method of claim 4, wherein determining p service groups of the plurality of services based on the associations of the services in the respective service pairs comprises:
determining at least one associated service pair in the plurality of service pairs based on the association relationship of the services in each service pair in the plurality of service pairs, wherein two services in the associated service pair are associated;
generating an undirected graph based on the plurality of services and the at least one associated service pair, wherein the undirected graph comprises nodes and edges, each node represents a corresponding service, and each edge connects the two nodes corresponding to a corresponding associated service pair;
determining p maximal cliques in the undirected graph, wherein each maximal clique comprises a plurality of nodes, and every two nodes in the maximal clique are connected by an edge;
and determining the p service groups in one-to-one correspondence with the p maximal cliques, wherein the nodes in each maximal clique are in one-to-one correspondence with the services in the corresponding service group.
6. The method of claim 4 or 5, wherein before said classifying the plurality of services based on the similarity of the associated service sets of the plurality of services, the method further comprises:
and determining the similarity of the two associated service sets of any two services in the plurality of services based on the intersection-over-union ratio of the two associated service sets.
7. The method of any of claims 4 to 6, wherein said classifying the plurality of services based on the similarity of the associated service sets of the plurality of services comprises:
generating a similarity matrix of the plurality of services based on the similarity of the associated service sets of the plurality of services, wherein the similarity matrix comprises m rows and m columns of elements, and the element in the ith row and the jth column represents the similarity between the associated service set of the ith service and the associated service set of the jth service in the plurality of services, where 1 ≤ i ≤ m and 1 ≤ j ≤ m;
and clustering the similarity matrix to obtain the classification results of the services.
8. The method of claim 7, wherein after clustering the similarity matrix to obtain the classification results of the plurality of services, the method further comprises:
detecting whether an accuracy of classification results of the plurality of services is less than or equal to an accuracy threshold;
and when the accuracy of the classification result is less than or equal to the accuracy threshold, repeatedly executing the process of clustering the similarity matrix to obtain the classification results of the plurality of services.
9. The method of any of claims 1 to 8, wherein the determining the interval for each of the plurality of service pairs comprises:
ordering the plurality of services;
determining an interval corresponding to each service pair of the plurality of service pairs based on the ranking of the plurality of services.
10. The method of claim 9, wherein the determining the interval for each service pair of the plurality of service pairs based on the ranking of the plurality of services comprises:
determining the interval $T_{pq} = T_p - T_q$ corresponding to each service pair,
wherein each service pair comprises the p-th service and the q-th service in the plurality of services, p > q ≥ 1, $T_p$ represents the start time of any data stream corresponding to the p-th service in a data stream group corresponding to the service pair, and $T_q$ represents the start time of any data stream corresponding to the q-th service in that data stream group.
11. The method according to any one of claims 1 to 10, further comprising:
determining a set of clients corresponding to each service in the plurality of services, wherein the set of clients corresponding to each service comprises: a client corresponding to the data stream corresponding to each service;
classifying the plurality of services based on the similarity of the client sets corresponding to the plurality of services;
after classifying the plurality of services in the plurality of service pairs based on the association relationship of the services in each service pair, adjusting the result of that classification according to the result of classifying the plurality of services based on the similarity of the client sets.
12. The method according to any one of claims 1 to 11, further comprising:
acquiring a domain name corresponding to each service in the plurality of services;
classifying the plurality of services based on the similarity of domain names corresponding to the plurality of services;
after classifying the plurality of services in the plurality of service pairs based on the association relationship of the services in each of the plurality of service pairs, adjusting the result of that classification according to the result of classifying the plurality of services based on the similarity of the domain names corresponding to the plurality of services.
13. A service classification apparatus, characterized in that the service classification apparatus comprises:
a first determining module, configured to determine n data stream groups in a plurality of data streams, wherein n ≥ 1, each data stream group in the n data stream groups corresponds to one client and at least two services, and different data stream groups correspond to different clients;
a second determining module, configured to determine a service pair corresponding to each data stream group in the n data stream groups to obtain multiple service pairs, where each service pair corresponding to each data stream group includes: any two services corresponding to each data stream group;
a third determining module, configured to determine an interval corresponding to each service pair of the plurality of service pairs, where the interval corresponding to each service pair includes: the interval between the start times of the data streams that correspond to the two services of the service pair and belong to the same data stream group, wherein the start time of a data stream is the time when the forwarding device receives the data stream;
a fourth determining module, configured to determine whether the services in each service pair are associated based on the interval corresponding to each service pair, so as to obtain an association relationship between the services in each service pair;
and a first classification module, configured to classify a plurality of services in the plurality of service pairs based on the association relationship of the services in each service pair.
14. The service classification apparatus of claim 13, wherein the fourth determination module is configured to:
determining a set of intervals of the intervals corresponding to each service pair which are smaller than a first time threshold;
determining a frequency of each interval in the set of intervals;
determining q intervals of which the frequency exceeds a frequency threshold in the group of intervals, wherein q is more than or equal to 1;
determining a maximum interval and a minimum interval of the q intervals;
determining that two services in each service pair are associated when a difference between the maximum interval and the minimum interval is less than a second time threshold.
15. The service classification apparatus according to claim 14, characterized in that the frequency threshold is $Y = E + k \times a$, where E denotes the median of the frequencies of the set of intervals, a denotes the median absolute deviation of the frequencies of the set of intervals, and k denotes a constant.
16. The service classification apparatus according to any one of claims 13 to 15, wherein the first classification module comprises:
a first determining submodule, configured to determine p service groups in the plurality of services based on the association relationship of the services in each service pair in the plurality of service pairs, wherein each service group comprises at least two services, among the plurality of services, that are associated with one another in pairs, and p ≥ 1;
a second determining sub-module, configured to determine, based on the p service groups, an associated service set for each service in the plurality of services, where the associated service set of a service includes: the services associated with that service in the service group in which that service is located;
a classification submodule, configured to classify the plurality of services based on similarity of the associated service sets of the plurality of services.
17. The service classification apparatus of claim 16, wherein the first determining submodule is configured to:
determining at least one associated service pair in the plurality of service pairs based on the association relationship of the services in each service pair in the plurality of service pairs, wherein two services in the associated service pair are associated;
generating an undirected graph based on the plurality of services and the at least one associated service pair, wherein the undirected graph comprises nodes and edges, each node represents a corresponding service, and each edge connects the two nodes corresponding to a corresponding associated service pair;
determining p maximal cliques in the undirected graph, wherein each maximal clique comprises a plurality of nodes, and every two nodes in the maximal clique are connected by an edge;
and determining the p service groups in one-to-one correspondence with the p maximal cliques, wherein the nodes in each maximal clique are in one-to-one correspondence with the services in the corresponding service group.
18. The service classification apparatus according to claim 16 or 17, characterized in that the service classification apparatus further comprises:
a fifth determining module, configured to determine the similarity between the two associated service sets of any two services in the plurality of services based on the intersection-over-union ratio of the two associated service sets.
19. The service classification apparatus according to any one of claims 16 to 18, wherein the classification submodule is configured to:
generating a similarity matrix of the plurality of services based on the similarity of the associated service sets of the plurality of services, wherein the similarity matrix comprises m rows and m columns of elements, and the element in the ith row and the jth column represents the similarity between the associated service set of the ith service and the associated service set of the jth service in the plurality of services, where 1 ≤ i ≤ m and 1 ≤ j ≤ m;
and clustering the similarity matrix to obtain the classification results of the services.
20. The service classification apparatus of claim 19, characterized in that the service classification apparatus further comprises:
a detection module to detect whether an accuracy of classification results of the plurality of services is less than or equal to an accuracy threshold;
and the repeating module is used for repeatedly executing the process of clustering the similarity matrix to obtain the classification results of the services when the accuracy of the classification results is less than or equal to the accuracy threshold.
21. The service classification apparatus according to any one of claims 13 to 20, characterized in that the third determination module comprises:
a ranking submodule for ranking the plurality of services;
a third determining sub-module, configured to determine, based on the ranking of the plurality of services, an interval corresponding to each service pair of the plurality of service pairs.
22. The service classification apparatus of claim 21, wherein the third determination submodule is configured to:
determining the interval $T_{pq} = T_p - T_q$ corresponding to each service pair,
wherein each service pair comprises the p-th service and the q-th service in the plurality of services, p > q ≥ 1, $T_p$ represents the start time of any data stream corresponding to the p-th service in a data stream group corresponding to the service pair, and $T_q$ represents the start time of any data stream corresponding to the q-th service in that data stream group.
23. The service classification apparatus according to any one of claims 13 to 22, characterized in that the service classification apparatus further comprises:
a sixth determining module, configured to determine a set of clients corresponding to each service in the plurality of services, where the set of clients corresponding to each service includes: a client corresponding to the data stream corresponding to each service;
a second classification module, configured to classify the multiple services based on similarities of client sets corresponding to the multiple services;
a first adjusting module, configured to, after the plurality of services in the plurality of service pairs are classified based on the association relationship between the services in each of the plurality of service pairs, adjust the result of that classification according to the result of classifying the plurality of services based on the similarity of the client sets.
24. The service classification apparatus according to any one of claims 13 to 23, characterized in that the service classification apparatus further comprises:
an obtaining module, configured to obtain a domain name corresponding to each of the multiple services;
a third classification module, configured to classify the multiple services based on similarities of domain names corresponding to the multiple services;
and a second adjusting module, configured to, after the plurality of services in the plurality of service pairs are classified based on the association relationship between the services in each of the plurality of service pairs, adjust the result of that classification according to the result of classifying the plurality of services based on the similarity of the domain names corresponding to the plurality of services.
25. A service classification apparatus, characterized in that the service classification apparatus comprises: at least one processor, at least one interface, a memory, and at least one communication bus, the processor being configured to execute a program stored in the memory to implement the service classification method of any of claims 1 to 12.
26. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the service classification method of any one of claims 1 to 12.
27. An Internet system, characterized by comprising a service classification apparatus, a plurality of service terminals, and a plurality of client terminals;
wherein the service classification apparatus is the service classification apparatus according to any one of claims 13 to 24 or according to claim 25.
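As an illustration of the domain-name-based adjustment recited in claim 24, the following is a minimal Python sketch and not the claimed implementation. The claims do not state how domain-name similarity is measured or how the adjustment is carried out, so two rules are assumed here purely for illustration: two domain names are treated as similar when their last two labels match, and association-based classes are merged when their members fall into the same domain-name class. The names `domain_key`, `classify_by_domain`, and `adjust_classes` and the example domains are hypothetical.

```python
def domain_key(domain):
    """Assumed similarity rule: compare only the last two labels of a domain name."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def classify_by_domain(domains):
    """Group services whose domain names share the same key (assumed rule)."""
    groups = {}
    for service, domain in domains.items():
        groups.setdefault(domain_key(domain), set()).add(service)
    return list(groups.values())


def adjust_classes(assoc_classes, domain_classes):
    """Adjust an association-based classification using a domain-based one.

    Assumed adjustment rule: association-based classes that contain members
    of the same domain-name class are merged into a single class.
    """
    merged = [set(c) for c in assoc_classes]
    for dclass in domain_classes:
        touching = [c for c in merged if c & dclass]
        if len(touching) > 1:
            rest = [c for c in merged if not (c & dclass)]
            merged = rest + [set().union(*touching)]
    # Return a deterministic, sorted representation.
    return sorted(sorted(c) for c in merged)


# Hypothetical input: classes already obtained from the service-pair
# association relationships, plus one domain name per service.
assoc_classes = [{"s1"}, {"s2", "s3"}, {"s4"}]
domains = {
    "s1": "api.shop.example",
    "s2": "db.shop.example",
    "s3": "cache.internal.example",
    "s4": "mail.other.example",
}

adjusted = adjust_classes(assoc_classes, classify_by_domain(domains))
print(adjusted)
```

In this example, s1 and s2 share the key `shop.example`, so the class containing s1 is merged with the class containing s2 and s3, while s4 stays in its own class. The client-set-based adjustment of the first adjusting module could be sketched in the same way, substituting a client-set similarity measure for `domain_key`.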
CN201910853330.8A 2019-09-10 2019-09-10 Service classification method and device and Internet system Pending CN112560878A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910853330.8A CN112560878A (en) 2019-09-10 2019-09-10 Service classification method and device and Internet system
PCT/CN2020/112312 WO2021047401A1 (en) 2019-09-10 2020-08-29 Service classification method and apparatus, and internet system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910853330.8A CN112560878A (en) 2019-09-10 2019-09-10 Service classification method and device and Internet system

Publications (1)

Publication Number Publication Date
CN112560878A true CN112560878A (en) 2021-03-26

Family

ID=74866914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910853330.8A Pending CN112560878A (en) 2019-09-10 2019-09-10 Service classification method and device and Internet system

Country Status (2)

Country Link
CN (1) CN112560878A (en)
WO (1) WO2021047401A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112564928A (en) * 2019-09-10 2021-03-26 华为技术有限公司 Service classification method and equipment and Internet system

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817563B (en) * 2022-04-27 2023-04-28 电子科技大学 Mining method of specific Twitter user group based on maximum group discovery
CN117338309B (en) * 2023-08-21 2024-03-15 合肥心之声健康科技有限公司 Identity recognition method and storage medium
CN116956294B (en) * 2023-09-19 2024-01-09 北方健康医疗大数据科技有限公司 Code attack detection method, system, equipment and medium applied to training model

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7457870B1 (en) * 2004-02-27 2008-11-25 Packeteer, Inc. Methods, apparatuses and systems facilitating classification of web services network traffic
CN104168123A (en) * 2014-07-26 2014-11-26 珠海市君天电子科技有限公司 Data push method, data server, client and data push system
CN104468507B (en) * 2014-10-28 2018-01-30 刘胜利 Based on the Trojan detecting method without control terminal flow analysis
CN106341346B (en) * 2016-09-08 2019-07-19 重庆邮电大学 A kind of routing algorithm ensureing QoS in data center network based on SDN
CN107967311B (en) * 2017-11-20 2021-06-29 创新先进技术有限公司 Method and device for classifying network data streams

Also Published As

Publication number Publication date
WO2021047401A1 (en) 2021-03-18

Similar Documents

Publication Publication Date Title
CN112560878A (en) Service classification method and device and Internet system
CN110471916B (en) Database query method, device, server and medium
US10929345B2 (en) System and method of performing similarity search queries in a network
US7882262B2 (en) Method and system for inline top N query computation
WO2017160409A1 (en) Real-time detection of abnormal network connections in streaming data
CN112602304A (en) Identifying device types based on behavioral attributes
JP5950979B2 (en) Node deduplication in network monitoring system
EP3151483A1 (en) Path planning method and controller
US11250166B2 (en) Fingerprint-based configuration typing and classification
US9813442B2 (en) Server grouping system
WO2021047402A1 (en) Application identification method and apparatus, and storage medium
CN110177123B (en) Botnet detection method based on DNS mapping association graph
CN108366012B (en) Social relationship establishing method and device and electronic equipment
CN109582808A (en) A kind of user information querying method, device, terminal device and storage medium
Shim et al. Application traffic classification using payload size sequence signature
CN111460315B (en) Community portrait construction method, device, equipment and storage medium
Lee et al. Identifying and aggregating homogeneous ipv4/24 blocks with hobbit
CN114401516A (en) 5G slice network anomaly detection method based on virtual network traffic analysis
CN106445709A (en) Method and system for invoking servers in distributed manner
CN106487535B (en) Method and device for classifying network traffic data
Ma et al. GraphNEI: A GNN-based network entity identification method for IP geolocation
CN112560877A (en) Service classification method and device and Internet system
CN112564928A (en) Service classification method and equipment and Internet system
Smeriga et al. Behavior-aware network segmentation using ip flows
CN115269126B (en) Cloud platform inverse affinity scheduling system based on cosine similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination