WO2024001557A1 - Service identification method, system, device, storage medium and program product - Google Patents

Service identification method, system, device, storage medium and program product

Info

Publication number
WO2024001557A1
Authority
WO
WIPO (PCT)
Prior art keywords
domain name
information
traffic
mapping relationship
address
Prior art date
Application number
PCT/CN2023/093892
Other languages
English (en)
French (fr)
Inventor
宋科
朱娜
李华光
李林
蔡洪波
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2024001557A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]

Definitions

  • The embodiments of the present application relate to the field of communication technology, and specifically to a service identification method, system, device, storage medium and program product.
  • Deep Packet Inspection is used to identify business categories in user Internet traffic, mainly relying on plain text characteristics in user traffic (such as domain name in DNS, Host in HTTP, HTTPS/TLS/QUIC SNI) to quickly distinguish and identify service categories, so that network element equipment or network management systems have multiple functions such as statistics, billing, and quality difference analysis based on business categories.
  • Embodiments of the present application provide a service identification method, system, device, storage medium and program product.
  • In a first aspect, embodiments of the present application provide a service identification method.
  • The method includes: obtaining a first domain name system message; obtaining first traffic information of the target service according to the first domain name system message, the first traffic information including domain name information; obtaining, according to the first traffic information, a first mapping relationship between the target service and the domain name information and a first traffic statistical characteristic of the target service; obtaining a second domain name system message according to the domain name information, obtaining second traffic information according to the second domain name system message, and obtaining the relevant IP address of the domain name information according to the second traffic information; and performing time series statistical feature matching according to the first mapping relationship, the relevant IP address and the first traffic statistical characteristic to identify the current business.
  • In a second aspect, embodiments of the present application provide a service identification system, including: a sampling module configured to obtain a first domain name system message and to obtain the first traffic information of the target service according to the first domain name system message, the first traffic information including domain name information; a training module configured to obtain, based on the first traffic information, a first mapping relationship between the target business and the domain name information and a first traffic statistical characteristic of the target business; and a matching module configured to obtain the second domain name system message based on the domain name information, obtain the second traffic information according to the second domain name system message, obtain the relevant IP address of the domain name information according to the second traffic information, and perform time series statistical feature matching according to the first mapping relationship, the relevant IP address and the first traffic statistical characteristic to identify the current service.
  • In a third aspect, embodiments of the present application provide a service identification device, including a memory and a processor.
  • The memory stores a program. When the processor executes the program, the service identification method of the first aspect is implemented.
  • In a fourth aspect, embodiments of the present application provide a computer-readable storage medium that stores computer-executable instructions, and the computer-executable instructions are used to execute the service identification method of the first aspect.
  • In a fifth aspect, embodiments of the present application provide a computer program product, including a computer program or computer instructions.
  • The computer program or computer instructions are stored in a computer-readable storage medium.
  • A processor of a computer device reads the computer program or the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the service identification method of the first aspect.
  • Figure 1 is a schematic flowchart of a service identification method provided by an embodiment of the present application;
  • Figure 2 is a detailed flowchart of step S3000 in Figure 1;
  • Figure 3 is a schematic diagram of the first-level mapping relationship and the second-level mapping relationship in the service identification method provided by an embodiment of the present application;
  • Figure 4 is a schematic diagram of the first mapping relationship in the service identification method provided by an embodiment of the present application;
  • Figure 5 is a detailed flowchart of step S4000 in Figure 1;
  • Figure 6 is a detailed flowchart of step S5000 in Figure 1;
  • Figure 7 is a detailed flowchart of step S5100 in Figure 6;
  • Figure 8 is a detailed flowchart of step S5110 in Figure 7;
  • Figure 9 is a schematic diagram of the fifth mapping relationship between relevant IP addresses in the service identification method provided by an embodiment of the present application;
  • Figure 10 is a detailed flowchart of step S5200 in Figure 6;
  • Figure 11 is a schematic diagram of time window division in the service identification method provided by an embodiment of the present application;
  • Figure 12 is a schematic diagram of the value range corresponding to the traffic statistical characteristics in each time window in the service identification method provided by an embodiment of the present application;
  • Figure 13 is a schematic diagram of step S5230 in Figure 10;
  • Figure 14 is a schematic diagram of a service identification system provided by an embodiment of the present application.
  • Words such as setting, installation, and connection should be understood in a broad sense. Those skilled in the art can reasonably determine the meaning of these words in the embodiments of the present application based on the content of the technical solution.
  • Words such as “further”, “exemplarily” or “optionally” are used to present examples, illustrations or explanations, and should not be interpreted as being more preferable or more advantageous than other embodiments or designs. The use of the words “further”, “exemplarily” or “optionally” is intended to present relevant concepts.
  • DPI is used to identify business categories in user Internet traffic. It relies mainly on plain text characteristics in user traffic to quickly distinguish and identify business categories, so that network element equipment or network management systems can provide functions such as statistics, billing and quality difference analysis based on business categories.
  • DNS Domain Name System
  • HTTP Hypertext Transfer Protocol
  • HTTPS Hypertext Transfer Protocol Secure
  • QUIC Quick UDP Internet Connections
  • TLS Transport Layer Security
  • SSL Secure Socket Layer
  • A significant change in TLS 1.3 is that extension fields that need not be presented in clear text are encrypted in the ClientHello (client hello) and ServerHello (server hello) messages, and the Certificate (digital certificate) message itself is completely encrypted.
  • The QUIC protocol has also evolved from the original Google gQUIC to the official IETF QUIC, in which the ClientHello (client hello) message in the Initial packet is completely encrypted.
  • the DNS protocol has evolved from plain text to DoH (DNS over HTTPS) and DoQ (DNS over QUIC), so that the domain name in the DNS request message and the IP address, domain name and alias in the DNS response message are no longer visible in plain text.
  • Although the ClientHello message in the current QUIC Initial packet is completely encrypted, its key can be calculated using the salt value and the public encryption algorithm disclosed in a series of related protocol documents (Request For Comments, RFC). The content of the message is therefore essentially visible to network devices, which amounts to ClientHello pseudo-encryption.
  • The IETF's ECH (Encrypted ClientHello) protocol provides true ClientHello encryption, which makes the contents of the ClientHello messages of TLS and QUIC impossible to decrypt and no longer visible to network devices.
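  • The pseudo-encryption of the QUIC Initial packet described above can be illustrated with a minimal sketch. It assumes QUIC version 1 as specified in RFC 9001, whose published initial salt and labels (`client in`, `quic key`) are used below; the function names and the example connection ID are illustrative assumptions and do not come from the patent.

```python
import hashlib
import hmac

# QUIC version 1 initial salt as published in RFC 9001, Section 5.2.
QUIC_V1_INITIAL_SALT = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869) using SHA-256."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand_label(secret: bytes, label: bytes, length: int) -> bytes:
    """TLS 1.3 HKDF-Expand-Label with an empty context, for length <= 32."""
    full_label = b"tls13 " + label
    info = (
        length.to_bytes(2, "big")
        + bytes([len(full_label)])
        + full_label
        + b"\x00"  # zero-length context
    )
    # One HMAC block is enough because SHA-256 produces 32 bytes.
    return hmac.new(secret, info + b"\x01", hashlib.sha256).digest()[:length]


def client_initial_key(dcid: bytes) -> bytes:
    """Derive the AES-128 key that protects a client's QUIC v1 Initial packets.

    dcid is the Destination Connection ID, which is sent in clear text in the
    packet header, so any on-path device can repeat this derivation.
    """
    initial_secret = hkdf_extract(QUIC_V1_INITIAL_SALT, dcid)
    client_secret = hkdf_expand_label(initial_secret, b"client in", 32)
    return hkdf_expand_label(client_secret, b"quic key", 16)


# Hypothetical connection ID used only for illustration.
key = client_initial_key(bytes.fromhex("8394c8f03e515708"))
print(key.hex())
```

  • Every input to the derivation is either a published constant or a header field visible on the wire, which is why the text treats this as pseudo-encryption. Decrypting the packet additionally requires the IV and header protection key, which are derived in the same way and are omitted here.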
  • DPI Deep Packet Inspection
  • The domain name and IP address in DNS request/response messages, the Host field in HTTP requests, and the Server Name Indication (SNI) extension field in the ClientHello message of SSL/TLS/QUIC are important features for DPI service identification and classification.
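  • As a small illustration of one such plain text feature, the following sketch extracts the Host field from an unencrypted HTTP/1.x request. The function name `http_host` and the sample request are assumptions made for illustration; the patent does not prescribe any particular parser.

```python
from typing import Optional


def http_host(payload: bytes) -> Optional[str]:
    """Return the Host header value of a plain-text HTTP/1.x request, if present."""
    head = payload.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", errors="replace")
    return None


request = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"
print(http_host(request))
```

  • Once the request is carried inside TLS or QUIC, this field is no longer readable on the wire, which is the limitation the method sets out to address.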
  • XR eXtended Reality
  • QoS/QoE Quality of Service / Quality of Experience, an important application field
  • Embodiments of the present application provide a service identification method, system, device, storage medium and program product. A first domain name system message is obtained, and the first traffic information of the target service is obtained according to the first domain name system message.
  • According to the first traffic information, the first mapping relationship between the target business and the domain name information and the first traffic statistical characteristics of the target business are obtained. A second domain name system message is then obtained based on the domain name information, and second traffic information is obtained based on the second domain name system message.
  • The relevant IP address of the domain name information is obtained based on the second traffic information, and time series statistical feature matching is performed based on the first mapping relationship, the relevant IP address and the first traffic statistical feature to identify the current business, thereby improving the DPI system's identification of encrypted services.
  • The execution subject of the service identification method is a service identification system.
  • The service identification system includes a service identification device.
  • The service identification device can be an independent DPI device, or a gateway, router or firewall with a built-in DPI function, etc.
  • the sampling targets of this service identification method can be network terminal devices such as mobile phones and tablet computers.
  • Figure 1 is a schematic flowchart of a service identification method provided by an embodiment of the present application, including but not limited to step S1000, step S2000, step S3000, step S4000 and step S5000.
  • Step S1000 Obtain the first domain name system message.
  • The sampling terminal is configured to use plain text DNS. When the sampling terminal initiates a service, it sends a query request to the DNS server, and the first domain name system message is obtained from the response message returned by the DNS server.
  • The sampling terminal can initiate one or more query requests to the DNS server for the same service, thereby implementing multiple samplings and obtaining multiple first domain name system messages for the same service based on the response messages returned multiple times by the DNS server.
  • The sampling terminal can also initiate one or more query requests to the DNS server for multiple services, thereby sampling multiple services multiple times and obtaining multiple first domain name system messages for the different services based on the response messages returned multiple times by the DNS server.
  • the DNS server can be a server provided by a service provider that stores mappings of resource records such as domain names and IP addresses.
  • Step S2000 Obtain first traffic information of the target service according to the first domain name system message, where the first traffic information includes domain name information.
  • The sampling terminal can initiate the same service multiple times, which generates traffic information for several domain names. The first domain name system message of each sample is obtained, and the first traffic information of the service is obtained from it. The first traffic information of each service includes one or more pieces of domain name information of the service, and the domain name information includes the domain names of several traffic flows.
  • The sampling terminal can also initiate different services, generating traffic information for several domain names in different services. Each service is sampled multiple times, and the first traffic information of each service is obtained each time. The first traffic information includes the domain name information of each service, and the domain name information includes the domain names of several traffic flows.
  • Step S3000 According to the first traffic information, obtain the first mapping relationship between the target service and the domain name information and the first traffic statistical characteristics of the target service.
  • The sampling terminal can initiate the same service or different services multiple times, and the first mapping relationship between the target service and the domain name information, that is, between the business name of the target service and the domain names, is obtained based on the first traffic information of the service.
  • the first mapping relationship may be a mapping relationship from a business name to a domain name.
  • The first traffic statistical characteristics include, but are not limited to, at least one of the following: the range of the number of concurrent TCP connections, the range of the number of concurrent UDP connections, the range of the ratio of the average uplink rate to the average downlink rate, the range of the ratio of upstream traffic to downstream traffic, the range of network-side port numbers, or the range of domain names.
  • the first mapping relationship may be a mapping relationship from a domain name to a business name.
  • the first mapping relationship may be a mapping relationship from a domain name to a business name, and then from a business name to a domain name.
  • the first mapping relationship may be a mapping relationship from a business name to a domain name, and then from a domain name to a business name.
  • Figure 2 is a detailed flowchart of step S3000, including the following steps:
  • Step S3100 Obtain the domain name information corresponding to the business information, and establish a first-level mapping relationship from the business information to the domain name information.
  • The first traffic information includes service information, and the service information includes a service name (or service type).
  • Network services may involve access to multiple different servers. Therefore, for the same target service, the service information may correspond to different domain name information.
  • The same service name may correspond to different domain names, and different service names may correspond to the same domain name.
  • The domain name information corresponding to the business information can be obtained according to the first traffic information, and a first-level mapping relationship from the business information to the domain name information can be established, so that the corresponding one or more domain names can be found from the business information. For example, as shown in Figure 3, business 1 corresponds to domain name 1, domain name 2 and domain name 3, and business 2 corresponds to domain name 1, domain name 3 and domain name 4. Domain names 1, 2 and 3 can therefore be found through business 1, and domain names 1, 3 and 4 through business 2.
  • Step S3200 Obtain the business information corresponding to the domain name information, and establish a secondary mapping relationship from the domain name information to the business information.
  • For example, domain name 1 corresponds to business 1 and business 2; domain name 2 corresponds to business 1 and business x; domain name 3 corresponds to business 1, business 2 and business y; and domain name 4 corresponds to business 2.
  • Business 1 and business 2 can therefore be found through domain name 1, and business 1 and business x can be found through domain name 2.
  • Step S3300 Generate a first mapping relationship in which business information and domain name information are related to each other based on the secondary mapping relationship and the primary mapping relationship.
  • The first mapping relationship runs from domain name information to business information, and then from business information to domain name information.
  • In terms of names, the first mapping relationship runs from domain name to business name, and then from business name to domain name.
  • One or more services corresponding to a domain name can be found through the domain name, and then the other domain names of those services can be found.
  • For example, domain name 1 corresponds to business 1 and business 2, business 1 corresponds to domain name 1, domain name 2 and domain name 3, and business 2 corresponds to domain name 1, domain name 3 and domain name 4. Business 1 and business 2 can be found through domain name 1, and then the domain names corresponding to business 1 (domain names 1, 2 and 3) and to business 2 (domain names 1, 3 and 4) can be found.
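  • Steps S3100 to S3300 can be sketched as two dictionaries and a lookup that chains them. This is a minimal illustration under assumed names (`build_mappings`, `related_domains`) and an assumed sample format of (business, domain) pairs; the sample data reproduces only the business 1 and business 2 entries of the example above.

```python
from collections import defaultdict


def build_mappings(samples):
    """Build both mapping levels from (business, domain) pairs seen in sampling."""
    primary = defaultdict(set)    # business -> domains (first-level, step S3100)
    secondary = defaultdict(set)  # domain -> businesses (secondary, step S3200)
    for business, domain in samples:
        primary[business].add(domain)
        secondary[domain].add(business)
    return primary, secondary


def related_domains(domain, primary, secondary):
    """Follow domain -> businesses -> domains (first mapping, step S3300)."""
    return {
        business: set(primary[business])
        for business in secondary.get(domain, ())
    }


samples = [
    ("business 1", "domain 1"), ("business 1", "domain 2"), ("business 1", "domain 3"),
    ("business 2", "domain 1"), ("business 2", "domain 3"), ("business 2", "domain 4"),
]
primary, secondary = build_mappings(samples)
print(related_domains("domain 1", primary, secondary))
```

  • Keeping the two levels as separate dictionaries means either direction can be queried on its own, while the chained lookup gives the domain-to-business-to-domain view used later to expand the set of relevant IP addresses.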
  • Step S4000 Obtain the second domain name system message according to the domain name information, obtain the second traffic information according to the second domain name system message, and obtain the relevant IP address of the domain name information according to the second traffic information;
  • The second domain name system message is actively and continuously collected according to the domain name information, the second traffic information is obtained according to the second domain name system message, and the relevant IP address corresponding to the domain name information is obtained according to the second traffic information.
  • Step S4100 Initiate a DNS query request to the DNS server based on the domain name information.
  • Step S4200 Receive a response message returned by the DNS server; wherein the response message includes a second domain name system message.
  • Step S4300 Obtain the second traffic information according to the second domain name system message, and obtain the relevant IP address corresponding to the domain name information according to the second traffic information.
  • the business identification system as a DNS client, periodically and proactively initiates DNS query requests to the designated DNS server based on domain name information.
  • the DNS server sends a response message including an IP address to the business identification system based on the query request.
  • the business The identification system receives the response message returned by the DNS server, and obtains the relevant IP address corresponding to the domain name information based on the response message.
  • DNS server 1 returns IP address 1 and IP address 2
  • DNS server 2 returns IP address 3 and IP address 4, thereby obtaining 4 IP addresses related to domain name 1.
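  • The active collection in steps S4100 to S4300 can be sketched as a loop over domains and resolvers. In this illustration each "DNS server" is modelled as a callable that maps a domain name to a list of IP addresses; that interface, the names `collect_related_ips` and `system_resolver`, and the sample addresses are assumptions. `system_resolver` uses the operating system's configured resolver through `socket.getaddrinfo` rather than a designated server.

```python
import socket
from typing import Callable, Dict, Iterable, List, Set

Resolver = Callable[[str], List[str]]


def system_resolver(domain: str) -> List[str]:
    """Resolve a domain through the operating system's configured resolver."""
    infos = socket.getaddrinfo(domain, None)
    return sorted({info[4][0] for info in infos})


def collect_related_ips(
    domains: Iterable[str], resolvers: Iterable[Resolver]
) -> Dict[str, Set[str]]:
    """Query every resolver for every domain and merge the returned addresses."""
    resolver_list = list(resolvers)
    related: Dict[str, Set[str]] = {}
    for domain in domains:
        ips: Set[str] = set()
        for resolve in resolver_list:
            try:
                ips.update(resolve(domain))
            except OSError:
                # A failed lookup on one server should not stop the collection.
                continue
        related[domain] = ips
    return related


# Stand-ins for two DNS servers that return different addresses for one domain.
dns_server_1 = lambda domain: ["192.0.2.1", "192.0.2.2"]
dns_server_2 = lambda domain: ["192.0.2.3", "192.0.2.4"]
print(collect_related_ips(["domain1.example"], [dns_server_1, dns_server_2]))
```

  • In a deployment, this function would be called periodically so that the relevant IP addresses follow changes in the DNS records.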
  • the domain name information may be domain name information corresponding to certain business information in the second traffic information, and the corresponding domain name information is obtained through the business information.
  • The domain name information may be all domain name information obtained through the first mapping relationship. For example, the domain name of a certain business is obtained, the business names corresponding to that domain name are found according to the secondary mapping relationship, and all domain names corresponding to those business names are then found according to the first-level mapping relationship. A query request is initiated to the DNS server for all of these domain names, the relevant IP addresses corresponding to them are obtained, and a cached IP address set is established.
  • The business identification system obtains the time to live (Time To Live, TTL) information corresponding to the domain name information from the response message returned by the DNS server, and determines from it whether any of the relevant IP addresses has expired. If an expired IP address exists among the relevant IP addresses, the expired IP address is deleted, together with the association between the expired IP address and the domain name.
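  • The TTL handling described above can be sketched as a small cache keyed by (domain, IP address) pairs. The class name `RelatedIpCache`, its methods, and the explicit `now` argument (used so the behaviour can be shown without waiting) are illustrative assumptions.

```python
import time


class RelatedIpCache:
    """Keeps domain-to-IP associations until their DNS TTL runs out."""

    def __init__(self):
        self._expiry = {}  # (domain, ip) -> absolute expiry time in seconds

    def add(self, domain, ip, ttl, now=None):
        """Record an association learned from a DNS response with the given TTL."""
        now = time.monotonic() if now is None else now
        self._expiry[(domain, ip)] = now + ttl

    def purge(self, now=None):
        """Delete expired IP addresses and their association with the domain."""
        now = time.monotonic() if now is None else now
        expired = [key for key, deadline in self._expiry.items() if deadline <= now]
        for key in expired:
            del self._expiry[key]
        return expired

    def ips(self, domain):
        """Return the IP addresses currently associated with a domain."""
        return {ip for (name, ip) in self._expiry if name == domain}


cache = RelatedIpCache()
cache.add("domain1.example", "192.0.2.1", ttl=30, now=0)
cache.add("domain1.example", "192.0.2.2", ttl=300, now=0)
cache.purge(now=60)  # 60 seconds later only the second entry is still valid
print(cache.ips("domain1.example"))
```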
  • Step S5000 Perform time series statistical feature matching based on the first mapping relationship, the relevant IP address and the first traffic statistical feature to identify the current service.
  • Figure 6 is a detailed flowchart of step S5000, including steps S5100 and S5200:
  • Step S5100 Determine related services of the current service according to the first mapping relationship and related IP addresses.
  • Step S5200 Perform time series statistical feature matching on relevant services according to the first traffic statistical feature to identify the current service.
  • The association between the business name and the domain name information can be found according to the first mapping relationship. Through the domain name information, the multiple relevant IP addresses corresponding to it can be found, forming a cached IP address set.
  • Figure 7 is a detailed flowchart of step S5100, including steps S5110, S5120, and S5130:
  • Step S5110 Establish a cache IP address set according to the relevant IP address and the first mapping relationship
  • the business information corresponding to the domain name information in the second traffic information can be found through the first mapping relationship, and all IP addresses corresponding to the business information can be found according to the business information, thereby establishing a cached IP address set.
  • a second mapping relationship between relevant IP addresses and domain name information can also be established to more comprehensively expand the relevant IP addresses based on the first mapping relationship and the second mapping relationship.
  • Figure 8 is a detailed flowchart of step S5110, including steps S5111, S5112, S5113, and S5114:
  • Step S5111 Establish a second mapping relationship between relevant IP addresses and domain name information, and establish a third mapping relationship between relevant IP addresses and business information based on the first mapping relationship and the second mapping relationship.
  • Figure 9 is a schematic diagram of a fifth mapping relationship between related IP addresses.
  • The mapping relationship between IP addresses and domain names can be obtained through active DNS collection during the network traffic processing stage, which establishes the second mapping relationship between the relevant IP addresses and the domain name information. Since the first mapping relationship includes a secondary mapping relationship from domain name information to business information, a third mapping relationship between the relevant IP address and the business information can be established based on the secondary mapping relationship. The third mapping relationship can run from the relevant IP address to the domain name information, and then from the domain name information to the business information, for example, IP address 1-domain name 1-business 1, IP address 1-domain name x-business x.
  • Step S5112 Establish a fourth mapping relationship between relevant IP addresses and domain name information based on the first mapping relationship and the third mapping relationship.
  • Since the first mapping relationship includes the correlation information between business information and domain name information, for example, from domain name 1 to business 1, and then from business 1 to domain name 1 and domain name 2, all domain name information corresponding to the business information can be found. A fourth mapping relationship can therefore be established from the relevant IP address to domain name information, then to business information, and then to domain name information, for example, IP address 1-domain name 1-business 1-domain name 1 (domain name 2, domain name 3), IP address 1-domain name 1-business 2-domain name 1 (domain name 3, domain name 4).
  • Step S5113 Establish a fifth mapping relationship between related IP addresses based on the second mapping relationship and the fourth mapping relationship.
  • Since the second mapping relationship includes the relationship between relevant IP addresses and domain name information, the relevant IP addresses corresponding to each piece of domain name information can be found according to the second mapping relationship, so a fifth mapping relationship between relevant IP addresses can be established based on the second mapping relationship and the fourth mapping relationship.
  • The fifth mapping relationship between related IP addresses runs from related IP addresses to domain name information, then from domain name information to business information, then from business information to domain name information, and then from domain name information to related IP addresses, for example, IP address 1-domain name 1-business 1-domain name 2-IP address x.
  • Step S5114 Based on the fifth mapping relationship, establish a set of cached IP addresses corresponding to each domain name information.
  • the corresponding domain name information can be found according to the relevant IP address of a certain target business, and then all relevant IP addresses corresponding to the domain name information can be found, and all relevant IP addresses can be used as a cache IP address set.
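  • The chain behind the fifth mapping relationship (IP address to domain name to business to domain name to IP address) can be sketched as nested dictionary lookups. The function name `cached_ip_set`, the dictionary layout and the sample values are assumptions chosen to mirror the example IP address 1-domain name 1-business 1-domain name 2-IP address x.

```python
def cached_ip_set(ip, ip_to_domains, secondary, primary, domain_to_ips):
    """Collect every IP address reachable from one IP via domains and businesses.

    ip_to_domains / domain_to_ips: the second mapping relationship, both directions.
    secondary: domain -> businesses; primary: business -> domains.
    """
    cached = set()
    for domain in ip_to_domains.get(ip, ()):
        for business in secondary.get(domain, ()):
            for related_domain in primary.get(business, ()):
                cached.update(domain_to_ips.get(related_domain, ()))
    return cached


ip_to_domains = {"IP address 1": {"domain 1"}}
domain_to_ips = {"domain 1": {"IP address 1"}, "domain 2": {"IP address x"}}
secondary = {"domain 1": {"business 1"}}
primary = {"business 1": {"domain 1", "domain 2"}}

print(cached_ip_set("IP address 1", ip_to_domains, secondary, primary, domain_to_ips))
```

  • The result contains IP address x even though it was never observed together with domain 1, which is the expansion the fifth mapping relationship is meant to provide.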
  • Step S5120 Obtain the current IP address of the current service, and match the current IP address with the cached IP address set.
  • the current IP address of the current service is obtained, and the current IP address is matched with all related IP addresses in the cached IP address set.
  • Alternatively, the current IP address of the current service is obtained and matched with only some of the related IP addresses in the cached IP address set. For example, the current IP address is first matched with 50% of the relevant IP addresses in the cached IP address set, and whether to match the other 50% is then decided based on the result. How many related IP addresses in the cached IP address set are matched first can be determined according to the actual situation, and is not limited in this embodiment.
  • The system can obtain the four-tuple of a Transmission Control Protocol (TCP) message (i.e., source IP address, destination IP address, TCP source port, TCP destination port) and establish the TCP flow context. Only for the first IP packet of the TCP flow context is the IP address of the packet compared with each relevant IP address in the cached IP address set.
  • Similarly, the system establishes the UDP flow context based on the four-tuple of a User Datagram Protocol (UDP) message (i.e., source IP address, destination IP address, UDP source port, UDP destination port). Only for the first IP packet of the UDP flow context is the IP address of the packet compared with each relevant IP address in the cached IP address set.
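  • The first-packet matching described above can be sketched as a flow table keyed by protocol and four-tuple. The class name `FlowTable`, the `lookups` counter, and the choice to compare the destination address (assumed here to be the network-side address of an uplink packet) are illustrative assumptions.

```python
class FlowTable:
    """Matches only the first packet of each TCP or UDP flow against cached IPs."""

    def __init__(self, cached_ips):
        self.cached_ips = set(cached_ips)
        self.flows = {}   # (protocol, src_ip, dst_ip, src_port, dst_port) -> matched?
        self.lookups = 0  # number of comparisons against the cached IP address set

    def on_packet(self, protocol, src_ip, dst_ip, src_port, dst_port):
        key = (protocol, src_ip, dst_ip, src_port, dst_port)
        if key not in self.flows:
            # First packet of the flow: create the context and match once.
            # Assumption: dst_ip is the network-side address.
            self.lookups += 1
            self.flows[key] = dst_ip in self.cached_ips
        # Later packets of the same flow reuse the stored result.
        return self.flows[key]


table = FlowTable({"203.0.113.5"})
for _ in range(3):
    table.on_packet("tcp", "10.0.0.2", "203.0.113.5", 50000, 443)
print(table.lookups)
```

  • Storing the result in the flow context keeps the per-packet cost constant after the first packet, which matters when the cached IP address set is large.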
  • Step S5130 If the current IP address successfully matches the cached IP address set, obtain the relevant traffic information of the relevant services corresponding to the current IP address.
  • If the current IP address successfully matches the cached IP address set, one or more related services corresponding to the current IP address have been found, and the relevant traffic information of those related services is obtained.
  • Since the fifth mapping relationship includes the association chain of related IP addresses - domain name information - business information - domain name information - related IP addresses, the business names of all related services can be found according to the current IP address, and the relevant traffic information of each related business can thereby be found.
  • Figure 10 is a detailed flowchart of step S5200, including steps S5210, S5220, and S5230:
  • Step S5210 Divide the first traffic information in the target service into time windows to obtain multiple time windows; wherein the time windows include multiple first traffic statistical characteristics.
  • A certain target service is sampled multiple times, and the first traffic information of each sampling is obtained. The first traffic information of the sampling with the longest service time can be selected and divided into time windows to obtain multiple time windows, and the traffic statistical information of the first traffic information in each time window is obtained. The traffic statistical information includes the first traffic statistical characteristics, and each time window includes multiple first traffic statistical characteristics.
  • For example, the T1 time window includes the first traffic statistical features A, B and C; the T2 time window includes the first traffic statistical features A, C, D and E; and the Tm time window includes the first traffic statistical features D and X.
  • Step S5220 Calculate the value range corresponding to the first traffic statistical feature in each time window.
  • In some embodiments, each first traffic statistical feature has a corresponding first statistical feature value. Because the same time window contains multiple observations of the same first traffic statistical feature, that feature has multiple first statistical feature values in the window, from which its value range in the window can be obtained. In this way the value ranges of the different first traffic statistical features in the different time windows are obtained.
  • For example, referring to Figures 11 and 12, there are m time windows. In the T1 time window, the value range of the first traffic statistical feature A is [a1, a2], that of feature B is [b1, b2], and that of feature C is [c1, c2].
  • In the T2 time window, the value range of the first traffic statistical feature A is [a3, a4], that of feature C is [c1, c2], that of feature D is [d1, d2], and that of feature E is [e1, e2].
  • In the Tm time window, the value range of the first traffic statistical feature D is [d3, d4], and the legal value range of the first traffic statistical feature X is [x1, x2].
  • Step S5230 Obtain the second traffic statistical feature of the relevant service based on the relevant traffic information, calculate the second statistical feature value of the second traffic statistical feature, and match the second statistical feature value with the value range.
  • Figure 13 is a detailed flowchart of step S5230, including steps S5231, S5232, and S5233:
  • Step S5231 Divide the relevant traffic information into time windows according to each time window.
  • It should be noted that, referring to the dashed box in Figure 12, the relevant IP addresses in the cached IP address set corresponding to service 1 include IP address 1, IP address X, and IP address Y. The relevant traffic information of the relevant services corresponding to these relevant IP addresses is obtained and divided into time windows.
  • Different time windows include multiple different second traffic statistical features, and the same time window also includes different second traffic statistical features. The relevant traffic information is divided into time windows in a way similar to the first traffic information, which is not described again in this embodiment.
  • Step S5232 Calculate the second statistical feature value of each second traffic statistical feature in each time window.
  • Step S5233 If at least one of the second statistical feature values is within the value range, it is determined that the second statistical feature value matches the value range, and the current service identification is successful.
  • It should be noted that the second statistical feature value of each second traffic statistical feature in each time window is calculated and compared with the value range of the corresponding time window. If the second statistical feature value is within the value range of the corresponding time window, it is determined that the second statistical feature value matches that time window.
  • In some embodiments, if the second statistical feature values of all second traffic statistical features are within the value ranges corresponding to every time window, every time window of a certain service is completely matched, and the current service is identified as that service; that is, the current service is identified successfully.
  • For example, referring to Figures 11 and 12, the second traffic statistical feature A represents the number of concurrent TCP connections in the time window, a1 is 44, and a2 is 66, so the value range of feature A in the T1 time window is [44, 66]. While the current service is being identified, if the number of concurrent TCP connections of the IP address in the T1 time window falls within this range, for example 55, the second traffic statistical feature A of service 1 matches in the T1 time window.
  • In some embodiments, if the second statistical feature value of at least one second traffic statistical feature is within the value range corresponding to every time window, every time window of a certain service is considered matched, and the current service is identified as that service; that is, the current service is identified successfully.
  • In some embodiments, if the second statistical feature value of at least one second traffic statistical feature is within the value range corresponding to at least one time window, that time window of a certain service matches, and the current service is identified as that service; that is, the current service is identified successfully.
  • In some embodiments, if none of the relevant services can be matched against the value ranges, there is no identifiable service among the relevant services, and identification of the current service fails.
  • In some embodiments, if the current service is identified successfully, the associated IP addresses corresponding to the current service are obtained according to the fifth mapping relationship, and the traffic information corresponding to those associated IP addresses is identified as the current service within a preset time period. For example, referring to Figures 11 and 12, if the current service is identified as service 1, the traffic of IP address 1, IP address X, IP address Y, and the other IP addresses associated with service 1 is identified as service 1 for a subsequent period of time, indicating that the user is performing service 1.
  • An embodiment of the present application also provides a service identification system.
  • Referring to Figure 14, the service identification system includes:
  • a sampling module 100, configured to obtain a first domain name system message, obtain first traffic information of a target service according to the first domain name system message, and obtain the relevant IP addresses of the target service according to the domain name information, wherein the first traffic information includes the domain name information;
  • a training module 200, configured to obtain, according to the first traffic information, a first mapping relationship between the target service and the domain name information and a first traffic statistical feature of the target service;
  • a matching module 300, configured to obtain a second domain name system message according to the domain name information, obtain second traffic information according to the second domain name system message, obtain the relevant IP addresses of the domain name information according to the second traffic information, and perform time-series statistical feature matching according to the first mapping relationship, the relevant IP addresses, and the first traffic statistical feature, so as to identify the current service.
  • An embodiment of the present application also provides a service identification device.
  • the service identification device includes a memory and a processor.
  • the memory stores a program. When the program is read and executed by the processor, the service identification method in the above embodiment is implemented.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium stores one or more programs.
  • The one or more programs can be executed by one or more processors to implement the service identification method in the above embodiments.
  • An embodiment of the present application also provides a computer program product, including a computer program or computer instructions.
  • The computer program or computer instructions are stored in a computer-readable storage medium.
  • A processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium.
  • The processor executes the computer program or computer instructions, so that the computer device performs the service identification method in the above embodiments.
  • As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
  • In some implementations, the memory may include memory located remotely from the processor, and the remote memory may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The non-transitory software programs and instructions required to implement the service identification method in the above embodiments are stored in the memory and, when executed by the processor, perform the service identification method in the above embodiments.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
  • In addition, embodiments of the present application also provide a computer program product, which includes a computer program or computer instructions.
  • The computer program or computer instructions are stored in a computer-readable storage medium.
  • A processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium.
  • The processor executes the computer program or computer instructions, causing the computer device to execute the above service identification method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

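Embodiments of the present application provide a service identification method, system, device, storage medium, and program product. The service identification method includes: obtaining a first domain name system message (S1000); obtaining first traffic information of a target service according to the first domain name system message (S2000); obtaining, according to the first traffic information, a first mapping relationship between the target service and domain name information and a first traffic statistical feature of the target service (S3000); obtaining a second domain name system message according to the domain name information, obtaining second traffic information according to the second domain name system message, and obtaining the relevant IP addresses of the domain name information according to the second traffic information (S4000); and performing service identification on a current service according to the first mapping relationship, the relevant IP addresses, and the first traffic statistical feature.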

Description

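Service identification method, system, device, storage medium, and program product
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202210740126.7, filed on June 28, 2022, the entire contents of which are incorporated herein by reference.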
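Technical field
Embodiments of the present application relate to the field of communications technology, and in particular to a service identification method, system, device, storage medium, and program product.
Background
Deep packet inspection (DPI) is used to identify the service categories in users' Internet traffic. It relies mainly on plaintext features in the traffic (such as the domain name in DNS, the Host in HTTP, and the SNI in HTTPS/TLS/QUIC) to distinguish service categories quickly, so that network element devices or network management systems can perform statistics, charging, poor-quality analysis, and other functions by service category.
However, for network services that are fully encrypted, for example services that use DNS carried over DoH (DNS over HTTPS) or DoQ (DNS over QUIC) together with HTTPS or QUIC protected by ECH (Encrypted ClientHello), both the DNS domain name and the SNI (Server Name Indication) in the network traffic are encrypted and invisible. DPI then can hardly obtain common plaintext traffic features such as the DNS domain name and the SNI, and its ability to identify service types declines. How to enable DPI to identify services whose user traffic is fully encrypted is therefore a problem that urgently needs to be discussed and solved.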
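Summary
Embodiments of the present application provide a service identification method, system, device, storage medium, and program product.
In a first aspect, an embodiment of the present application provides a service identification method, the method including: obtaining a first domain name system message; obtaining first traffic information of a target service according to the first domain name system message, the first traffic information including domain name information; obtaining, according to the first traffic information, a first mapping relationship between the target service and the domain name information and a first traffic statistical feature of the target service; obtaining a second domain name system message according to the domain name information, obtaining second traffic information according to the second domain name system message, and obtaining the relevant IP addresses of the domain name information according to the second traffic information; and performing time-series statistical feature matching according to the first mapping relationship, the relevant IP addresses, and the first traffic statistical feature, to perform service identification on a current service.
In a second aspect, an embodiment of the present application provides a service identification system, including: a sampling module, configured to obtain a first domain name system message and obtain first traffic information of a target service according to the first domain name system message, the first traffic information including domain name information; a training module, configured to obtain, according to the first traffic information, a first mapping relationship between the target service and the domain name information and a first traffic statistical feature of the target service; and a matching module, configured to obtain a second domain name system message according to the domain name information, obtain second traffic information according to the second domain name system message, obtain the relevant IP addresses of the domain name information according to the second traffic information, and perform time-series statistical feature matching according to the first mapping relationship, the relevant IP addresses, and the first traffic statistical feature, to perform service identification on a current service.
In a third aspect, an embodiment of the present application provides a service identification device, including a memory and a processor, the memory storing a program which, when read and executed by the processor, implements the service identification method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the service identification method of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product, including a computer program or computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the service identification method of the first aspect.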
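Brief description of the drawings
Figure 1 is a schematic flowchart of a service identification method provided by an embodiment of the present application;
Figure 2 is a detailed flowchart of step S3000 in Figure 1;
Figure 3 is a schematic diagram of the first-level and second-level mapping relationships in a service identification method provided by an embodiment of the present application;
Figure 4 is a schematic diagram of the first mapping relationship in a service identification method provided by an embodiment of the present application;
Figure 5 is a detailed flowchart of step S4000 in Figure 1;
Figure 6 is a detailed flowchart of step S5000 in Figure 1;
Figure 7 is a detailed flowchart of step S5100 in Figure 6;
Figure 8 is a detailed flowchart of step S5110 in Figure 7;
Figure 9 is a schematic diagram of the fifth mapping relationship between relevant IP addresses in a service identification method provided by an embodiment of the present application;
Figure 10 is a detailed flowchart of step S5200 in Figure 6;
Figure 11 is a schematic diagram of time window division in a service identification method provided by an embodiment of the present application;
Figure 12 is a schematic diagram of the value ranges corresponding to the traffic statistical features in each time window in a service identification method provided by an embodiment of the present application;
Figure 13 is a schematic diagram of step S5230 in Figure 10;
Figure 14 is a schematic diagram of a service identification system provided by an embodiment of the present application.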
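Detailed description
It should be noted that although functional modules are divided in the system diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that of the device, or in an order different from that of the flowcharts. The terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
In the description of the embodiments of the present application, unless otherwise expressly limited, words such as "set", "install", and "connect" should be understood broadly, and those skilled in the art can reasonably determine their meaning in the embodiments in light of the content of the technical solution. In the embodiments, words such as "further", "exemplarily", and "optionally" indicate an example, illustration, or explanation, and should not be construed as more preferred or more advantageous than other embodiments or designs. They are intended to present the related concepts.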
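DPI is used to identify the service categories in users' Internet traffic. It relies mainly on plaintext features in the traffic to distinguish service categories quickly, so that network element devices or network management systems can perform statistics, charging, poor-quality analysis, and other functions by service category.
The mainstream protocols on the Internet today are the Domain Name System (DNS), the Hypertext Transfer Protocol (HTTP), Hypertext Transfer Protocol Secure (HTTPS), and Quick UDP Internet Connections (QUIC). DNS and HTTP are plaintext, while HTTPS and QUIC are encrypted.
In recent years, to protect user privacy and online security, more and more network services (for example, websites and apps) have adopted encryption, so the share of HTTPS and QUIC keeps growing. The Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols on which HTTPS is based have also evolved, from TLS 1.2 to TLS 1.3. A notable change in TLS 1.3 is that the extension fields of the ClientHello and ServerHello messages that do not need to be presented in plaintext are encrypted, and the Certificate message itself is completely encrypted. QUIC has likewise evolved from the original Google gQUIC to the official IETF QUIC, which encrypts the whole ClientHello message in the Initial packet. DNS has evolved from plaintext to DoH (DNS over HTTPS) and DoQ (DNS over QUIC), so the domain name in a DNS request and the IP addresses, domain names, and aliases in a DNS response are no longer visible in plaintext.
Although the ClientHello message in the QUIC Initial packet is currently encrypted as a whole, its key can be computed from the salt published in the relevant Request for Comments (RFC) documents and the public encryption algorithm, so the content of the message is in essence visible to network devices; this is ClientHello pseudo-encryption. The IETF's ECH (Encrypted ClientHello) protocol, by contrast, is true ClientHello encryption: the content of the TLS and QUIC ClientHello message can no longer be decrypted and is no longer visible to network devices.
For DPI (Deep Packet Inspection) devices, the domain names and IP addresses in DNS request/response messages, the Host field in HTTP requests, and the Server Name Indication (SNI) extension field in the SSL/TLS/QUIC ClientHello message are all important features for DPI service identification and classification.
At present, service traffic that combines DoH- or DoQ-based DNS with QUIC using a pseudo-encrypted ClientHello already appears in live networks, which forces DPI devices to decrypt the pseudo-encrypted QUIC ClientHello to obtain the SNI for service identification. In the future, if traffic appears that combines DoH- or DoQ-based DNS with HTTPS/TLS/QUIC using an ECH-based, truly encrypted ClientHello, that is, fully encrypted service traffic, DPI will no longer be able to identify services with existing techniques; its identification rate will drop significantly and its identification capability will be seriously affected. DPI service identification has become a basic function of all kinds of network devices. In 5G networks, user service traffic keeps growing, and services need to be identified through DPI so that different network bearer resources can be allocated to services of different priorities. Matching resources to services in this way satisfies normal service operation without wasting resources. For example, extended reality (XR), a key 5G service, must first be identified through DPI before network bearer resources can be allocated to guarantee it; XR is an important application in the field of network QoS/QoE (Quality of Service/Quality of Experience).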
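On this basis, embodiments of the present application provide a service identification method, system, device, storage medium, and program product. A first domain name system message is obtained; first traffic information of a target service is obtained according to the first domain name system message; a first mapping relationship between the target service and domain name information and a first traffic statistical feature of the target service are obtained according to the first traffic information; a second domain name system message is then obtained according to the domain name information, second traffic information is obtained according to the second domain name system message, and the relevant IP addresses of the domain name information are obtained according to the second traffic information; finally, time-series statistical feature matching is performed according to the first mapping relationship, the relevant IP addresses, and the first traffic statistical feature to identify the current service. This improves the ability of a DPI system to identify encrypted services and ensures normal DPI service identification in the above scenarios.
In this embodiment, the service identification method is performed by a service identification system, which includes a service identification device. The device may be a standalone DPI device, or a gateway, router, firewall, or similar device with a built-in DPI function. The sampling targets of the method may be network terminal devices such as mobile phones and tablet computers.
Referring to Figure 1, Figure 1 is a schematic flowchart of a service identification method provided by an embodiment of the present application, including but not limited to steps S1000, S2000, S3000, S4000, and S5000.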
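Step S1000: Obtain a first domain name system message.
In some embodiments, the sampling terminal is configured to use plaintext DNS. The sampling terminal initiates various services and sends the corresponding query requests to a DNS server, and the first domain name system message is obtained from the response message returned by the DNS server.
In some embodiments, the sampling terminal may send one or more query requests to the DNS server for the same service, thereby sampling it multiple times, and multiple first domain name system messages of the same service are obtained from the response messages returned by the DNS server.
In some embodiments, the sampling terminal may send one or more query requests to the DNS server for multiple services, thereby sampling multiple services multiple times, and multiple first domain name system messages of the different services are obtained from the response messages returned by the DNS server.
It should be noted that there may be multiple DNS servers, and the sampling terminal may send query requests to multiple DNS servers. A DNS server may be a server, provided by a service provider, that stores resource records mapping domain names to IP addresses and other record types.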
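Step S2000: Obtain first traffic information of the target service according to the first domain name system message, the first traffic information including domain name information.
In some embodiments, the sampling terminal may initiate the same service multiple times, which generates traffic information for several domain names. Over multiple sampling runs, the first domain name system message of each run is obtained, and from it the first traffic information of each run of the service. The first traffic information includes one or more pieces of domain name information of the service, and the domain name information includes the domain names of the individual flows.
In some embodiments, the sampling terminal may initiate different services, generating traffic information for several domain names in each service. Each service is sampled multiple times to obtain its first traffic information for each run. The first traffic information includes the domain name information of each service, and the domain name information includes the domain names of the individual flows.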
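Step S3000: Obtain, according to the first traffic information, the first mapping relationship between the target service and the domain name information and the first traffic statistical feature of the target service.
In some embodiments, the sampling terminal may initiate the same service or different services multiple times. The first mapping relationship between the target service and the domain name information is obtained from the first traffic information of the service; it includes a mapping between the service name of the target service and its domain names, and may be a mapping from service name to domain name. The first traffic statistical feature of the target service is also obtained. It includes, but is not limited to, at least one of the following: the range of the number of concurrent TCP connections, the range of the number of concurrent UDP connections, the range of the ratio of the average uplink rate to the average downlink rate, the range of the ratio of uplink traffic to downlink traffic, the range of the number of distinct network-side ports, or the range of the number of distinct domain names.
In some embodiments, the first mapping relationship may be a mapping from domain name to service name.
In some embodiments, the first mapping relationship may be a mapping from service name to domain name.
In some embodiments, the first mapping relationship may be a mapping from domain name to service name and then from service name to domain name.
In some embodiments, the first mapping relationship may be a mapping from service name to domain name and then from domain name to service name.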
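In some embodiments, as shown in Figure 2, which is a detailed flowchart of step S3000, the step includes the following steps:
Step S3100: Obtain the domain name information corresponding to the service information, and establish a first-level mapping relationship from service information to domain name information.
It should be noted that the first traffic information includes service information, and the service information includes a service name (also called a service type). A network service may involve access to several different servers, so in the service information the same target service may correspond to different domain name information; for example, the same service name may correspond to different domain names, and different service names may correspond to the same domain name. First, the domain name information corresponding to the service information is obtained from the first traffic information, and a first-level mapping relationship from service information to domain name information is established, so that the one or more domain names of a service can be found from its service information. For example, as shown in Figure 3, service 1 corresponds to domain names 1, 2, and 3, and service 2 corresponds to domain names 1, 3, and 4. Once the first-level mapping relationship is established, domain names 1, 2, and 3 can be found from service 1, and domain names 1, 3, and 4 from service 2.
Step S3200: Obtain the service information corresponding to the domain name information, and establish a second-level mapping relationship from domain name information to service information.
It should be noted that, because the same service information may correspond to different domain name information and the same domain name information may correspond to different service information, the service information corresponding to the domain name information can be obtained and a second-level mapping relationship from domain name information to service information established. For example, as shown in Figure 3, domain name 1 corresponds to services 1 and 2, domain name 2 corresponds to services 1 and x, domain name 3 corresponds to services 1, 2, and y, and domain name 4 corresponds to service 2. Services 1 and 2 can thus be found from domain name 1, services 1 and x from domain name 2, services 1, 2, and y from domain name 3, and service 2 from domain name 4.
Step S3300: Generate, according to the second-level mapping relationship and the first-level mapping relationship, the first mapping relationship in which service information and domain name information are associated with each other.
In some embodiments, the first mapping relationship is an association from domain name information to service information and then from service information to domain name information. For example, as shown in Figure 4, the first mapping relationship runs from domain name to service name and then from service name to domain name. Through it, the one or more services corresponding to a domain name can be found from that domain name, and then the other domain names of those services. For example, domain name 1 corresponds to services 1 and 2; service 1 in turn corresponds to domain names 1, 2, and 3, and service 2 to domain names 1, 3, and 4. Starting from domain name 1, services 1 and 2 can be found, and from them domain names 1, 2, 3, and 4.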
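The two-level mapping of steps S3100 to S3300 can be sketched with plain dictionaries. This is a minimal illustration, not the patent's implementation: the names `first_level`, `invert`, and `expand_domain` are hypothetical, and the sample data is a simplified version of the Figure 3 example (services x and y are omitted).

```python
from collections import defaultdict

# Hypothetical sampling result: service name -> domain names seen in its traffic
# (first-level mapping; simplified from the Figure 3 example).
first_level = {
    "service1": {"domain1", "domain2", "domain3"},
    "service2": {"domain1", "domain3", "domain4"},
}


def invert(service_to_domains):
    """Build the second-level mapping (domain -> services) from the first-level one."""
    domain_to_services = defaultdict(set)
    for service, domains in service_to_domains.items():
        for domain in domains:
            domain_to_services[domain].add(service)
    return dict(domain_to_services)


def expand_domain(domain, service_to_domains, domain_to_services):
    """Follow domain -> services -> domains, as the first mapping relationship allows."""
    services = domain_to_services.get(domain, set())
    domains = set()
    for service in services:
        domains |= service_to_domains[service]
    return services, domains


second_level = invert(first_level)
services, domains = expand_domain("domain1", first_level, second_level)
print(sorted(services))  # ['service1', 'service2']
print(sorted(domains))   # ['domain1', 'domain2', 'domain3', 'domain4']
```

Deriving the second-level mapping by inverting the first-level one keeps the two views consistent by construction; a system that collects them separately would need its own consistency check.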
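Step S4000: Obtain a second domain name system message according to the domain name information, obtain second traffic information according to the second domain name system message, and obtain the relevant IP addresses of the domain name information according to the second traffic information.
In some embodiments, second domain name system messages are collected actively and continuously according to the domain name information; second traffic information is obtained from them, and the relevant IP addresses corresponding to the domain name information are obtained from the second traffic information. Referring to Figure 5, which is a detailed flowchart of step S4000, the step includes steps S4100, S4200, and S4300:
Step S4100: Send a DNS query request to a DNS server according to the domain name information.
Step S4200: Receive the response message returned by the DNS server, wherein the response message includes the second domain name system message.
Step S4300: Obtain second traffic information according to the second domain name system message, and obtain the relevant IP addresses corresponding to the domain name information according to the second traffic information.
It should be noted that the service identification system acts as a DNS client: according to the domain name information, it periodically, continuously, and actively sends DNS query requests to designated DNS servers. A DNS server answers a query with a response message that contains IP addresses; the service identification system receives the response and obtains from it the relevant IP addresses corresponding to the domain name information. The system may communicate with one or more DNS servers. Different DNS servers may return different results, so querying several DNS servers for the same domain name yields a more complete set of IP addresses. For example, to query the IP addresses of domain name 1, DNS query requests can be sent to different DNS servers; DNS server 1 returns IP addresses 1 and 2, and DNS server 2 returns IP addresses 3 and 4, giving four IP addresses related to domain name 1.
In some embodiments, the domain name information may be the domain name information corresponding to certain service information in the second traffic information, obtained through that service information.
In some embodiments, the domain name information may be all the domain name information obtained through the first mapping relationship. For example, the domain name of a service is obtained, the service names corresponding to that domain name are found according to the second-level mapping relationship (domain name to service), and all the domain names corresponding to those service names are then found according to the first-level mapping relationship (service to domain name). Query requests are sent to the DNS server for all these domain names to obtain their relevant IP addresses, and a cached IP address set is established.
It should be noted that the service identification system obtains, from the response message returned by the DNS server, the time to live (TTL) information corresponding to the domain name information, and determines from it whether any of the relevant IP addresses has expired. If an expired IP address exists among the relevant IP addresses, it is deleted, together with its association with the domain name.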
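The bookkeeping described above (merging answers from several DNS servers and dropping addresses whose TTL has expired) might look like the following sketch. It deliberately performs no network I/O: the answers are hard-coded, the class name `RelevantIpCache` and its methods are hypothetical, and the addresses come from documentation ranges.

```python
import time


class RelevantIpCache:
    """Domain -> {ip: expiry timestamp}, filled from DNS answers of several servers."""

    def __init__(self):
        self._entries = {}

    def add_answer(self, domain, ip, ttl, now=None):
        now = time.time() if now is None else now
        ips = self._entries.setdefault(domain, {})
        # Keep the latest expiry if several servers return the same address.
        ips[ip] = max(ips.get(ip, 0.0), now + ttl)

    def purge_expired(self, now=None):
        """Delete expired addresses and, with them, their link to the domain."""
        now = time.time() if now is None else now
        for domain in list(self._entries):
            live = {ip: exp for ip, exp in self._entries[domain].items() if exp > now}
            if live:
                self._entries[domain] = live
            else:
                del self._entries[domain]

    def ips_for(self, domain):
        return set(self._entries.get(domain, {}))


cache = RelevantIpCache()
# Hypothetical answers (domain, ip, ttl in seconds) from DNS server 1 ...
for domain, ip, ttl in [("domain1", "192.0.2.1", 60), ("domain1", "192.0.2.2", 300)]:
    cache.add_answer(domain, ip, ttl, now=0)
# ... and from DNS server 2.
for domain, ip, ttl in [("domain1", "192.0.2.3", 30), ("domain1", "192.0.2.4", 300)]:
    cache.add_answer(domain, ip, ttl, now=0)

cache.purge_expired(now=100)
print(sorted(cache.ips_for("domain1")))  # ['192.0.2.2', '192.0.2.4']
```

Passing `now` explicitly keeps the example deterministic; a periodic collector would simply omit it and call `purge_expired()` on each polling round.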
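Step S5000: Perform time-series statistical feature matching according to the first mapping relationship, the relevant IP addresses, and the first traffic statistical feature, to perform service identification on the current service.
In some embodiments, referring to Figure 6, which is a detailed flowchart of step S5000, the step includes steps S5100 and S5200:
Step S5100: Determine the relevant services of the current service according to the first mapping relationship and the relevant IP addresses.
Step S5200: Perform time-series statistical feature matching on the relevant services according to the first traffic statistical feature, so as to identify the current service.
In some embodiments, the second traffic information may contain zero, one, or more relevant services corresponding to the current service. If there are none, the second traffic information contains no relevant service corresponding to the current service, and service identification fails. If there are one or more, time-series statistical feature matching is performed on the relevant services according to the first traffic statistical feature, so as to identify the current service.
In some embodiments, the association between service names and domain name information can be found through the first mapping relationship, and the relevant IP addresses corresponding to the domain name information can be found through the domain name information, forming the cached IP address set. When the current service is identified, its current IP address is obtained and matched against the cached IP address set. If the match succeeds, the relevant traffic information of the relevant services corresponding to the current IP address is obtained, which determines the relevant services of the current service. The second traffic statistical features in the relevant traffic information are then matched against the first traffic statistical features to identify the current service.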
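In some embodiments, referring to Figure 7, which is a detailed flowchart of step S5100, the step includes steps S5110, S5120, and S5130:
Step S5110: Establish the cached IP address set according to the relevant IP addresses and the first mapping relationship.
In some embodiments, the service information corresponding to the domain name information in the second traffic information can be found through the first mapping relationship, and all the IP addresses corresponding to that service information can be found from it, thereby establishing the cached IP address set.
In some embodiments, a second mapping relationship between relevant IP addresses and domain name information may also be established, and the relevant IP addresses expanded more completely according to the first and second mapping relationships. Referring to Figure 8, which is a detailed flowchart of step S5110, the step includes steps S5111, S5112, S5113, and S5114:
Step S5111: Establish a second mapping relationship between the relevant IP addresses and the domain name information, and establish a third mapping relationship between the relevant IP addresses and the service information according to the first and second mapping relationships.
Refer to Figure 9, which is a schematic diagram of the fifth mapping relationship between relevant IP addresses. It should be noted that the mapping from IP address to domain name, that is, the second mapping relationship between relevant IP addresses and domain name information, can be obtained through active DNS collection during the network traffic processing stage. Because the first mapping relationship includes the second-level mapping relationship from domain name information to service information, the third mapping relationship between relevant IP addresses and service information can be established from that second-level mapping relationship. The third mapping relationship may run from relevant IP address to domain name information and then from domain name information to service information, for example IP address 1 - domain name 1 - service 1, and IP address 1 - domain name x - service x.
Step S5112: Establish a fourth mapping relationship between the relevant IP addresses and the domain name information according to the first and third mapping relationships.
It should be noted that, because the first mapping relationship includes the mutual association between service information and domain name information, for example from domain name 1 to service 1 and then from service 1 to domain names 1 and 2, all the domain name information corresponding to the service information can be found. According to the first and third mapping relationships, the fourth mapping relationship between relevant IP addresses and domain name information is established. It runs from IP address to domain name information, then to service information, then back to domain name information, for example IP address 1 - domain name 1 - service 1 - domain name 1 (domain name 2, domain name 3), and IP address 1 - domain name 1 - service 2 - domain name 1 (domain name 3, domain name 4).
Step S5113: Establish a fifth mapping relationship between relevant IP addresses according to the second and fourth mapping relationships.
It should be noted that the second mapping relationship includes the association between relevant IP addresses and domain name information, so the relevant IP addresses of each piece of domain name information reached through the fourth mapping relationship can be found. The fifth mapping relationship between relevant IP addresses can therefore be established according to the second and fourth mapping relationships. It runs from relevant IP address to domain name information, then to service information, then to domain name information, and finally to relevant IP address, for example IP address 1 - domain name 1 - service 1 - domain name 2 - IP address x.
Step S5114: Establish, according to the fifth mapping relationship, the cached IP address set corresponding to each piece of domain name information.
It should be noted that, according to the fifth mapping relationship, the corresponding domain name information can be found from a relevant IP address of a target service, and then all the relevant IP addresses corresponding to that domain name information; all these relevant IP addresses are taken as the cached IP address set.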
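The chain behind the fifth mapping relationship (IP address to domain name, to service, to domain name, back to IP address) can be sketched as a short graph walk. This is an illustrative sketch under assumed data structures: `invert` and `cached_ip_set` are hypothetical helper names, and the tables are placeholders shaped like the service 1 example (IP address 1, IP address X, IP address Y).

```python
from collections import defaultdict


def invert(mapping):
    """Turn key -> set(values) into value -> set(keys)."""
    inverted = defaultdict(set)
    for key, values in mapping.items():
        for value in values:
            inverted[value].add(key)
    return dict(inverted)


def cached_ip_set(ip, domain_to_ips, service_to_domains):
    """Walk IP -> domain -> service -> domain -> IP and collect the reachable addresses."""
    ip_to_domains = invert(domain_to_ips)            # second mapping: IP -> domain
    domain_to_services = invert(service_to_domains)  # second-level mapping: domain -> service
    result = set()
    for domain in ip_to_domains.get(ip, ()):
        for service in domain_to_services.get(domain, ()):
            for other_domain in service_to_domains[service]:
                result |= domain_to_ips.get(other_domain, set())
    return result


# Hypothetical tables: resolved addresses per domain, and domains per service.
domain_to_ips = {
    "domain1": {"ip1"},
    "domain2": {"ipX"},
    "domain3": {"ipY"},
    "domain9": {"ipZ"},
}
service_to_domains = {
    "service1": {"domain1", "domain2", "domain3"},
    "service9": {"domain9"},
}

print(sorted(cached_ip_set("ip1", domain_to_ips, service_to_domains)))
# ['ip1', 'ipX', 'ipY']
```

Here the inverse tables are rebuilt on every call for brevity; a device handling live traffic would more plausibly precompute them whenever the DNS collection or the trained mappings change.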
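In one embodiment, whether the cached IP address set contains an expired IP address can be determined from the time to live information. If it does, the expired IP address is deleted from the cached IP address set, together with its association with the domain name.
Step S5120: Obtain the current IP address of the current service, and match the current IP address against the cached IP address set.
In one embodiment, while the current service is being identified, its current IP address is obtained and matched against all the relevant IP addresses in the cached IP address set.
In one embodiment, while the current service is being identified, its current IP address is obtained and matched against part of the relevant IP addresses in the cached IP address set. For example, the current IP address is first matched against 50% of the relevant IP addresses in the set, and then, depending on the result, it is decided whether to match it against the other 50%. How many of the relevant IP addresses are matched first can be decided according to the actual situation and is not limited in this embodiment.
In one embodiment, the system may obtain the four-tuple of a Transmission Control Protocol (TCP) packet (source IP address, destination IP address, TCP source port, and TCP destination port) and establish a TCP flow context. Only the first IP packet of the TCP flow context is compared with each relevant IP address in the cached IP address set.
In one embodiment, the system establishes a User Datagram Protocol (UDP) flow context according to the four-tuple of a UDP packet (source IP address, destination IP address, UDP source port, and UDP destination port). Only the first IP packet of the UDP flow context is compared with each relevant IP address in the cached IP address set.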
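The "compare only the first packet of each flow" idea can be sketched as a small flow table keyed by the four-tuple. The sketch makes several assumptions that are not in the text: the class name `FlowTable` is hypothetical, the protocol is added to the key so that TCP and UDP flows with the same ports stay separate, and the destination address of the first packet is taken to be the network-side address.

```python
class FlowTable:
    """Remember, per flow, whether its network-side address is in the cached IP set."""

    def __init__(self, cached_ips):
        self.cached_ips = set(cached_ips)
        self.flows = {}   # (proto, src_ip, dst_ip, src_port, dst_port) -> bool
        self.lookups = 0  # number of comparisons against the cached IP set

    def on_packet(self, proto, src_ip, dst_ip, src_port, dst_port):
        key = (proto, src_ip, dst_ip, src_port, dst_port)
        if key not in self.flows:
            # First packet of the flow: create the flow context and compare once.
            # Assumption: this is an uplink packet, so dst_ip is the server address.
            self.lookups += 1
            self.flows[key] = dst_ip in self.cached_ips
        # Later packets of the same flow reuse the stored result.
        return self.flows[key]


table = FlowTable({"203.0.113.7"})
for _ in range(3):
    hit = table.on_packet("tcp", "10.0.0.2", "203.0.113.7", 40000, 443)
miss = table.on_packet("udp", "10.0.0.2", "198.51.100.9", 40001, 443)
print(hit, miss, table.lookups)  # True False 2
```

A production flow table would also normalize direction (so that downlink packets map to the same key) and age out idle flows; both are omitted here.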
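Step S5130: If the current IP address matches the cached IP address set, obtain the relevant traffic information of the relevant services corresponding to the current IP address.
In one embodiment, if the current IP address matches the cached IP address set, one or more relevant services corresponding to the current IP address have been found, and the relevant traffic information of those services is obtained. Because the fifth mapping relationship contains the association chain relevant IP address - domain name information - service information - domain name information - relevant IP address, the service names of all relevant services can be found from the current IP address, and from them the relevant traffic information of each relevant service.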
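In some embodiments, referring to Figure 10, which is a detailed flowchart of step S5200, the step includes steps S5210, S5220, and S5230:
Step S5210: Divide the first traffic information of the target service into time windows to obtain multiple time windows, wherein each time window includes multiple first traffic statistical features.
In some embodiments, in the sample collection stage, a target service is sampled multiple times to obtain the first traffic information of each sampling run. The first traffic information of the run in which the service lasted longest is selected and divided into multiple time windows, and the traffic statistics of the first traffic information within each time window are obtained. The traffic statistics include the first traffic statistical features, and each time window includes multiple first traffic statistical features. For example, referring to Figures 11 and 12, service 1 is divided into m time windows, including T1, T2, T3, and Tm, where T1 = (t1 - t0), T2 = (t2 - t1), and T3 = (t3 - t2), and the traffic statistics of service 1 within T1, T2, and T3 are obtained separately. The T1 time window includes the first traffic statistical features A, B, and C; the T2 time window includes the first traffic statistical features A, C, D, and E; and the Tm time window includes the first traffic statistical features D and X.
Step S5220: Calculate the value range of each first traffic statistical feature in each time window.
In some embodiments, each first traffic statistical feature has a corresponding first statistical feature value. Because the same time window contains multiple observations of the same first traffic statistical feature, that feature has multiple first statistical feature values in the window, from which its value range in the window can be obtained. In this way the value ranges of the different first traffic statistical features in the different time windows are obtained. For example, referring to Figures 11 and 12, there are m time windows. In the T1 time window, the value range of the first traffic statistical feature A is [a1, a2], that of feature B is [b1, b2], and that of feature C is [c1, c2]. In the T2 time window, the value range of feature A is [a3, a4], that of feature C is [c1, c2], that of feature D is [d1, d2], and that of feature E is [e1, e2]. In the Tm time window, the value range of feature D is [d3, d4], and the legal value range of feature X is [x1, x2].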
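One way to read step S5220 is that each sampling run contributes one observed value per feature per window, and the value range is simply the minimum and maximum over the runs. The sketch below follows that reading, which is an assumption; the function name `value_ranges` and the sample numbers are illustrative (44 and 66 echo the a1 and a2 values used later in the text), and the windows are taken as already split.

```python
def value_ranges(runs):
    """Compute per-window value ranges from several sampling runs.

    runs: list of sampling runs; each run is a list of per-window dicts
          {feature name: observed value}. Runs may have different lengths.
    Returns a list with one dict per window: {feature name: (low, high)}.
    """
    ranges = []
    for run in runs:
        for index, window in enumerate(run):
            if index == len(ranges):
                ranges.append({})
            for feature, value in window.items():
                low, high = ranges[index].get(feature, (value, value))
                ranges[index][feature] = (min(low, value), max(high, value))
    return ranges


# Hypothetical observations: feature A = concurrent TCP connections,
# feature B = uplink/downlink traffic ratio.
runs = [
    [{"A": 44, "B": 1.5}, {"A": 10}],
    [{"A": 66, "B": 2.0}, {"A": 12}],
    [{"A": 55, "B": 1.8}],
]
ranges = value_ranges(runs)
print(ranges[0]["A"])  # (44, 66)
print(ranges[0]["B"])  # (1.5, 2.0)
print(ranges[1]["A"])  # (10, 12)
```

Using the raw minimum and maximum makes the range sensitive to a single outlier run; percentiles or a tolerance margin would be a natural variation, though the text does not specify one.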
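Step S5230: Obtain the second traffic statistical features of the relevant services according to the relevant traffic information, calculate the second statistical feature values of the second traffic statistical features, and match the second statistical feature values against the value ranges.
In some embodiments, referring to Figure 13, which is a detailed flowchart of step S5230, the step includes steps S5231, S5232, and S5233:
Step S5231: Divide the relevant traffic information into time windows according to the time windows.
It should be noted that, referring to the dashed box in Figure 12, the relevant IP addresses in the cached IP address set corresponding to service 1 include IP address 1, IP address X, and IP address Y. The relevant traffic information of the relevant services corresponding to these relevant IP addresses is obtained and divided into time windows. Different time windows include multiple different second traffic statistical features, and the same time window also includes different second traffic statistical features. The relevant traffic information is divided into time windows in a way similar to the first traffic information, which is not described again in this embodiment.
Step S5232: Calculate the second statistical feature value of each second traffic statistical feature in each time window.
Step S5233: If at least one of the second statistical feature values is within the value range, determine that the second statistical feature value matches the value range and that the current service is identified successfully.
It should be noted that the second statistical feature value of each second traffic statistical feature in each time window is calculated and compared with the value range of the corresponding time window. If the second statistical feature value is within the value range of the corresponding time window, it is determined that the second statistical feature value matches that time window.
In some embodiments, if the second statistical feature values of all second traffic statistical features are within the value ranges corresponding to every time window, every time window of a certain service is completely matched, and the current service is identified as that service; that is, the current service is identified successfully. For example, referring to Figures 11 and 12, the second traffic statistical feature A represents the number of concurrent TCP connections in the time window, a1 is 44, and a2 is 66, so the value range of feature A in the T1 time window is [44, 66]. While the current service is being identified, if the number of concurrent TCP connections of the IP address in the T1 time window falls within this range, for example 55, the second traffic statistical feature A of service 1 matches in the T1 time window.
In some embodiments, if the second statistical feature value of at least one second traffic statistical feature is within the value range corresponding to every time window, every time window of a certain service is considered matched, and the current service is identified as that service; that is, the current service is identified successfully.
In some embodiments, if the second statistical feature value of at least one second traffic statistical feature is within the value range corresponding to at least one time window, that time window of a certain service matches, and the current service is identified as that service; that is, the current service is identified successfully.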
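The three matching policies above differ only in whether "all" or "at least one" is required across features and across time windows. The sketch below parameterizes that choice; the function names and flags are hypothetical, features missing from the observation are simply skipped (an assumption), and windows are compared pairwise up to the shorter list.

```python
def window_matches(observed, trained, require_all_features):
    """Compare one window's observed feature values with the trained value ranges."""
    checks = [
        low <= observed[feature] <= high
        for feature, (low, high) in trained.items()
        if feature in observed
    ]
    if not checks:
        return False
    return all(checks) if require_all_features else any(checks)


def service_matches(observed_windows, trained_windows,
                    require_all_features=True, require_all_windows=True):
    """Apply the per-window check over all windows with the chosen strictness."""
    results = [
        window_matches(observed, trained, require_all_features)
        for observed, trained in zip(observed_windows, trained_windows)
    ]
    if not results:
        return False
    return all(results) if require_all_windows else any(results)


# Hypothetical trained ranges (T1, T2) and one observation of the current service.
trained = [{"A": (44, 66), "B": (1.0, 2.0)}, {"A": (10, 12)}]
observed = [{"A": 55, "B": 3.0}, {"A": 20}]

print(service_matches(observed, trained))                # False: every feature, every window
print(service_matches(observed, trained, False, True))   # False: T2 has no feature in range
print(service_matches(observed, trained, False, False))  # True: feature A matches in T1
```

The strict policy lowers false positives at the cost of missing services whose traffic varies between runs; the loosest policy does the opposite, which is presumably why the text presents them as alternative embodiments.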
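In some embodiments, if none of the relevant services can be matched against the value ranges, there is no identifiable service among the relevant services, and identification of the current service fails.
In some embodiments, if the current service is identified successfully, identification of the remaining relevant services that have not yet been checked is abandoned.
In some embodiments, if the current service is identified successfully, the associated IP addresses corresponding to the current service are obtained according to the fifth mapping relationship, and the traffic information corresponding to those associated IP addresses is identified as the current service within a preset time period. For example, referring to Figures 11 and 12, if the current service is identified as service 1, the traffic of IP address 1, IP address X, IP address Y, and the other IP addresses associated with service 1 is identified as service 1 for a subsequent period of time, indicating that the user is performing service 1.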
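Labelling all traffic of the associated IP addresses for a preset period after a successful identification amounts to a small expiring lookup table. The sketch below is one possible shape for it; the class name `StickyLabels`, the 30-second hold time, and the explicit `now` argument are assumptions made for illustration.

```python
class StickyLabels:
    """After a successful identification, label associated IPs for a preset period."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self._labels = {}  # ip -> (service name, expiry timestamp)

    def mark(self, service, associated_ips, now):
        for ip in associated_ips:
            self._labels[ip] = (service, now + self.hold_seconds)

    def lookup(self, ip, now):
        entry = self._labels.get(ip)
        if entry is None:
            return None
        service, expiry = entry
        if now >= expiry:
            # The preset period is over: fall back to normal identification.
            del self._labels[ip]
            return None
        return service


labels = StickyLabels(hold_seconds=30)
labels.mark("service1", ["ip1", "ipX", "ipY"], now=0)
print(labels.lookup("ipX", now=10))  # service1
print(labels.lookup("ipX", now=31))  # None
```

While a label is active, packets to those addresses can skip the time-window matching entirely, which is where the saving comes from.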
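An embodiment of the present application also provides a service identification system. Referring to Figure 14, the service identification system includes:
a sampling module 100, configured to obtain a first domain name system message, obtain first traffic information of a target service according to the first domain name system message, and obtain the relevant IP addresses of the target service according to the domain name information, wherein the first traffic information includes the domain name information;
a training module 200, configured to obtain, according to the first traffic information, a first mapping relationship between the target service and the domain name information and a first traffic statistical feature of the target service;
a matching module 300, configured to obtain a second domain name system message according to the domain name information, obtain second traffic information according to the second domain name system message, obtain the relevant IP addresses of the domain name information according to the second traffic information, and perform time-series statistical feature matching according to the first mapping relationship, the relevant IP addresses, and the first traffic statistical feature, to perform service identification on the current service.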
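An embodiment of the present application also provides a service identification device. The device includes a memory and a processor; the memory stores a program which, when read and executed by the processor, implements the service identification method in the above embodiments.
An embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium stores one or more programs, which can be executed by one or more processors to implement the service identification method in the above embodiments.
An embodiment of the present application also provides a computer program product, including a computer program or computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the service identification method in the above embodiments.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory may include memory located remotely from the processor, and the remote memory may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the service identification method of the above embodiments are stored in the memory and, when executed by the processor, perform the service identification method in the above embodiments.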
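Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above can be implemented as software, firmware, hardware, and appropriate combinations thereof. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
In addition, embodiments of the present application also provide a computer program product, including a computer program or computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the above service identification method.
The above describes several implementations of the present application, but the present application is not limited to these implementations. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present application, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.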

Claims (16)

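  1. A service identification method, the method comprising:
    obtaining a first domain name system message;
    obtaining first traffic information of a target service according to the first domain name system message, wherein the first traffic information includes domain name information;
    obtaining, according to the first traffic information, a first mapping relationship between the target service and the domain name information and a first traffic statistical feature of the target service;
    obtaining a second domain name system message according to the domain name information, obtaining second traffic information according to the second domain name system message, and obtaining relevant IP addresses of the domain name information according to the second traffic information;
    performing service identification on a current service according to the first mapping relationship, the relevant IP addresses, and the first traffic statistical feature.
  2. The service identification method according to claim 1, wherein the first traffic information includes service information, and the obtaining, according to the first traffic information, a first mapping relationship between the target service and the domain name information comprises:
    obtaining the domain name information corresponding to the service information, and establishing a first-level mapping relationship from the service information to the domain name information;
    obtaining the service information corresponding to the domain name information, and establishing a second-level mapping relationship from the domain name information to the service information;
    generating, according to the second-level mapping relationship and the first-level mapping relationship, the first mapping relationship in which the service information and the domain name information are associated with each other.
  3. The service identification method according to claim 2, wherein the performing service identification on a current service according to the first mapping relationship, the relevant IP addresses, and the first traffic statistical feature comprises:
    determining relevant services of the current service according to the first mapping relationship and the relevant IP addresses;
    performing time-series statistical feature matching on the relevant services according to the first traffic statistical feature, so as to perform service identification on the current service.
  4. The service identification method according to claim 3, wherein the determining relevant services of the current service according to the first mapping relationship and the relevant IP addresses comprises:
    establishing, according to the relevant IP addresses and the first mapping relationship, a cached IP address set corresponding to each piece of the domain name information;
    obtaining a current IP address of the current service, and matching the current IP address against the cached IP address set;
    if the current IP address matches the cached IP address set, obtaining relevant traffic information of the relevant services corresponding to the current IP address.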
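  5. The service identification method according to claim 4, wherein the obtaining a second domain name system message according to the domain name information, obtaining second traffic information according to the second domain name system message, and obtaining relevant IP addresses of the domain name information according to the second traffic information comprises:
    sending a DNS query request to a DNS server according to the domain name information;
    receiving a response message returned by the DNS server, wherein the response message includes the second domain name system message;
    obtaining second traffic information according to the second domain name system message, and obtaining the relevant IP addresses corresponding to the domain name information according to the second traffic information.
  6. The service identification method according to claim 5, wherein the service identification method further comprises:
    obtaining, according to the response message, time to live information corresponding to the domain name information;
    deleting expired IP addresses from the cached IP address set according to the time to live information.
  7. The service identification method according to claim 4, wherein the establishing a cached IP address set according to the relevant IP addresses and the first mapping relationship comprises:
    establishing a second mapping relationship between the relevant IP addresses and the domain name information, and establishing a third mapping relationship between the relevant IP addresses and the service information according to the first mapping relationship and the second mapping relationship;
    establishing a fourth mapping relationship between the relevant IP addresses and the domain name information according to the first mapping relationship and the third mapping relationship;
    establishing a fifth mapping relationship between the relevant IP addresses according to the second mapping relationship and the fourth mapping relationship;
    establishing, according to the fifth mapping relationship, the cached IP address set corresponding to each piece of the domain name information.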
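  8. The service identification method according to claim 4, wherein the performing time-series statistical feature matching on the relevant services according to the first traffic statistical feature comprises:
    dividing the first traffic information of the target service into time windows to obtain multiple time windows, wherein each time window includes multiple first traffic statistical features;
    calculating the value range of each first traffic statistical feature in each of the time windows;
    obtaining second traffic statistical features of the relevant services according to the relevant traffic information, calculating second statistical feature values of the second traffic statistical features, and matching the second statistical feature values against the value ranges.
  9. The service identification method according to claim 8, wherein the obtaining second traffic statistical features of the relevant services according to the relevant traffic information, calculating second statistical feature values of the second traffic statistical features, and matching the second statistical feature values against the value ranges comprises:
    dividing the relevant traffic information into time windows according to each of the time windows;
    calculating the second statistical feature value of each second traffic statistical feature in each of the time windows;
    if at least one of the second statistical feature values is within the value range, determining that the second statistical feature value matches the value range and that the current service is identified successfully.
  10. The service identification method according to claim 7, wherein the service identification method further comprises:
    if the current service is identified successfully, obtaining associated IP addresses corresponding to the current service according to the fifth mapping relationship;
    within a preset time period, identifying the traffic information corresponding to the associated IP addresses as the current service.
  11. The service identification method according to any one of claims 3 to 10, wherein the service identification method further comprises:
    if the current service is identified successfully, abandoning identification of the services among the relevant services that have not yet been identified.
  12. The service identification method according to any one of claims 1 to 10, wherein the first traffic statistical feature includes at least one of the following: a range of the number of concurrent TCP connections, a range of the number of concurrent UDP connections, a range of the ratio of the average uplink rate to the average downlink rate, a range of the ratio of uplink traffic to downlink traffic, a range of the number of distinct network-side ports, or a range of the number of distinct domain names.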
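  13. A service identification system, wherein the service identification system comprises:
    a sampling module, configured to obtain a first domain name system message, obtain first traffic information of a target service according to the first domain name system message, and obtain relevant IP addresses of the target service according to the domain name information, wherein the first traffic information includes domain name information;
    a training module, configured to obtain, according to the first traffic information, a first mapping relationship between the target service and the domain name information and a first traffic statistical feature of the target service;
    a matching module, configured to obtain a second domain name system message according to the domain name information, obtain second traffic information according to the second domain name system message, obtain relevant IP addresses of the domain name information according to the second traffic information, and perform service identification on a current service according to the first mapping relationship, the relevant IP addresses, and the first traffic statistical feature.
  14. A service identification device, comprising a memory and a processor, the memory storing a program which, when read and executed by the processor, implements the service identification method according to any one of claims 1 to 12.
  15. A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the service identification method according to any one of claims 1 to 12.
  16. A computer program product, comprising a computer program or computer instructions stored in a computer-readable storage medium, wherein a processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the service identification method according to any one of claims 1 to 12.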
PCT/CN2023/093892 2022-06-28 2023-05-12 Service identification method, system, device, storage medium, and program product WO2024001557A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210740126.7A CN117354182A (zh) 2022-06-28 2022-06-28 Service identification method, system, device, storage medium, and program product
CN202210740126.7 2022-06-28

Publications (1)

Publication Number Publication Date
WO2024001557A1 true WO2024001557A1 (zh) 2024-01-04

Family

ID=89361837

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/093892 WO2024001557A1 (zh) 2022-06-28 2023-05-12 Service identification method, system, device, storage medium, and program product

Country Status (2)

Country Link
CN (1) CN117354182A (zh)
WO (1) WO2024001557A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
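CN104660727A (zh) * 2015-02-10 2015-05-27 深圳市博瑞得科技有限公司 DNS-side-based service identification method and system
CN106257867A (zh) * 2015-06-18 2016-12-28 中兴通讯股份有限公司 Service identification method and device for encrypted traffic
CN113055420A (zh) * 2019-12-27 2021-06-29 中国移动通信集团陕西有限公司 HTTPS service identification method, device, and computing device
CN113395367A (zh) * 2020-03-13 2021-09-14 中国移动通信集团山东有限公司 HTTPS service identification method, device, storage medium, and electronic device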
US20210400011A1 (en) * 2016-06-23 2021-12-23 Cisco Technology, Inc. Utilizing service tagging for encrypted flow classification

Also Published As

Publication number Publication date
CN117354182A (zh) 2024-01-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23829743

Country of ref document: EP

Kind code of ref document: A1