US20220368669A1 - Filtering and organizing process for domain name system query collection - Google Patents
- Publication number
- US20220368669A1 (application US17/816,680)
- Authority
- US
- United States
- Prior art keywords
- domain name
- name system
- address
- network
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1036—Load balancing of requests to servers for services different from user content provisioning, e.g. load balancing across domain name servers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/45—Network directories; Name-to-address mapping
- H04L61/4505—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
- H04L61/4511—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/09—Mapping addresses
- H04L61/25—Mapping addresses of the same type
- H04L61/2503—Translation of Internet protocol [IP] addresses
- H04L61/251—Translation of Internet protocol [IP] addresses between different IP versions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1038—Load balancing arrangements to avoid a single path through a load balancer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2101/00—Indexing scheme associated with group H04L61/00
- H04L2101/60—Types of network addresses
- H04L2101/618—Details of network addresses
- H04L2101/659—Internet protocol version 6 [IPv6] addresses
Definitions
- the present disclosure relates generally to communication networks, and more particularly to devices, non-transitory computer-readable media, and methods for filtering, distributing, and organizing domain name system queries to facilitate collection and data mining.
- DNS Domain Name System
- IP Internet Protocol
- DNS resolvers conventionally play a key role in fulfilling DNS queries by translating readily memorized URLs into less readily memorized IP addresses.
- queries submitted to DNS resolvers may contain a great deal of information about the Internet usage of Internet subscribers. This information, in turn, may help Internet service providers to improve service to their subscribers, e.g., by offering targeted services (such as advertisements) and/or by better understanding and engineering the Internet service provider networks.
- a method may include receiving a first domain name system query from a first endpoint device connected to a communications network, identifying a first network address of the first endpoint device from the first domain name system query, classifying the first domain name system query into a first class of a plurality of classes, wherein each class of the plurality of classes is associated with one predefined numerical range of a plurality of predefined numerical ranges, and wherein a target address unit of the first network address falls into the predefined numerical range associated with the first class, and forwarding the first domain name system query to a first collection server of a plurality of collection servers, wherein the first collection server is dedicated for collecting domain name system queries that are classified into the first class.
- a non-transitory computer-readable medium may store instructions which, when executed by a processing system in a communications network, cause the processing system to perform operations.
- the operations may include receiving a first domain name system query from a first endpoint device connected to a communications network, identifying a first network address of the first endpoint device from the first domain name system query, classifying the first domain name system query into a first class of a plurality of classes, wherein each class of the plurality of classes is associated with one predefined numerical range of a plurality of predefined numerical ranges, and wherein a target address unit of the first network address falls into the predefined numerical range associated with the first class, and forwarding the first domain name system query to a first collection server of a plurality of collection servers, wherein the first collection server is dedicated for collecting domain name system queries that are classified into the first class.
- a device may include a processing system including at least one processor and a non-transitory computer-readable medium storing instructions which, when executed by the processing system when deployed in a communications network, cause the processing system to perform operations.
- the operations may include receiving a first domain name system query from a first endpoint device connected to a communications network, identifying a first network address of the first endpoint device from the first domain name system query, classifying the first domain name system query into a first class of a plurality of classes, wherein each class of the plurality of classes is associated with one predefined numerical range of a plurality of predefined numerical ranges, and wherein a target address unit of the first network address falls into the predefined numerical range associated with the first class, and forwarding the first domain name system query to a first collection server of a plurality of collection servers, wherein the first collection server is dedicated for collecting domain name system queries that are classified into the first class.
- FIG. 1 illustrates an example network related to the present disclosure
- FIG. 2 illustrates a flowchart of an example method for filtering, distributing, and organizing domain name system queries, in accordance with the present disclosure
- FIG. 3 illustrates an example of a computing device, or computing system, specifically programmed to perform the steps, functions, blocks, and/or operations described herein.
- the present disclosure broadly discloses methods, computer-readable media, and devices for filtering, distributing, and organizing domain name system queries to facilitate collection and data mining.
- queries submitted to DNS resolvers may contain a great deal of information about the Internet usage of Internet subscribers. This information, in turn, may help Internet service providers to improve service to their subscribers. For instance, the information may be used to create new sources of revenue, to reduce the costs of providing service (e.g., through network design), and the like.
- processing this information is a challenge, particularly as the query traffic volume at the DNS servers increases. For instance, in some cases, the query traffic volume may exceed one million queries per second, and the rate of increase is only expected to grow year over year.
- the resources needed to capture useful data from such a volume of queries (e.g., servers to receive and process the data, as well as additional resources to balance and distribute the load among the servers)
- many current methods for distributing and balancing the incoming queries involve intrusive parsing of the captured queries, which consumes a large amount of processing power. The consumption of the processing power, in turn, may limit performance.
- Examples of the present disclosure distribute DNS records to servers or collectors for analysis in an efficient, coordinated manner based on the network addresses (e.g., IP address) of the records' sources.
- an incoming DNS query may be directed to a switch which is configured to identify a target address unit of the network address associated with the query's source.
- an “address unit” of an IP address is understood to refer to a grouping of bits in the IP address. For instance, in IP version 4 (IPv4), IP addresses are written in decimal form and comprise four octets. Each octet comprises eight bits and is separated from the next octet by a period.
- an octet may be considered an address unit.
- in IP version 6 (IPv6), IP addresses are written in hexadecimal form and comprise eight hextets. Each hextet comprises sixteen bits and is separated from the next hextet by a colon. Thus, in an IPv6 address, a hextet may be considered an address unit. Examples of the present disclosure are equally applicable to IPv4 and IPv6 addresses; thus, any reference herein to an “address unit” is understood to encompass both an IPv4 octet and an IPv6 hextet.
- examples of the present disclosure could be implemented to operate on units of network addresses other than IP addresses and on units of IP addresses that are not IPv4 or IPv6 addresses.
- address unit is not meant to limit the nature of the addressing scheme.
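The notion of an address unit can be illustrated with a minimal Python sketch. It assumes the standard-library `ipaddress` module; the helper name `address_units` is hypothetical and does not come from the disclosure.

```python
import ipaddress
from typing import List


def address_units(address: str) -> List[int]:
    """Split an IP address into its address units.

    IPv4 yields four 8-bit octets; IPv6 yields eight 16-bit hextets.
    (Hypothetical helper for illustration only.)
    """
    ip = ipaddress.ip_address(address)
    unit_bytes = 1 if ip.version == 4 else 2
    packed = ip.packed  # 4 bytes for IPv4, 16 bytes for IPv6
    return [
        int.from_bytes(packed[i:i + unit_bytes], "big")
        for i in range(0, len(packed), unit_bytes)
    ]


print(address_units("123.45.67.89"))  # [123, 45, 67, 89]
print([hex(u) for u in address_units("2001:db8::ff00:42:8329")])
```

The same function handles both address families, which mirrors the statement that an address unit encompasses both an IPv4 octet and an IPv6 hextet.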
- the query may be directed to a first collection server for further analysis. If, however, the value of the target address unit falls within a second predefined range, then the query may be directed to a second collection server for further analysis.
- Load balancing is therefore performed in a simple but efficient manner that speeds up the processing and forwarding of queries while consuming minimal processing power.
- the disclosed technique inherently organizes incoming DNS queries, which further reduces the processing that downstream applications might normally have to perform on the queries.
- FIG. 1 illustrates an example system 100 in which examples of the present disclosure for load balancing for domain name system query collection may operate.
- the system 100 may include any one or more types of communication networks, such as a traditional circuit switched network (e.g., a public switched telephone network (PSTN)) or a packet network such as an Internet Protocol (IP) network (e.g., an IP Multimedia Subsystem (IMS) network), an asynchronous transfer mode (ATM) network, a wired network, a wireless network, and/or a cellular network (e.g., 2G-5G, a long term evolution (LTE) network, and the like) related to the current disclosure.
- IMS IP Multimedia Subsystem
- ATM asynchronous transfer mode
- LTE long term evolution
- IP network is broadly defined as a network that uses Internet
- the system 100 may comprise a network 102 .
- the network 102 may be in communication with one or more access networks 120 and 122 , and with the Internet 160 .
- network 102 may combine core network components of a wired or cellular network with components of a triple play service network; where triple-play services include telephone services, Internet services and television services to subscribers.
- network 102 may functionally comprise a fixed mobile convergence (FMC) network, e.g., an IP Multimedia Subsystem (IMS) network.
- FMC fixed mobile convergence
- network 102 may functionally comprise a telephony network, e.g., an Internet Protocol/Multi-Protocol Label Switching (IP/MPLS) backbone network utilizing Session Initiation Protocol (SIP) for circuit-switched and Voice over Internet Protocol (VoIP) telephony services.
- IP/MPLS Internet Protocol/Multi-Protocol Label Switching
- SIP Session Initiation Protocol
- VoIP Voice over Internet Protocol
- Network 102 may further comprise a broadcast television network, e.g., a traditional cable provider network or an Internet Protocol Television (IPTV) network, as well as an Internet Service Provider (ISP) network.
- IPTV Internet Protocol Television
- ISP Internet Service Provider
- network 102 may include a plurality of television (TV) servers (e.g., a broadcast server, a cable head-end), a plurality of content servers, an advertising server (AS), an interactive TV/video on demand (VoD) server, and so forth.
- TV television
- AS advertising server
- VoD interactive TV/video on demand
- network 102 may include a processing system 104 , a database (DB) 106 , a plurality of DNS resolvers 182 - 183 , a plurality of edge routers 190 - 191 , and a plurality of collection servers 192 - 193 .
- DB database
- the access networks 120 and 122 may comprise Digital Subscriber Line (DSL) networks, public switched telephone network (PSTN) access networks, broadband cable access networks, Local Area Networks (LANs), wireless access networks (e.g., an IEEE 802.11/Wi-Fi network and the like), cellular access networks, 3rd party networks, and the like.
- DSL Digital Subscriber Line
- PSTN public switched telephone network
- LANs Local Area Networks
- the operator of network 102 may provide a cable television service, an IPTV service, or any other types of telecommunication services to subscribers via access networks 120 and 122 .
- the access networks 120 and 122 may comprise different types of access networks, may comprise the same type of access network, or some access networks may be the same type of access network and others may be different types of access networks.
- the network 102 may be operated by a telecommunication network service provider.
- the network 102 and the access networks 120 and 122 may be operated by different service providers, the same service provider or a combination thereof, or the access networks 120 and/or 122 may be operated by entities having core businesses that are not related to telecommunications services, e.g., corporate, governmental, or educational institution LANs, and the like.
- the access networks 120 may be in communication with one or more user endpoint (UE) devices 110 and 112 .
- access networks 122 may be in communication with one or more UE devices, e.g., UE device 114 .
- Access networks 120 and 122 may transmit and receive communications between UE devices 110 , 112 , and 114 , between UE devices 110 , 112 , and 114 , and servers 116 , servers 118 , DNS resolvers 182 - 183 , other components of network 102 , devices reachable via the Internet in general, and so forth.
- each of UE devices 110 , 112 , and 114 may comprise any single device or combination of devices that may comprise a user endpoint device.
- the UE devices 110 , 112 , and 114 may each comprise a mobile device, a cellular smart phone, a laptop, a tablet computer, a desktop computer, an application server, a bank or cluster of such devices, and the like.
- any of the UE devices 110 , 112 , and 114 may comprise sensor devices with wireless networking hardware, e.g., Internet of Things (IoT) devices, for gathering measurements of an environment, uploading the measurements to one or more servers or other devices, and so forth.
- IoT Internet of Things
- the access network 122 may also be in communication with one or more servers 116 .
- one or more servers 118 may be accessible to UE devices 110 , 112 , and 114 , to servers 116 , and so forth via Internet 160 in general.
- Each of the one or more servers 116 and one or more servers 118 may be associated with one or more IP addresses to enable communications with other devices via one or more networks.
- Each of the server(s) 116 and server(s) 118 may be associated with, for example, a merchant, a service business, a news source, a weather source, a school, a college or university, or other educational content providers, a social media site, a content distribution network, a cloud storage provider, a cloud computing application host, and so forth.
- each of server(s) 116 and server(s) 118 may comprise a computing system or server, such as computing system 300 depicted in FIG. 3 , and may be configured to provide one or more operations or functions in connection with examples of the present disclosure for filtering, distributing, and organizing domain name system queries, as described herein.
- network traffic records may relate to other types of network traffic, such as: server connection request messages at one or more servers of one or more domains, e.g., transmission control protocol (TCP) SYN/ACK messaging, User Datagram Protocol (UDP) messaging, IP packets for streaming video, streaming audio, or general Internet traffic, and so forth.
- TCP transmission control protocol
- UDP User Datagram Protocol
- network traffic data may be gathered and/or provided by server(s) 116 and/or server(s) 118 .
- server(s) 116 and/or server(s) 118 may maintain server logs and may provide the server logs or log summaries periodically or by request, may transmit exception messages or error messages, and so forth (e.g., to processing system 104 ).
- UE device 110 may seek to obtain access to a webpage for a banking service, which may be hosted on one of the servers 118 , but which may be unknown to the UE device 110 and/or a user of the device 110 .
- a DNS query from the UE device 110 may comprise, for example, the domain name “examplebank.com” and may be submitted to DNS resolver 182 .
- DNS resolver 182 may provide the current IP address for device 110 to access examplebank.com if there is an associated record in a cache at DNS resolver 182 .
- DNS resolver 182 may maintain records for domains that have been recently queried (e.g., within the last 12 hours, the last 24 hours, etc.), may maintain records for certain designated domains (e.g., the most popular 10,000 and/or the 10,000 most queried domains over the last six months), and so forth. Otherwise, DNS resolver 182 may seek the IP address from one or more other DNS resolvers (e.g., DNS resolver 183 ) or from a DNS authoritative server.
- DNS architectures may include multiple layers (e.g., hierarchical layers) of DNS resolvers.
- DNS resolvers 182 - 183 may follow a recursive process for obtaining an IP address for a submitted query, by accessing other DNS resolvers and/or DNS authoritative servers.
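The cache-then-fallback behavior described for DNS resolver 182 can be sketched as follows. This is a simplified illustration, not the resolver's actual implementation: the class name `CachingResolver`, the `upstream` callable, and the fixed maximum record age are all assumptions (real resolvers typically honor per-record TTLs), and the returned address is a documentation-range placeholder.

```python
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Toy resolver: answer from a time-limited cache, else ask upstream."""

    def __init__(
        self,
        upstream: Callable[[str], str],
        max_age_seconds: float = 12 * 3600,  # e.g., "within the last 12 hours"
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._upstream = upstream          # another resolver or authoritative server
        self._max_age = max_age_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def resolve(self, domain: str) -> str:
        now = self._clock()
        entry = self._cache.get(domain)
        if entry is not None and now - entry[1] < self._max_age:
            return entry[0]                # recent record found in the cache
        address = self._upstream(domain)   # otherwise seek the address elsewhere
        self._cache[domain] = (address, now)
        return address


upstream_calls = []


def fake_upstream(domain: str) -> str:
    """Stand-in for DNS resolver 183 or an authoritative server."""
    upstream_calls.append(domain)
    return "192.0.2.10"


resolver = CachingResolver(fake_upstream)
resolver.resolve("examplebank.com")
resolver.resolve("examplebank.com")
print(len(upstream_calls))  # 1: the second lookup is served from the cache
```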
- FIG. 1 illustrates a single layer of DNS resolvers including two DNS resolvers 182 - 183 .
- any number of DNS resolvers and any number of layers of DNS resolvers may be deployed in the network 102 without departing from the scope of the present disclosure.
- processing system 104 may comprise one or more physical devices, e.g., one or more computing systems or servers, such as computing system 300 depicted in FIG. 3 , and may be configured to provide one or more operations for load balancing for domain name system query collection, as described herein.
- the terms “configure,” and “reconfigure” may refer to programming or loading a processing system with computer-readable/computer-executable instructions, code, and/or programs, e.g., in a distributed or non-distributed memory, which when executed by a processor, or processors, of the processing system within a same device or within distributed devices, may cause the processing system to perform various functions.
- processing system may comprise a computing device including one or more processors, or cores (e.g., as illustrated in FIG. 3 and discussed below) or multiple computing devices collectively configured to perform various steps, functions, and/or operations in accordance with the present disclosure.
- database (DB) 106 may comprise a physical storage device integrated with processing system 104 (e.g., a database server), or attached or coupled to the processing system 104 , to store various types of information in support of systems for load balancing for domain name system query collection, in accordance with the present disclosure.
- DB 106 may store network traffic data, or other records from which network traffic data may be derived, may store mappings or tables that indicate ranges or groupings of DNS queries that are collected by the individual collection servers, such as collection servers 192 and 193 , and so forth.
- processing system 104 may load instructions into a memory, or one or more distributed memory units, and execute the instructions for load balancing for domain name system query collection, as described herein. An example method for load balancing for domain name system query collection is described in greater detail below in connection with FIG. 2 .
- processing system 104 and collection servers 192 and 193 may operate in a distributed and/or coordinated manner to perform various steps, functions, and/or operations described herein.
- processing system 104 may obtain incoming DNS queries (e.g., from edge routers 190 and 191 , which may be configured to port mirror the incoming DNS queries), identify network addresses (e.g., IP addresses) of the sources of the incoming DNS queries, classify the incoming DNS queries according to target portions of the network addresses from which the incoming DNS queries come, and forward the incoming DNS queries to the appropriate collection servers (e.g., collection servers 192 and 193 ) based on the classifying.
- the incoming DNS queries may originate with sources including the UE devices 110 , 112 , and/or 114 , or server(s) 116 .
- the processing system 104 may comprise a portion of a front end switch, a load balancer, or a collection server (e.g., a collection server that is independent or separate from the collection servers 192 - 193 ).
- the collection servers 192 - 193 may comprise short term storage that retains the DNS queries until the DNS queries can be stored in appropriate Data Lakes (e.g., repositories of DNS queries that may be mined for data).
- each collection server 192 or 193 may correspond to one Data Lake.
- although FIG. 1 illustrates two collection servers 192 - 193 , any number of collection servers that is a power of two (e.g., two, four, eight, sixteen, thirty-two, etc.) may be deployed in the network 102 .
- various techniques may be employed to provide load balancing among the collection servers and to organize incoming DNS queries. Several of these techniques are discussed in greater detail in connection with FIG. 2 .
- system 100 has been simplified. Thus, those skilled in the art will realize that the system 100 may be implemented in a different form than that which is illustrated in FIG. 1 , or may be expanded by including additional endpoint devices, access networks, network elements, application servers, etc. without altering the scope of the present disclosure.
- system 100 may be altered to omit various elements, substitute elements for devices that perform the same or similar functions, combine elements that are illustrated as separate devices, and/or implement network elements as functions that are spread across several devices that operate collectively as the respective network elements.
- the system 100 may include other network elements (not shown) such as border elements, routers, switches, policy servers, security devices, gateways, a content distribution network (CDN) and the like.
- CDN content distribution network
- portions of network 102 , access networks 120 and 122 , and/or Internet 160 may comprise a content distribution network (CDN) having ingest servers, edge servers, and the like for packet-based streaming of video, audio, or other content.
- access networks 120 and/or 122 may each comprise a plurality of different access networks that may interface with network 102 independently or in a chained manner.
- device 114 and server(s) 116 may communicate with network 102 via different access networks
- UE devices 110 and 112 may communicate with network 102 via different access networks, and so forth.
- one or more of DNS resolvers 182 - 183 may be deployed external to network 102 (e.g., a public DNS resolver), or the system 100 may include one or more additional DNS resolvers external to network 102 .
- FIG. 2 illustrates a flowchart of an example method 200 for filtering, distributing, and organizing domain name system queries, in accordance with the present disclosure.
- steps, functions and/or operations of the method 200 may be performed by a device as illustrated in FIG. 1 , e.g., processing system 104 , collection servers 192 - 193 , or any one or more components thereof, or collectively via a plurality of devices in FIG. 1 , such as processing system 104 and collection servers 192 - 193 , and so forth.
- the steps, functions, or operations of method 200 may be performed by a computing device or system 300 , and/or a processing system 302 as described in connection with FIG. 3 below.
- the computing device 300 may represent at least a portion of processing system 104 and/or collection servers 192 - 193 in accordance with the present disclosure.
- the method 200 is described in greater detail below in connection with an example performed by a processing system, such as processing system 302 .
- the method 200 begins in step 202 and proceeds to step 204 .
- the processing system may receive a DNS query from an endpoint device.
- the DNS query may be forwarded to the processing system by an edge router, which may port mirror incoming DNS queries before simultaneously sending the queries on to DNS resolvers and to the processing system.
- the DNS query may be duplicated in another way (e.g., by a tap, redirection, or other methods) before being forwarded to the processing system.
- the DNS query may comprise, for example, a domain name associated with a webpage that the endpoint device (or a user of the endpoint device) is trying to access (e.g., examplebank.com). While a DNS resolver attempts to provide the endpoint device with the webpage's IP address, the processing system may perform further processing on the DNS query in order to facilitate future data mining operations.
- the processing system may identify the network address of the endpoint device from the DNS query.
- the header of a data packet containing the DNS query as a payload may include a source IP address, which indicates the IP address of the endpoint device from which the DNS query originated.
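Reading the source address from the packet header, without parsing the DNS payload, might look like the sketch below. It assumes raw IP packets are available to the processing system and relies on the standard header layouts (IPv4 source address at byte offsets 12 to 15, IPv6 source address at offsets 8 to 23). The function name `source_address` and the sample addresses are hypothetical.

```python
import ipaddress
import struct


def source_address(packet: bytes) -> str:
    """Return the source IP address found in a raw IPv4 or IPv6 header."""
    version = packet[0] >> 4  # high nibble of the first byte
    if version == 4:
        return str(ipaddress.IPv4Address(packet[12:16]))
    if version == 6:
        return str(ipaddress.IPv6Address(packet[8:24]))
    raise ValueError(f"unsupported IP version: {version}")


# Build a bare 20-byte IPv4 header for illustration (checksum left at zero).
sample_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 20,          # version/IHL, type of service, total length
    0, 0,                 # identification, flags/fragment offset
    64, 17, 0,            # TTL, protocol (17 = UDP), header checksum
    ipaddress.IPv4Address("123.45.67.89").packed,   # source address
    ipaddress.IPv4Address("192.0.2.53").packed,     # destination address
)
print(source_address(sample_header))  # 123.45.67.89
```

Only fixed-offset header bytes are read here, which is consistent with the stated goal of avoiding intrusive parsing of the captured queries.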
- the processing system may isolate a target unit of the network address of the endpoint device.
- the target unit of the network address may depend on the classification scheme that is being used to organize incoming DNS queries.
- the target unit of the network address is the last address unit of the IP address. For instance, if the IP address is the IPv4 address of 123.45.67.89, then the last address unit (octet) would be 89. In other examples, however, different address units (e.g., the second to last or third to last address unit, etc.) of the IP address could serve as the target unit of the network address.
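Isolating the target unit can be done with a shift and a mask on the integer form of the address. The sketch below is one possible way to do it, assuming Python's `ipaddress` module; the function name `target_unit` and its `units_from_end` parameter are hypothetical.

```python
import ipaddress


def target_unit(address: str, units_from_end: int = 0) -> int:
    """Return one address unit, counted from the end of the address.

    units_from_end=0 selects the last unit, 1 the second to last, and so on.
    """
    ip = ipaddress.ip_address(address)
    unit_bits = 8 if ip.version == 4 else 16   # octet or hextet
    total_units = ip.max_prefixlen // unit_bits
    if not 0 <= units_from_end < total_units:
        raise ValueError("units_from_end is outside the address")
    mask = (1 << unit_bits) - 1
    return (int(ip) >> (unit_bits * units_from_end)) & mask


print(target_unit("123.45.67.89"))                          # 89
print(target_unit("123.45.67.89", units_from_end=1))        # 67
print(hex(target_unit("2001:db8::ff00:42:8329")))           # 0x8329
```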
- the processing system may classify the DNS query based on the target unit.
- the number of potential classes may correspond to the number of collection servers in the network. For instance, if the network includes two collection servers (as illustrated in FIG. 1 ), then there may be two potential classes into which the DNS query may be classified. However, as discussed above, in one example, any number of collection servers that is a power of two may be deployed in the network. Thus, if there are four collection servers, there may be four classes into which the DNS query may be classified; if there are eight collection servers, there may be eight classes; and so on. In other examples, however, the number of servers may be a number that is not a power of two.
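One way to derive one class per collection server is to split the value space of the address unit into contiguous ranges. The sketch below assumes equal-width ranges, which the text does not mandate; the function name `build_ranges` is hypothetical. When the server count is not a power of two, the leftover values are spread over the first few ranges.

```python
from typing import List, Tuple


def build_ranges(num_servers: int, unit_bits: int = 8) -> List[Tuple[int, int]]:
    """Split 0 .. 2**unit_bits - 1 into num_servers inclusive ranges."""
    size = 1 << unit_bits                      # 256 for an octet, 65536 for a hextet
    if not 0 < num_servers <= size:
        raise ValueError("num_servers must be between 1 and the unit's value count")
    base, extra = divmod(size, num_servers)
    ranges = []
    low = 0
    for index in range(num_servers):
        width = base + (1 if index < extra else 0)  # spread any remainder
        ranges.append((low, low + width - 1))
        low += width
    return ranges


print(build_ranges(2))  # [(0, 127), (128, 255)]
print(build_ranges(4))  # [(0, 63), (64, 127), (128, 191), (192, 255)]
print(build_ranges(3))  # [(0, 85), (86, 170), (171, 255)]
```

With a power-of-two server count the ranges divide evenly, which may be one reason that deployment size is singled out, although the text does not state the motivation.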
- each class of the plurality of classes is associated with a predefined numerical range.
- the DNS query may be sorted into the class whose predefined numerical range encompasses the target unit of the source's network address. For instance, if there are two classes into which the DNS query may be classified, the first class may include DNS queries where the last octet of the source IPv4 address is anywhere in the range of one to 255, and the second class may include DNS queries where the last octet of the source IPv4 address is greater than 255. In this case, if the last octet of the source IP address is 89, then the DNS query may be classified in the first class (i.e., 1 ≤ 89 ≤ 255).
- Different numerical ranges may be used for the classification of IPv4 addresses, as well as for the classification of IPv6 addresses which use a different addressing scheme (e.g., in the case of IPv6, different ranges of hexadecimal values may be associated with different classes).
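The classification step can be sketched as a lookup in a sorted table of range boundaries. Because an IPv4 octet can only take values from 0 to 255, this sketch assumes an even split (0 to 127 and 128 to 255) rather than the example ranges quoted above. The tables, the four-way IPv6 split, and the function name `classify` are all hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

# Inclusive lower bound of each class's range (assumed values, for illustration).
IPV4_LOWER_BOUNDS = [0, 128]                          # class 0: 0-127, class 1: 128-255
IPV6_LOWER_BOUNDS = [0x0000, 0x4000, 0x8000, 0xC000]  # four classes over one hextet


def classify(unit_value: int, lower_bounds: Sequence[int], unit_max: int) -> int:
    """Return the index of the class whose range contains unit_value."""
    if not 0 <= unit_value <= unit_max:
        raise ValueError("value does not fit in the address unit")
    # bisect_right counts how many lower bounds are <= unit_value.
    return bisect_right(lower_bounds, unit_value) - 1


print(classify(89, IPV4_LOWER_BOUNDS, 0xFF))        # 0
print(classify(200, IPV4_LOWER_BOUNDS, 0xFF))       # 1
print(classify(0x8329, IPV6_LOWER_BOUNDS, 0xFFFF))  # 2
```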
- the processing system may forward the DNS query to a first collection server of a plurality of collection servers.
- the network may include a plurality of (i.e., at least two, and potentially any power of two) collection servers for temporarily storing DNS queries.
- Each collection server of the plurality of collection servers may be associated with a different class of DNS queries.
- each class may include DNS queries where the last address unit of the source IP address falls within a different predefined numerical range.
- the first collection server may be the collection server that is associated with the class into which the DNS query is classified in step 210 (e.g., a collection server associated with DNS queries where the last octet of the source IPv4 address is anywhere in the range of one to 255).
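Putting the steps together, forwarding reduces to choosing a collection server from the last address unit of the source address. The sketch below models forwarding as appending to in-memory buckets; an actual deployment would send the query over the network. The collector labels, the equal-width split, and the function names `route` and `dispatch` are assumptions, and the sample addresses are documentation-range placeholders.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical labels standing in for collection servers 192 and 193.
COLLECTORS = ["collection-server-192", "collection-server-193"]


def route(source_ip: str) -> str:
    """Pick the collector whose range contains the last address unit."""
    ip = ipaddress.ip_address(source_ip)
    unit_bits = 8 if ip.version == 4 else 16
    last_unit = int(ip) & ((1 << unit_bits) - 1)
    # Assumes len(COLLECTORS) evenly divides the unit's value count.
    width = (1 << unit_bits) // len(COLLECTORS)
    return COLLECTORS[last_unit // width]


def dispatch(queries: Iterable[Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (source_ip, domain) pairs by the collector they are routed to."""
    buckets: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source_ip, domain in queries:
        buckets[route(source_ip)].append((source_ip, domain))
    return dict(buckets)


result = dispatch([
    ("123.45.67.89", "examplebank.com"),
    ("198.51.100.200", "example.org"),
])
print(result["collection-server-192"])  # [('123.45.67.89', 'examplebank.com')]
print(result["collection-server-193"])  # [('198.51.100.200', 'example.org')]
```

Because each collector only ever receives sources from its own range, the stored queries arrive already grouped, which is the organizing effect attributed to the classification scheme.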
- the method 200 may end in step 214 . However, the method 200 may be repeated for each incoming DNS query that is received by the processing system.
- the method 200 therefore supports the large volumes and high bandwidth that have become typical when mining DNS queries for data, while minimizing the resource and computational costs of balancing and distributing the DNS queries among collection servers. For instance, while conventional techniques may parse the DNS queries to facilitate balancing and distribution of the queries among collection servers, the method and system disclosed herein accomplish the same quickly and efficiently by using the source addresses (e.g., network addresses) of the queries to direct the queries to the appropriate collection servers. Moreover, the DNS queries are effectively organized by the simple classification scheme, which minimizes the computation that downstream applications may have to perform when processing the DNS queries. In further examples, criteria other than source address may be used to sort or classify the DNS queries.
- the method 200 may be expanded to include additional steps or may be modified to include additional operations with respect to the steps outlined above.
- one or more steps, functions, or operations of the method 200 may include a storing, displaying, and/or outputting step as required for a particular application.
- any data, records, fields, and/or intermediate results discussed in the method can be stored, displayed, and/or outputted either on the device executing the method or to another device, as required for a particular application.
- steps, blocks, functions or operations in FIG. 2 that recite a determining operation or involve a decision do not necessarily require that both branches of the determining operation be practiced.
- one of the branches of the determining operation can be deemed as an optional step.
- steps, blocks, functions or operations of the above described method can be combined, separated, and/or performed in a different order from that described above, without departing from the examples of the present disclosure.
- FIG. 3 depicts a high-level block diagram of a computing device or processing system specifically programmed to perform the functions described herein.
- the processing system 300 comprises one or more hardware processor elements 302 (e.g., a central processing unit (CPU), a microprocessor, or a multi-core processor), a memory 304 (e.g., random access memory (RAM) and/or read only memory (ROM)), a module 305 for filtering, distributing, and organizing domain name system queries, and various input/output devices 306 (e.g., storage devices, including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive, a receiver, a transmitter, a speaker, a display, a speech synthesizer, an output port, an input port and a user input device (such as a keyboard, a keypad, a mouse, a microphone and the like)).
- input/output devices 306 may also include antenna elements, antenna arrays, remote radio heads (RRHs), baseband units (BBUs), transceivers, power units, and so forth.
- although only one processor element is shown, it should be noted that the computing device may employ a plurality of processor elements.
- the computing device of this figure is intended to represent each of those multiple computing devices.
- one or more hardware processors can be utilized in supporting a virtualized or shared computing environment.
- the virtualized computing environment may support one or more virtual machines representing computers, servers, or other computing devices.
- hardware components such as hardware processors and computer-readable storage devices may be virtualized or logically represented.
- the hardware processor 302 can also be configured or programmed to cause other devices to perform one or more operations as discussed above. In other words, the hardware processor 302 may serve the function of a central controller directing other devices to perform the one or more operations as discussed above.
- the present disclosure can be implemented in software and/or in a combination of software and hardware, e.g., using application specific integrated circuits (ASIC), a programmable gate array (PGA) including a Field PGA, or a state machine deployed on a hardware device, a computing device or any other hardware equivalents, e.g., computer readable instructions pertaining to the method discussed above can be used to configure a hardware processor to perform the steps, functions and/or operations of the above disclosed method 200 .
- instructions and data for the present module or process 305 for filtering, distributing, and organizing domain name system queries can be loaded into memory 304 and executed by hardware processor element 302 to implement the steps, functions, or operations as discussed above in connection with the illustrative method 200 .
- a hardware processor executes instructions to perform “operations,” this could include the hardware processor performing the operations directly and/or facilitating, directing, or cooperating with another hardware device or component (e.g., a co-processor and the like) to perform the operations.
- the processor executing the computer readable or software instructions relating to the above described method can be perceived as a programmed processor or a specialized processor.
- the present module 305 for filtering, distributing, and organizing domain name system queries (including associated data structures) of the present disclosure can be stored on a tangible or physical (broadly non-transitory) computer-readable storage device or medium, e.g., volatile memory, non-volatile memory, ROM memory, RAM memory, magnetic or optical drive, device or diskette, and the like.
- a “tangible” computer-readable storage device or medium comprises a physical device, a hardware device, or a device that is discernible by the touch. More specifically, the computer-readable storage device may comprise any physical devices that provide the ability to store information such as data and/or instructions to be accessed by a processor or a computing device such as a computer or an application server.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
A method for filtering, distributing, and organizing domain name system queries in a communications network may include receiving a first domain name system query from a first endpoint device connected to the network, identifying a first network address of the first endpoint device from the first domain name system query, classifying the first domain name system query into a first class of a plurality of classes, wherein each class of the plurality of classes is associated with one predefined numerical range of a plurality of predefined numerical ranges, and wherein a target address unit of the first network address falls into the predefined numerical range associated with the first class, and forwarding the first domain name system query to a first collection server of a plurality of collection servers, wherein the first collection server is dedicated for collecting domain name system queries that are classified into the first class.
Description
- This application is a continuation of U.S. patent application Ser. No. 17/063,703, filed Oct. 5, 2020, now U.S. Pat. No. 11,405,354, which is a continuation of U.S. patent application Ser. No. 16/420,817, filed May 23, 2019, now U.S. Pat. No. 10,798,051, both of which are herein incorporated by reference in their entirety.
- The present disclosure relates generally to communication networks, and more particularly to devices, non-transitory computer-readable media, and methods for filtering, distributing, and organizing domain name system queries to facilitate collection and data mining.
- The Domain Name System (DNS) is one of the core building blocks of modern Internet infrastructure. For a given website, a record associating the website's uniform resource locator (URL) with one or more Internet Protocol (IP) addresses is maintained at a specific DNS authoritative server, or a DNS resolver. Thus, DNS resolvers conventionally play a key role in fulfilling DNS queries by translating readily memorized URLs into less readily memorized IP addresses. Moreover, queries submitted to DNS resolvers may contain a great deal of information about the Internet usage of Internet subscribers. This information, in turn, may help Internet service providers to improve service to their subscribers, e.g., by offering targeted services (such as advertisements) and/or by better understanding and engineering the Internet service provider networks.
- In one example, the present disclosure discloses a device, computer-readable medium, and method for filtering, distributing and organizing domain name system queries to facilitate collection and data mining. For example, a method may include receiving a first domain name system query from a first endpoint device connected to a communications network, identifying a first network address of the first endpoint device from the first domain name system query, classifying the first domain name system query into a first class of a plurality of classes, wherein each class of the plurality of classes is associated with one predefined numerical range of a plurality of predefined numerical ranges, and wherein a target address unit of the first network address falls into the predefined numerical range associated with the first class, and forwarding the first domain name system query to a first collection server of a plurality of collection servers, wherein the first collection server is dedicated for collecting domain name system queries that are classified into the first class.
- In another example, a non-transitory computer-readable medium may store instructions which, when executed by a processing system in a communications network, cause the processing system to perform operations. The operations may include receiving a first domain name system query from a first endpoint device connected to a communications network, identifying a first network address of the first endpoint device from the first domain name system query, classifying the first domain name system query into a first class of a plurality of classes, wherein each class of the plurality of classes is associated with one predefined numerical range of a plurality of predefined numerical ranges, and wherein a target address unit of the first network address falls into the predefined numerical range associated with the first class, and forwarding the first domain name system query to a first collection server of a plurality of collection servers, wherein the first collection server is dedicated for collecting domain name system queries that are classified into the first class.
- In another example, a device may include a processing system including at least one processor and a non-transitory computer-readable medium storing instructions which, when executed by the processing system when deployed in a communications network, cause the processing system to perform operations. The operations may include receiving a first domain name system query from a first endpoint device connected to a communications network, identifying a first network address of the first endpoint device from the first domain name system query, classifying the first domain name system query into a first class of a plurality of classes, wherein each class of the plurality of classes is associated with one predefined numerical range of a plurality of predefined numerical ranges, and wherein a target address unit of the first network address falls into the predefined numerical range associated with the first class, and forwarding the first domain name system query to a first collection server of a plurality of collection servers, wherein the first collection server is dedicated for collecting domain name system queries that are classified into the first class.
- The teachings of the present disclosure can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
- FIG. 1 illustrates an example network related to the present disclosure;
- FIG. 2 illustrates a flowchart of an example method for filtering, distributing, and organizing domain name system queries, in accordance with the present disclosure; and
- FIG. 3 illustrates an example of a computing device, or computing system, specifically programmed to perform the steps, functions, blocks, and/or operations described herein.
- To facilitate understanding, similar reference numerals have been used, where possible, to designate elements that are common to the figures.
- The present disclosure broadly discloses methods, computer-readable media, and devices for filtering, distributing, and organizing domain name system queries to facilitate collection and data mining. As discussed above, queries submitted to DNS resolvers may contain a great deal of information about the Internet usage of Internet subscribers. This information, in turn, may help Internet service providers to improve service to their subscribers. For instance, the information may be used to create new sources of revenue, to reduce the costs of providing service (e.g., through network design), and the like.
- However, processing this information is a challenge, particularly as the query traffic volume at the DNS servers increases. For instance, in some cases, the query traffic volume may exceed one million queries per second, and the rate of increase is only expected to grow year over year. The resources needed to capture useful data from such a volume of queries (e.g., servers to receive and process the data, as well as additional resources to balance and distribute the load among the servers) tend to be very complicated and expensive. As an example, many current methods for distributing and balancing the incoming queries involve intrusive parsing of the captured queries, which consumes a large amount of processing power. The consumption of the processing power, in turn, may limit performance.
- Examples of the present disclosure distribute DNS records to servers or collectors for analysis in an efficient, coordinated manner based on the network addresses (e.g., IP addresses) of the records' sources. In one particular example, an incoming DNS query may be directed to a switch which is configured to identify a target address unit of the network address associated with the query's source. Within the context of the present disclosure, an “address unit” of an IP address is understood to refer to a grouping of bits in the IP address. For instance, in IP version 4 (IPv4), IP addresses are written in decimal form and comprise four octets. Each octet comprises eight bits and is separated from the next octet by a period. Thus, in an IPv4 address, an octet may be considered an address unit. However, in IPv6, IP addresses are written in hexadecimal form and comprise eight hextets. Each hextet comprises sixteen bits and is separated from the next hextet by a colon. Thus, in an IPv6 address, a hextet may be considered an address unit. Examples of the present disclosure are equally applicable to IPv4 and IPv6 addresses; thus, any reference herein to an “address unit” is understood to encompass both an IPv4 octet and an IPv6 hextet. However, examples of the present disclosure could be implemented to operate on units of network addresses other than IP addresses and on units of IP addresses that are not IPv4 or IPv6 addresses. Thus, use of the term “address unit” is not meant to limit the nature of the addressing scheme.
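The notion of an address unit can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the function name `address_units` is hypothetical, and the standard-library `ipaddress` module is used only as a convenient parser. The sketch splits an IPv4 address into four 8-bit octets and an IPv6 address into eight 16-bit hextets.

```python
import ipaddress
from typing import List


def address_units(address: str) -> List[int]:
    """Return the address units of an IP address as integers.

    IPv4 yields four octets (8 bits each); IPv6 yields eight hextets
    (16 bits each). The function name is illustrative only.
    """
    ip = ipaddress.ip_address(address)
    unit_bits = 8 if ip.version == 4 else 16
    unit_count = 4 if ip.version == 4 else 8
    mask = (1 << unit_bits) - 1
    value = int(ip)
    # Walk from the most significant unit down to the least significant one.
    return [
        (value >> (unit_bits * (unit_count - 1 - i))) & mask
        for i in range(unit_count)
    ]


if __name__ == "__main__":
    print(address_units("123.45.67.89"))            # IPv4: four octets
    print(address_units("2001:db8::ff00:42:8329"))  # IPv6: eight hextets
```

Under this sketch, the last element of the returned list corresponds to the last address unit, whether that is an octet or a hextet.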
- In one example, if the value of the target address unit falls within a first predefined range, then the query may be directed to a first collection server for further analysis. If, however, the value of the target address unit falls within a second predefined range, then the query may be directed to a second collection server for further analysis. Load balancing is therefore performed in a simple but efficient manner that speeds up the processing and forwarding of queries while consuming minimal processing power. Moreover, the disclosed technique inherently organizes incoming DNS queries, which further reduces the processing that downstream applications might normally have to perform on the queries.
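The range-based direction of queries described above can be sketched as a simple table lookup. The sketch below is a hedged illustration, not the disclosed implementation: the table contents, the split of the octet range at 128, the collector names, and the helper names `select_collector` and `last_octet` are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping of inclusive numerical ranges to collection servers.
# The boundaries and server names are assumptions for illustration only.
RANGE_TABLE: List[Tuple[int, int, str]] = [
    (0, 127, "collection-server-1"),
    (128, 255, "collection-server-2"),
]


def last_octet(ipv4: str) -> int:
    """Return the last octet of a dotted-decimal IPv4 address."""
    return int(ipv4.rsplit(".", 1)[1])


def select_collector(
    target_unit: int, table: List[Tuple[int, int, str]] = RANGE_TABLE
) -> Optional[str]:
    """Return the collector whose range contains target_unit, if any."""
    for low, high, collector in table:
        if low <= target_unit <= high:
            return collector
    return None  # no predefined range matched


if __name__ == "__main__":
    print(select_collector(last_octet("123.45.67.89")))
```

Because the lookup touches only one address unit and never parses the DNS payload, the per-query work stays small, which is the property the paragraph above relies on.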
- Although examples of the disclosure are described within the context of DNS queries, it will be appreciated that the methods, computer-readable media, and devices described herein could be applied to a much broader range of Internet subscriber data. Moreover, the examples of the present disclosure are not limited to Internet Protocol, but could be used to process subscriber data using other, non-IP protocols. These and other aspects of the present disclosure are discussed in greater detail below in connection with the examples of
FIGS. 1-3 . - To further aid in understanding the present disclosure,
FIG. 1 illustrates anexample system 100 in which examples of the present disclosure for load balancing for domain name system query collection may operate. Thesystem 100 may include any one or more types of communication networks, such as a traditional circuit switched network (e.g., a public switched telephone network (PSTN)) or a packet network such as an Internet Protocol (IP) network (e.g., an IP Multimedia Subsystem (IMS) network), an asynchronous transfer mode (ATM) network, a wired network, a wireless network, and/or a cellular network (e.g., 2G-5G, a long term evolution (LTE) network, and the like) related to the current disclosure. It should be noted that an IP network is broadly defined as a network that uses Internet Protocol to exchange data packets. Additional example IP networks include Voice over IP (VoIP) networks, Service over IP (SoIP) networks, and the like. - In one example, the
system 100 may comprise anetwork 102. Thenetwork 102 may be in communication with one ormore access networks network 102 may combine core network components of a wired or cellular network with components of a triple play service network; where triple-play services include telephone services, Internet services and television services to subscribers. For example,network 102 may functionally comprise a fixed mobile convergence (FMC) network, e.g., an IP Multimedia Subsystem (IMS) network. In addition,network 102 may functionally comprise a telephony network, e.g., an Internet Protocol/Multi-Protocol Label Switching (IP/MPLS) backbone network utilizing Session Initiation Protocol (SIP) for circuit-switched and Voice over Internet Protocol (VoIP) telephony services. Network 102 may further comprise a broadcast television network, e.g., a traditional cable provider network or an Internet Protocol Television (IPTV) network, as well as an Internet Service Provider (ISP) network. In one example,network 102 may include a plurality of television (TV) servers (e.g., a broadcast server, a cable head-end), a plurality of content servers, an advertising server (AS), an interactive TV/video on demand (VoD) server, and so forth. As further illustrated inFIG. 1 ,network 102 may include aprocessing system 104, a database (DB) 106, a plurality of DNS resolvers 182-183, a plurality of edge routers 190-191, and a plurality of collection servers 192-193. For ease of illustration, various additional elements ofnetwork 102 are omitted fromFIG. 1 . - In one example, the
access networks network 102 may provide a cable television service, an IPTV service, or any other types of telecommunication services to subscribers viaaccess networks access networks network 102 may be operated by a telecommunication network service provider. Thenetwork 102 and theaccess networks access networks 120 and/or 122 may be operated by entities having core businesses that are not related to telecommunications services, e.g., corporate, governmental, or educational institution LANs, and the like. - In one example, the
access networks 120 may be in communication with one or more user endpoint (UE)devices access networks 122 may be in communication with one or more UE devices, e.g.,UE device 114.Access networks UE devices UE devices servers 116,servers 118, DNS resolvers 182-183, other components ofnetwork 102, devices reachable via the Internet in general, and so forth. In one example, each ofUE devices UE devices UE devices - In one example, the
access network 122 may also be in communication with one ormore servers 116. Similarly, one ormore servers 118 may be accessible toUE devices servers 116, and so forth viaInternet 160 in general. Each of the one ormore servers 116 and one ormore servers 118 may be associated with one or more IP addresses to enable communications with other devices via one or more networks. Each of the server(s) 116 and server(s) 118 may be associated with, for example, a merchant, a service business, a news source, a weather source, a school, a college or university, or other educational content providers, a social media site, a content distribution network, a cloud storage provider, a cloud computing application host, and so forth. - In accordance with the present disclosure, each of server(s) 116 and server(s) 118 may comprise a computing system or server, such as
computing system 300 depicted inFIG. 3 , and may be configured to provide one or more operations or functions in connection with examples of the present disclosure for filtering, distributing, and organizing domain name system queries, as described herein. For instance, although examples of the present disclosure are described primarily in connection with DNS traffic records, in other, further, and different examples, network traffic records may relate to other types of network traffic, such as: server connection request messages at one or more servers of one or more domains, e.g., transmission control protocol (TCP) SYN/ACK messaging, Uniform Datagram Protocol (UDP) messaging, IP packets for streaming video, streaming audio, or general Internet traffic, and so forth. Accordingly, in one example, network traffic data may be gathered and/or provided by server(s) 116 and/or server(s) 118. For instance, server(s) 116 and/or server(s) 118 may maintain server logs and may provide the servers logs or log summaries periodically or by request, may transmit exception messages or error messages, and so forth (e.g., to processing system 104). - In an illustrative example,
UE device 110 may seek to obtain access to a webpage for a banking service, which may be hosted on one of theservers 118, but which may be unknown to theUE device 110 and/or a user of thedevice 110. To access the webpage, a DNS query from theUE device 110 may comprise, for example, the domain name “examplebank.com” and may be submitted toDNS resolver 182.DNS resolver 182 may provide the current IP address fordevice 110 to access examplebank.com if there is an associated record in a cache atDNS resolver 182. For instance,DNS resolver 182 may maintain records for domains that have been recently queried (e.g., within the last 12 hours, the last 24 hours, etc.), may maintain records for certain designated domains (e.g., the most popular 10,000 and/or the 10,000 most queried domains over the last six months), and so forth. Otherwise,DNS resolver 182 may seek the IP address from one or more other DNS resolvers (e.g., DNS resolver 183) or from a DNS authoritative server. - It should be noted that DNS architectures may include multiple layers (e.g., hierarchical layers) of DNS resolvers. In one example, DNS resolvers 182-183 may follow a recursive process for obtaining an IP address for a submitted query, by accessing other DNS resolvers and/or DNS authoritative servers. For ease of illustration,
FIG. 1 illustrates a single layer of DNS resolvers including two DNS resolvers 182-183 is shown. However, any number of DNS resolvers and any number of layers of DNS resolvers may be deployed in thenetwork 102 without departing from the scope of the present disclosure. - In accordance with the present disclosure,
processing system 104 may comprise one or more physical devices, e.g., one or more computing systems or servers, such ascomputing system 300 depicted inFIG. 3 , and may be configured to provide one or more operations for load balancing for domain name system query collection, as described herein. It should be noted that as used herein, the terms “configure,” and “reconfigure” may refer to programming or loading a processing system with computer-readable/computer-executable instructions, code, and/or programs, e.g., in a distributed or non-distributed memory, which when executed by a processor, or processors, of the processing system within a same device or within distributed devices, may cause the processing system to perform various functions. Such terms may also encompass providing variables, data values, tables, objects, or other data structures or the like which may cause a processing system executing computer-readable instructions, code, and/or programs to function differently depending upon the values of the variables or other data structures that are provided. As referred to herein a “processing system” may comprise a computing device including one or more processors, or cores (e.g., as illustrated inFIG. 3 and discussed below) or multiple computing devices collectively configured to perform various steps, functions, and/or operations in accordance with the present disclosure. - In one example, database (DB) 106 may comprise a physical storage device integrated with processing system 104 (e.g., a database server), or attached or coupled to the
processing system 104, to store various types of information in support of systems for load balancing for domain name system query collection, in accordance with the present disclosure. For example,DB 106 may store network traffic data, or other records from which network traffic data may be derived, may store mappings or tables that indicate ranges or groupings of DNS queries that are collected by the individual collection servers, such ascollection servers processing system 104 may load instructions into a memory, or one or more distributed memory units, and execute the instructions for load balancing for domain name system query collection, as described herein. An example method for load balancing for domain name system query collection is described in greater detail below in connection withFIG. 2 . - In one example,
processing system 104 andcollection servers processing system 104 may obtain incoming DNS queries (e.g., fromedge routers 190 and 191, which may be configured to port mirror the incoming DNS queries), identify network addresses (e.g., IP addresses) of the sources of the incoming DNS queries, classify the incoming DNS queries according to target portions of the network addresses from which the incoming DNS queries come, and forward the incoming DNS queries to the appropriate collection servers (e.g.,collection servers 192 and 193) based on the classifying. The incoming DNS queries may originate with sources including theUE devices processing system 104 may comprise a portion of a front end switch, a load balancer, or a collection server (e.g., a collection server that is independent or separate from the collection servers 192-193). - The collection servers 192-2913 may comprise short term storage that retains the DNS queries until the DNS queries can be stored in appropriate Data Lakes (e.g., repositories of DNS queries that may be mined for data). In one example, each
collection server FIG. 2 illustrates two collection servers 192-193, any number of collection servers that is a power of two (e.g., two, four, eight, sixteen, thirty-two, etc.) may be deployed in thenetwork 102. Moreover, it should be noted that various techniques may be employed to provide load balancing among the collection servers and to organize incoming DNS queries. Several of these techniques are discussed in greater detail in connection withFIG. 2 . - It should be noted that the
system 100 has been simplified. Thus, those skilled in the art will realize that thesystem 100 may be implemented in a different form than that which is illustrated inFIG. 1 , or may be expanded by including additional endpoint devices, access networks, network elements, application servers, etc. without altering the scope of the present disclosure. In addition,system 100 may be altered to omit various elements, substitute elements for devices that perform the same or similar functions, combine elements that are illustrated as separate devices, and/or implement network elements as functions that are spread across several devices that operate collectively as the respective network elements. For example, thesystem 100 may include other network elements (not shown) such as border elements, routers, switches, policy servers, security devices, gateways, a content distribution network (CDN) and the like. For example, portions ofnetwork 102,access networks Internet 160 may comprise a content distribution network (CDN) having ingest servers, edge servers, and the like for packet-based streaming of video, audio, or other content. Similarly, although only two access networks, 120 and 122 are shown, in other examples,access networks 120 and/or 122 may each comprise a plurality of different access networks that may interface withnetwork 102 independently or in a chained manner. For example,device 114 and server(s) 116 may communicate withnetwork 102 via different access networks,UE devices network 102 via different access networks, and so forth. In still another example, one or more of DNS resolvers 182-183 may be deployed external to network 102 (e.g., a public DNS resolver), or thesystem 100 may include one or more additional DNS resolvers external tonetwork 102. Thus, these and other modifications are all contemplated within the scope of the present disclosure. -
FIG. 2 illustrates a flowchart of an example method 200 for filtering, distributing, and organizing domain name system queries, in accordance with the present disclosure. In one example, steps, functions and/or operations of the method 200 may be performed by a device as illustrated in FIG. 1, e.g., processing system 104, collection servers 192-193, or any one or more components thereof, or collectively via a plurality of devices in FIG. 1, such as processing system 104 and collection servers 192-193, and so forth. In one example, the steps, functions, or operations of method 200 may be performed by a computing device or system 300, and/or a processing system 302 as described in connection with FIG. 3 below. For instance, the computing device 300 may represent at least a portion of processing system 104 and/or collection servers 192-193 in accordance with the present disclosure. For illustrative purposes, the method 200 is described in greater detail below in connection with an example performed by a processing system, such as processing system 302. The method 200 begins in step 202 and proceeds to step 204.

At step 204, the processing system (of a domain name system) may receive a DNS query from an endpoint device. The DNS query may be forwarded to the processing system by an edge router, which may port mirror incoming DNS queries before simultaneously sending the queries on to DNS resolvers and to the processing system. In other examples, the DNS query may be duplicated in another way (e.g., by a tap, redirection, or other methods) before being forwarded to the processing system. The DNS query may comprise, for example, a domain name associated with a webpage that the endpoint device (or a user of the endpoint device) is trying to access (e.g., examplebank.com). While a DNS resolver attempts to provide the endpoint device with the webpage's IP address, the processing system may perform further processing on the DNS query in order to facilitate future data mining operations.

In step 206, the processing system may identify the network address of the endpoint device from the DNS query. For instance, the header of a data packet containing the DNS query as a payload may include a source IP address, which indicates the IP address of the endpoint device from which the DNS query originated.

In step 208, the processing system may isolate a target unit of the network address of the endpoint device. As discussed in further detail below, the target unit of the network address may depend on the classification scheme that is being used to organize incoming DNS queries. In one example, where the network address is an IP address, the target unit of the network address is the last address unit of the IP address. For instance, if the IP address is the IPv4 address of 123.45.67.89, then the last address unit (octet) would be 89. In other examples, however, different address units (e.g., the second to last or third to last address unit, etc.) of the IP address could serve as the target unit of the network address.

In step 210, the processing system may classify the DNS query based on the target unit. In one example, there are a plurality of potential classes into which the DNS query may be classified. The number of potential classes may correspond to the number of collection servers in the network. For instance, if the network includes two collection servers (as illustrated in FIG. 1), then there may be two potential classes into which the DNS query may be classified. However, as discussed above, in one example, any number of collection servers that is a power of two may be deployed in the network. Thus, if there are four collection servers, there may be four classes into which the DNS query may be classified; if there are eight collection servers, there may be eight classes; and so on. In other examples, however, the number of servers may be a number that is not a power of two.

In one example, each class of the plurality of classes is associated with a predefined numerical range. In this case, the DNS query may be sorted into the class whose predefined numerical range encompasses the target unit of the source's network address. For instance, if there are two classes into which the DNS query may be classified, the first class may include DNS queries where the last octet of the source IPv4 address is anywhere in the range of one to 255, and the second class may include DNS queries where the last octet of the source IPv4 address is greater than 255. In this case, if the last octet of the source IP address is 89, then the DNS query may be classified in the first class (i.e., 1<89<255). Different numerical ranges may be used for the classification of IPv4 addresses, as well as for the classification of IPv6 addresses which use a different addressing scheme (e.g., in the case of IPv6, different ranges of hexadecimal values may be associated with different classes).
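
By way of illustration only, steps 206 through 210 may be sketched in Python as follows. The function names `target_unit` and `classify` and the table `IPV4_CLASS_RANGES` are hypothetical and do not appear in the disclosure. Because an IPv4 octet cannot exceed 255, the sketch assumes two ranges, 0 to 127 and 128 to 255, that together cover every possible octet value; other ranges could be substituted. For IPv6 the sketch assumes the address unit is a 16-bit group.

```python
import ipaddress

# Hypothetical class table: inclusive (low, high) ranges over an IPv4 octet.
IPV4_CLASS_RANGES = [(0, 127), (128, 255)]


def target_unit(source_address: str, unit_from_end: int = 1) -> int:
    """Return one address unit of the source address, counted from the end.

    The unit is an octet for IPv4 and a 16-bit group for IPv6.
    unit_from_end=1 selects the last unit, 2 the second to last, and so on.
    """
    addr = ipaddress.ip_address(source_address)
    width = 1 if addr.version == 4 else 2  # bytes per address unit
    packed = addr.packed
    if not 1 <= unit_from_end <= len(packed) // width:
        raise ValueError("unit_from_end is outside the address")
    end = len(packed) - (unit_from_end - 1) * width
    return int.from_bytes(packed[end - width:end], "big")


def classify(unit: int, ranges=IPV4_CLASS_RANGES) -> int:
    """Return the index of the class whose range contains the unit."""
    for index, (low, high) in enumerate(ranges):
        if low <= unit <= high:
            return index
    raise ValueError(f"no class covers address unit {unit}")
```

With the example address above, `target_unit("123.45.67.89")` yields 89, which `classify` places in the first class (index 0) under the assumed ranges. Only the source address is inspected; the DNS payload itself is never parsed.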
In step 212, the processing system may forward the DNS query to a first collection server of a plurality of collection servers. As discussed above, the network may include a plurality of (i.e., at least two, and potentially any power of two) collection servers for temporarily storing DNS queries. Each collection server of the plurality of collection servers may be associated with a different class of DNS queries. As also discussed above, each class may include DNS queries where the last address unit of the source IP address falls within a different predefined numerical range. Thus, in step 212, the first collection server may be the collection server that is associated with the class into which the DNS query is classified in step 210 (e.g., a collection server associated with DNS queries where the last octet of the source IPv4 address is anywhere in the range of one to 255).

The method 200 may end in step 214. However, the method 200 may be repeated for each incoming DNS query that is received by the processing system.

The method 200 therefore supports the large volumes and high bandwidth that have become typical when mining DNS queries for data, while minimizing the resource and computational costs of balancing and distributing the DNS queries among collection servers. For instance, while conventional techniques may parse the DNS queries to facilitate balancing and distribution of the queries among collection servers, the method and system disclosed herein accomplish the same quickly and efficiently by using the source addresses (e.g., network addresses) of the queries to direct the queries to the appropriate collection servers. Moreover, the DNS queries are effectively organized by the simple classification scheme, which minimizes the computation that downstream applications may have to perform when processing the DNS queries. In further examples, criteria other than source address may be used to sort or classify the DNS queries.

It should be noted that the method 200 may be expanded to include additional steps or may be modified to include additional operations with respect to the steps outlined above. In addition, although not specifically specified, one or more steps, functions, or operations of the method 200 may include a storing, displaying, and/or outputting step as required for a particular application. In other words, any data, records, fields, and/or intermediate results discussed in the method can be stored, displayed, and/or outputted either on the device executing the method or to another device, as required for a particular application. Furthermore, steps, blocks, functions or operations in FIG. 2 that recite a determining operation or involve a decision do not necessarily require that both branches of the determining operation be practiced. In other words, one of the branches of the determining operation can be deemed as an optional step. Furthermore, steps, blocks, functions or operations of the above described method can be combined, separated, and/or performed in a different order from that described above, without departing from the examples of the present disclosure.
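
As a further non-limiting illustration, one pass of the method 200 for a single query, including the forwarding of step 212, may be sketched in Python as follows. The names `build_ranges` and `Distributor` are hypothetical, the in-memory lists merely stand in for collection servers, and the equal-width ranges over 0 to 255 are an assumption rather than ranges taken from the disclosure.

```python
import ipaddress
from collections import defaultdict


def build_ranges(num_servers: int, unit_max: int = 255):
    """Split 0..unit_max into num_servers contiguous, equal-width ranges."""
    size = unit_max + 1
    if num_servers < 1 or size % num_servers:
        raise ValueError("num_servers must divide the address-unit space evenly")
    width = size // num_servers
    return [(i * width, (i + 1) * width - 1) for i in range(num_servers)]


class Distributor:
    """Routes mirrored DNS queries to collection servers by source address."""

    def __init__(self, num_servers: int = 2):
        self.ranges = build_ranges(num_servers)
        # Stand-in for real collection servers: one in-memory list per class.
        self.servers = defaultdict(list)

    def handle(self, source_ip: str, query: bytes) -> int:
        # Steps 206 and 208: take the last octet of the source IPv4 address.
        last_octet = ipaddress.IPv4Address(source_ip).packed[-1]
        # Step 210: find the class whose range contains that octet.
        for index, (low, high) in enumerate(self.ranges):
            if low <= last_octet <= high:
                # Step 212: forward the unparsed query to the matching server.
                self.servers[index].append(query)
                return index
        raise ValueError(f"no class covers octet {last_octet}")
```

Under these assumptions, a `Distributor(4)` places a query from 123.45.67.89 in the second of four classes (index 1, covering 64 to 127). Any power-of-two server count up to 256 divides the octet space evenly, which is one reason such counts are convenient here.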
FIG. 3 depicts a high-level block diagram of a computing device or processing system specifically programmed to perform the functions described herein. As depicted in FIG. 3, the processing system 300 comprises one or more hardware processor elements 302 (e.g., a central processing unit (CPU), a microprocessor, or a multi-core processor), a memory 304 (e.g., random access memory (RAM) and/or read only memory (ROM)), a module 305 for filtering, distributing, and organizing domain name system queries, and various input/output devices 306 (e.g., storage devices, including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive, a receiver, a transmitter, a speaker, a display, a speech synthesizer, an output port, an input port and a user input device (such as a keyboard, a keypad, a mouse, a microphone and the like)). In accordance with the present disclosure input/output devices 306 may also include antenna elements, antenna arrays, remote radio heads (RRHs), baseband units (BBUs), transceivers, power units, and so forth. Although only one processor element is shown, it should be noted that the computing device may employ a plurality of processor elements. Furthermore, although only one computing device is shown in the figure, if the method 200 as discussed above is implemented in a distributed or parallel manner for a particular illustrative example, i.e., the steps of the above method 200, or the entire method 200 is implemented across multiple or parallel computing devices, e.g., a processing system, then the computing device of this figure is intended to represent each of those multiple computing devices.

Furthermore, one or more hardware processors can be utilized in supporting a virtualized or shared computing environment. The virtualized computing environment may support one or more virtual machines representing computers, servers, or other computing devices. In such virtualized virtual machines, hardware components such as hardware processors and computer-readable storage devices may be virtualized or logically represented. The hardware processor 302 can also be configured or programmed to cause other devices to perform one or more operations as discussed above. In other words, the hardware processor 302 may serve the function of a central controller directing other devices to perform the one or more operations as discussed above.

It should be noted that the present disclosure can be implemented in software and/or in a combination of software and hardware, e.g., using application specific integrated circuits (ASIC), a programmable gate array (PGA) including a Field PGA, or a state machine deployed on a hardware device, a computing device or any other hardware equivalents, e.g., computer readable instructions pertaining to the method discussed above can be used to configure a hardware processor to perform the steps, functions and/or operations of the above disclosed method 200. In one example, instructions and data for the present module or process 305 for filtering, distributing, and organizing domain name system queries (e.g., a software program comprising computer-executable instructions) can be loaded into memory 304 and executed by hardware processor element 302 to implement the steps, functions, or operations as discussed above in connection with the illustrative method 200. Furthermore, when a hardware processor executes instructions to perform “operations,” this could include the hardware processor performing the operations directly and/or facilitating, directing, or cooperating with another hardware device or component (e.g., a co-processor and the like) to perform the operations.

The processor executing the computer readable or software instructions relating to the above described method can be perceived as a programmed processor or a specialized processor. As such, the present module 305 for filtering, distributing, and organizing domain name system queries (including associated data structures) of the present disclosure can be stored on a tangible or physical (broadly non-transitory) computer-readable storage device or medium, e.g., volatile memory, non-volatile memory, ROM memory, RAM memory, magnetic or optical drive, device or diskette, and the like. Furthermore, a “tangible” computer-readable storage device or medium comprises a physical device, a hardware device, or a device that is discernible by the touch. More specifically, the computer-readable storage device may comprise any physical devices that provide the ability to store information such as data and/or instructions to be accessed by a processor or a computing device such as a computer or an application server.

While various examples have been described above, it should be understood that they have been presented by way of illustration only, and not a limitation. Thus, the breadth and scope of any aspect of the present disclosure should not be limited by any of the above-described examples, but should be defined only in accordance with the following claims and their equivalents.
Claims (20)
1. A method comprising:
receiving, by a processing system in a communications network, a first domain name system query from an edge router connected to the communications network, where the first domain name system query is associated with a first endpoint device;
identifying, by the processing system, a first network address of the first endpoint device from the first domain name system query;
classifying, by the processing system, the first domain name system query into a first class of a plurality of classes, wherein each class of the plurality of classes is associated with one predefined numerical range of a plurality of predefined numerical ranges, and wherein a target address unit of the first network address falls into a first predefined numerical range of the plurality of predefined numerical ranges that is associated with the first class; and
forwarding, by the processing system, the first domain name system query to a first repository of a plurality of repositories, wherein the first repository is dedicated for storing domain name system queries that are classified into the first class.
2. The method of claim 1, wherein the first domain name system query is duplicated by the edge router in the communications network prior to being received by the processing system.
3. The method of claim 1, wherein the first network address is an internet protocol address.
4. The method of claim 3, wherein the target address unit of the first network address is a last address unit of the internet protocol address.
5. The method of claim 3, wherein the internet protocol address is an internet protocol version 4 address, and the target address unit is an octet of the internet protocol address.
6. The method of claim 5, wherein the plurality of classes comprises two classes, and the first predefined numerical range comprises a range from one to 255.
7. The method of claim 6, further comprising:
receiving, by the processing system, a second domain name system query from the edge router connected to the communications network, where the second domain name system query is associated with a second endpoint device;
identifying, by the processing system, a second network address of the second endpoint device from the second domain name system query;
classifying, by the processing system, the second domain name system query into a second class of the plurality of classes, wherein a target address unit of the second network address falls into a second predefined numerical range of the plurality of predefined numerical ranges associated with the second class; and
forwarding, by the processing system, the second domain name system query to a second repository of the plurality of repositories, wherein the second repository is dedicated for collecting domain name system queries that are classified into the second class.
8. The method of claim 7, wherein the second predefined numerical range comprises a range greater than 255.
9. The method of claim 3, wherein the internet protocol address is an internet protocol version 6 address, and the target address unit is a hextet of the internet protocol address.
10. The method of claim 1, wherein a number of the plurality of repositories is a power of two.
11. The method of claim 10, wherein a number of the plurality of classes is equal to the number of the plurality of repositories.
12. The method of claim 1, wherein each repository of the plurality of repositories corresponds to a different data lake.
13. The method of claim 1, wherein the processing system is implemented in a switch.
14. The method of claim 1, wherein the processing system is implemented in a collection server that is independent of the plurality of repositories.
15. The method of claim 1, wherein the processing system is implemented in a load balancer.
16. A non-transitory computer-readable medium storing instructions which, when executed by a processing system in a communications network, cause the processing system to perform operations, the operations comprising:
receiving a first domain name system query from an edge router connected to the communications network, where the first domain name system query is associated with a first endpoint device;
identifying a first network address of the first endpoint device from the first domain name system query;
classifying the first domain name system query into a first class of a plurality of classes, wherein each class of the plurality of classes is associated with one predefined numerical range of a plurality of predefined numerical ranges, and wherein a target address unit of the first network address falls into a first predefined numerical range of the plurality of predefined numerical ranges that is associated with the first class; and
forwarding the first domain name system query to a first repository of a plurality of repositories, wherein the first repository is dedicated for collecting domain name system queries that are classified into the first class.
17. The non-transitory computer-readable medium of claim 16, wherein the first network address is an internet protocol address.
18. The non-transitory computer-readable medium of claim 17, wherein the target address unit of the first network address is a last address unit of the internet protocol address.
19. The non-transitory computer-readable medium of claim 18, wherein the internet protocol address is an internet protocol version 4 address, the plurality of classes comprises two classes, and the first predefined numerical range comprises a range from one to 255.
20. A device comprising:
a processing system including at least one processor; and
a non-transitory computer-readable medium storing instructions which, when executed by the processing system when deployed in a communications network, cause the processing system to perform operations, the operations comprising:
receiving a first domain name system query from an edge router connected to the communications network, where the first domain name system query is associated with a first endpoint device;
identifying a first network address of the first endpoint device from the first domain name system query;
classifying the first domain name system query into a first class of a plurality of classes, wherein each class of the plurality of classes is associated with one predefined numerical range of a plurality of predefined numerical ranges, and wherein a target address unit of the first network address falls into a first predefined numerical range of the plurality of predefined numerical ranges that is associated with the first class; and
forwarding the first domain name system query to a first repository of a plurality of repositories, wherein the first repository is dedicated for collecting domain name system queries that are classified into the first class.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/816,680 US20220368669A1 (en) | 2019-05-23 | 2022-08-01 | Filtering and organizing process for domain name system query collection |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/420,817 US10798051B1 (en) | 2019-05-23 | 2019-05-23 | Filtering and organizing process for domain name system query collection |
US17/063,703 US11405354B2 (en) | 2019-05-23 | 2020-10-05 | Filtering and organizing process for domain name system query collection |
US17/816,680 US20220368669A1 (en) | 2019-05-23 | 2022-08-01 | Filtering and organizing process for domain name system query collection |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/063,703 Continuation US11405354B2 (en) | 2019-05-23 | 2020-10-05 | Filtering and organizing process for domain name system query collection |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220368669A1 (en) | 2022-11-17 |
Family
ID=72664046
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/420,817 Active US10798051B1 (en) | 2019-05-23 | 2019-05-23 | Filtering and organizing process for domain name system query collection |
US17/063,703 Active 2039-05-27 US11405354B2 (en) | 2019-05-23 | 2020-10-05 | Filtering and organizing process for domain name system query collection |
US17/816,680 Abandoned US20220368669A1 (en) | 2019-05-23 | 2022-08-01 | Filtering and organizing process for domain name system query collection |
Country Status (1)
Country | Link |
---|---|
US (3) | US10798051B1 (en) |
Also Published As
Publication number | Publication date |
---|---|
US20210021567A1 (en) | 2021-01-21 |
US11405354B2 (en) | 2022-08-02 |
US10798051B1 (en) | 2020-10-06 |
Legal Events
Code | Title | Description |
---|---|---|
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE |