US20200322227A1 - Methods and systems for device grouping with interactive clustering using hierarchical distance across protocols
- Publication number
- US20200322227A1 (application US16/374,728)
- Authority
- US
- United States
- Prior art keywords
- devices
- distance
- similarity
- text
- network device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0893—Assignment of logical groups to network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/147—Network analysis or design for predicting network behaviour
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
- H04L43/045—Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/06—Generation of reports
- H04L43/062—Generation of reports related to network traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0876—Network utilisation, e.g. volume of load or congestion level
-
- H04L61/1511—
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/45—Network directories; Name-to-address mapping
- H04L61/4505—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
- H04L61/4511—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/45—Network directories; Name-to-address mapping
- H04L61/4541—Directories for service discovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/22—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks comprising specially adapted graphical user interfaces [GUI]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
Definitions
- Clustering can be described as assigning a set of objects to groups, such that the objects within the same cluster are more similar (according to a property) to each other than to those objects in other clusters.
- the concept of creating, or otherwise identifying, clusters of nodes is applied in many fields, including computer networking, statistics, data analysis, and bioinformatics, for example.
- clustering nodes using the concept of “similarity” is often based on the physical topology of the network.
- Some network clustering algorithms capture the intuitive notion that nodes may be clustered with other nodes that are proximally located, such as clustering devices sharing a local area network (LAN). Accordingly, distance is a property that often governs the clustering of nodes in computer networking technologies.
- anycasting services employ a set of anycast resolvers that can measure the response times of replicated servers on behalf of clients to determine a distance therebetween (e.g., longer response time indicates larger distance between nodes).
- distance frequently serves as an anchor for determining “similar” nodes, and further for forming clusters of nodes that are present in a network. Nonetheless, it may be desirable to use clustering techniques driven by properties other than distance, that may be less tied to the physicality of the network.
- FIG. 1 illustrates an example of a system distributed across a communications network and including a network device implementing techniques for device grouping with interactive clustering using hierarchical distance, according to some embodiments.
- FIG. 2 illustrates examples of domain name services (DNS) entries that can be collected and analyzed by the network device shown in FIG. 1 to identify host associations utilized by the device group techniques (e.g., host name-to-node association for visualization of clustering results), according to some embodiments.
- FIG. 3A illustrates an example of mapping of services to network addresses in accordance with a multicast domain name service (mDNS) protocol, according to some embodiments.
- FIG. 3B illustrates a graphical representation of a linked set of records in accordance with an mDNS protocol, according to some embodiments.
- FIG. 3C is a conceptual diagram illustrating examples of the hierarchal distance features as applied for extensibility across multiple protocols, according to some embodiments.
- FIG. 4 is a conceptual diagram depicting examples of relationships between distances and similarities between device groups, according to some embodiments.
- FIG. 5 is an operation flow diagram illustrating an example of a process for executing device grouping with interactive clustering using hierarchical distance, according to some embodiments.
- FIGS. 6A-6C depict examples of network graphs generated using visualization aspects of the device grouping system disclosed herein, according to some embodiments.
- FIG. 6D depicts an example of a user interface for configuring the interactive clustering using hierarchal distance features disclosed herein for adaptability and extensibility to multiple protocols, according to some embodiments.
- FIG. 7 illustrates an example computing device that may be used in implementing various device grouping with interactive clustering using hierarchical distance features relating to the embodiments of the disclosed technology.
- Various embodiments described herein are directed to techniques and systems for device grouping with interactive clustering using hierarchical distance.
- a clustering technique that is not driven solely by distance measurements.
- an administrator may request network analytics that require devices of a common type, such as Apple Macintosh® (Mac®) computers, to be grouped in the same cluster.
- the clustering techniques disclosed herein can be configured to measure similarities (or dissimilarities) based on various properties, such as common services, common attributes, same resource type, and the like. Accordingly, the clustering techniques disclosed herein may provide advantages of flexibility and configurability over conventional network clustering mechanisms, which are often limited to analysis based on the physical topology and/or performance characteristics of the network.
- text-based analysis such as natural language processing (NLP)
- NLP techniques involve using dictionary encoding to remove biases that may be inherent in text-based distance approaches (e.g., length of the parameter names). For example, comparing a first parameter name set: Deviceid, pk, pn to a second parameter name set: Deviceid, pk, mn using the hierarchal distance algorithm may result in a substantially close (e.g., two out of three) distance measurement.
- the hierarchal distance algorithm allows a lexical analysis to be done with NLP to segment parameter names with a value.
- analysis using the hierarchal distance approach can determine that two parameters (out of the set containing three parameters), namely Deviceid and pk, are common amongst both parameter sets.
- a text-based measurement may calculate a closer distance because the text string of the matching parameter, Deviceid (having eight characters), is longer than the text strings of the differing parameters, pn vs. mn (having two characters each).
- NLP analysis can establish a way to normalize text that is analyzed in the hierarchal distance algorithm. Because of this normalization, similarities are measured from the context of the text, for instance determining the same type of service is conveyed by text, rather than measuring the text strings themselves.
- the examples can achieve improved distance measurements and optimal clustering, by adapting the hierarchal distance algorithm to remove biases related to text.
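The contrast between dictionary encoding and a flat string comparison can be illustrated with a minimal Python sketch. The function names and the use of `difflib.SequenceMatcher` as the flat string measure are assumptions chosen for illustration; the disclosure does not specify a particular encoding scheme or string distance.

```python
from difflib import SequenceMatcher


def dictionary_encode(names, vocab):
    """Map each parameter name to an integer ID, growing `vocab` as needed."""
    return {vocab.setdefault(name, len(vocab)) for name in names}


def encoded_similarity(a, b):
    """Fraction of parameters shared by two sets, independent of name length."""
    vocab = {}
    ids_a = dictionary_encode(a, vocab)
    ids_b = dictionary_encode(b, vocab)
    if not ids_a and not ids_b:
        return 1.0  # two empty sets are treated as identical
    return len(ids_a & ids_b) / max(len(ids_a), len(ids_b))


def flat_string_similarity(a, b):
    """Character-level similarity of the joined names (biased by name length)."""
    return SequenceMatcher(None, ",".join(a), ",".join(b)).ratio()


first = ["Deviceid", "pk", "pn"]
second = ["Deviceid", "pk", "mn"]

encoded = encoded_similarity(first, second)   # two of three parameters match
flat = flat_string_similarity(first, second)  # inflated by the long "Deviceid"
print(f"encoded={encoded:.3f} flat={flat:.3f}")
```

With the encoded comparison, every parameter counts equally, so the eight-character name contributes no more than a two-character one.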
- Tools like language models, whose use is typically reserved for NLP-based applications, can be applied to discovery protocol traffic in a manner that ascertains similarities between network devices based on properties that are recognizable as text.
- the disclosed techniques may realize improved performance and efficiency over mechanisms that primarily use statistical (or mathematical) measurements, which may also require greater computational complexity.
- the systems and techniques disclosed herein implement interactive clustering features.
- the embodiments include a graphical user interface (GUI) that allows a user to interact with and configure aspects of the clustering techniques.
- user interactions can include configuring parameters related to a distance algorithm, which in turn impacts how clustering is performed.
- the interactive clustering aspects of the embodiments provide flexibility, such that a user can adapt clustering to be performed as deemed appropriate (or optimal) for the prevailing application (e.g., network environment, analytics, etc.).
- a user can set a more restricting threshold for identifying devices as “similar” (e.g., decreasing the potential of finding similarities), or conversely a less restricting threshold for identifying devices as “similar” (e.g., increasing the potential of finding similarities).
- the configurable parameters may be adjusted based on multiple factors, such as a desire for a larger number of groups in a cluster (e.g., larger clusters), and the like. This approach is similar to looking at fractals, where self-similarity can be displayed by zooming in and out. Distance can be approximated, as customized by a user, for a number of groups.
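The effect of a user-configured threshold on grouping can be sketched as follows. This is only an illustration under stated assumptions: the device names and pairwise distances are invented, and single-linkage grouping with a union-find structure is a stand-in, not the clustering method of the disclosure.

```python
def group_devices(distances, names, threshold):
    """Group devices whose pairwise distance is at most `threshold` (single linkage)."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), distance in distances.items():
        if distance <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical devices and distances, for illustration only.
names = ["mac-1", "mac-2", "printer", "tv"]
distances = {
    ("mac-1", "mac-2"): 0.1,
    ("mac-1", "printer"): 0.6,
    ("mac-2", "printer"): 0.6,
    ("mac-1", "tv"): 0.9,
    ("mac-2", "tv"): 0.9,
    ("printer", "tv"): 0.5,
}

strict = group_devices(distances, names, threshold=0.2)  # more restrictive
loose = group_devices(distances, names, threshold=0.7)   # less restrictive
print(strict)
print(loose)
```

In this sketch a more restrictive threshold leaves more, smaller groups, while a less restrictive one merges devices into fewer, larger groups, which mirrors the zooming in and out described above.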
- the disclosed techniques involve a hierarchical distance approach.
- the hierarchical distance approach determines a quantitative measurement of distance between nodes that is governed by a prioritization (e.g., hierarchical order) of characteristics that may be used to ascertain similarities between the nodes that is more qualitative.
- the hierarchical distance approach is the underlying concept which allows for a set of parameters (e.g., retrieved from discovery protocol traffic) to be used as a measure of similarity.
- text-based analysis, such as NLP, can be utilized for measuring similarities between devices.
- the hierarchal distance approach adds a dimension to text-based analysis that extends beyond flat string distance algorithms.
- each hierarchical level can be assigned to properties that are recognizable as text, namely parameters of the discovery protocol traffic.
- each hierarchical level corresponds to both a parameter and a distance value (or a degree of similarity) relating to the particular parameter, which serves as a link for subsequently measuring distances based on these parameters.
- a fully text-based analysis using a flat string distance algorithm may determine a large distance based on the pre-shared keys, even in the case of a complete similarity (e.g., same device).
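One way to picture the hierarchical levels is the Python sketch below. It is an assumption-laden illustration rather than the algorithm of the disclosure: the level order, the distance values, the parameter names, and the rule "return the distance of the highest-priority level that differs" are all hypothetical, and `difflib.SequenceMatcher` stands in for a generic flat string distance.

```python
from difflib import SequenceMatcher

# Hypothetical hierarchy: (parameter, distance assigned when that parameter
# differs), ordered from highest to lowest priority. Values are illustrative.
LEVELS = [("service", 1.0), ("model", 0.5), ("deviceid", 0.25)]


def hierarchical_distance(dev_a, dev_b, levels=LEVELS):
    """Return the distance of the highest-priority level at which devices differ."""
    for parameter, distance in levels:
        if dev_a.get(parameter) != dev_b.get(parameter):
            return distance
    return 0.0  # every level in the hierarchy matches


def flat_string_distance(dev_a, dev_b):
    """Character-level distance over all key=value text, keys included or not."""
    def as_text(dev):
        return ";".join(f"{k}={v}" for k, v in sorted(dev.items()))
    return 1.0 - SequenceMatcher(None, as_text(dev_a), as_text(dev_b)).ratio()


# Made-up advertisements: the same device seen twice with different key text.
tv_seen_once = {
    "service": "_airplay._tcp",
    "model": "tv-model-x",
    "deviceid": "AA:BB:CC:DD:EE:FF",
    "pk": "a1" * 16,
}
tv_seen_again = dict(tv_seen_once, pk="7e" * 16)
speaker = dict(tv_seen_once, model="speaker-model-y", deviceid="11:22:33:44:55:66")

print(hierarchical_distance(tv_seen_once, tv_seen_again))
print(flat_string_distance(tv_seen_once, tv_seen_again))
print(hierarchical_distance(tv_seen_once, speaker))
```

Because the key text is not one of the levels in this sketch, the two sightings of the same device have zero hierarchical distance, whereas the flat string measure reports a non-zero distance driven by the differing key.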
- the embodiments include mechanisms for passively collecting and analyzing discovery traffic.
- the device grouping system disclosed herein leverages edge devices to listen to discovery traffic within the network, rather than employing mechanisms that inject additional traffic into the network that is solely for the purpose of analysis.
- the system provides a minimal footprint by deploying fewer packet processing devices at strategic points in the network architecture (e.g., edge devices).
- metadata from collected packets can be analyzed such that the information can be used to derive network analytics, namely the disclosed device grouping techniques.
- Discovery protocols consistent with the present disclosure may include a dynamic host configuration protocol (DHCP), a domain name service (DNS), a multicast DNS (mDNS) protocol, a link layer discovery protocol (LLDP), a Cisco Discovery Protocol (CDP), and many more that are low in volume, but high in information content about the network.
- Discovery protocols include information that allows devices to operate in the network.
- the information included in a service advertisement message has analytical value.
- an mDNS message includes text that particularly corresponds to network and device characteristics, such as domain names and services, that can be used as a measure of similarity to perform clustering. Applying text-based analysis, namely NLP, to network traffic that has high informational content about the network and the devices thereon is the underlying concept for the distance algorithm.
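As a rough illustration of how text in a service advertisement becomes comparable parameters, the sketch below splits `key=value` strings of the kind carried in DNS-SD TXT records into a dictionary. The helper name and the sample entries are hypothetical; the handling of case, empty keys, and repeated keys follows the usual DNS-SD conventions, not anything stated in this disclosure.

```python
def parse_txt_entries(entries):
    """Turn TXT-style "key=value" strings into a dict of lower-cased keys.

    A key without "=" is kept with a value of None (a bare attribute), an
    entry with an empty key is skipped, and only the first occurrence of a
    repeated key is used.
    """
    params = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        key = key.lower()
        if not key:
            continue
        params.setdefault(key, value if sep else None)
    return params


# Hypothetical TXT entries, for illustration only.
entries = ["Deviceid=AA:BB:CC:DD:EE:FF", "pk=abc123", "pn=1", "flag", "pk=ignored"]
params = parse_txt_entries(entries)
parameter_names = sorted(params)
print(parameter_names)
```

The resulting parameter names (rather than the raw record bytes) are what a text-based similarity measure would then compare between devices.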
- FIG. 1 illustrates an example of a system 100 distributed across a communications network 110 and including a network device, shown as analyzer 140 .
- the analyzer 140 can be configured for implementing techniques for device grouping with interactive clustering using hierarchical distance, according to some embodiments.
- an example network architecture including client devices 110 A- 110 N and packet processor 130 that can be proximately located, for instance within the same customer premises. Additionally, the client devices 110 A- 110 N and the packet processor 130 can be communicatively connected to each other as part of a local area network (LAN) 101 (indicated by dashed lines).
- LAN 101 may be installed at the customer premises, such as in a retail store, a business (e.g., restaurants, shopping malls, and the like), a factory, an office building, and the like.
- LAN 101 may include one or more of the client devices 110 A- 110 N.
- Client devices 110 A- 110 N may include a desktop, a laptop, or a mobile device such as a smart phone, a tablet, or any other portable computing device capable of communicating through LAN 101 .
- client device 110 may include several types of devices, which, even in the case that client device 110 is mobile, may be loosely or less often associated or co-located with a user.
- Another type of client device 110 may be more often or almost always associated or co-located with a user (e.g., a smart phone or another wearable device).
- the network of FIG. 1 may include tap points, which can be points on the network at which a packet processor 130 monitors network devices and the data between the tap points and client devices 110 A- 110 N.
- tap points can be at a “network edge.” Tap points can be described as locations that can have a visibility of local multicast based discovery that may not be routed beyond the network segment. Also, in some cases, tap points can be locations where a unique address can be obtained in all network layers, such as Media Access Control (MAC) and Internet Protocol (IP) in their respective layers.
- Network edges can provide predictable endpoints, e.g., tap points, from where to extract sample packets with a packet processor 130 .
- Network edge can be a sensitive area where the pulse of the network, which is the LAN in FIG.
- tap points may be placed at any point in LAN 101 (e.g., not at the network edge).
- tap points may be placed in the LAN 101 in order to provide visibility (e.g., to packet processor 130 via SPAN tunnels) of source IPs and/or discovery protocol traffic.
- a tap point can be placed between a router (not shown) and the LAN 101 to monitor discovery protocol traffic that is not routed beyond the router due to the nature of the discovery protocol or due to the broadcast discovery technique.
- placing tap points between the router and the LAN 101 may enable snapshotting of packets prior to any network address translation performed by the router, thereby preserving the client IP and the frequency of resolution by each client.
- such SPAN tunnels may connect the router to tap points, such as when LAN 101 is switched.
- Network topology is highly dynamic (e.g., transient) on network edge 120 ; accordingly, placing the packet processor 130 at the edge enables packet processor 130 to determine how edge devices (e.g., APs and client devices 110 A- 110 N) continue to connect, authenticate, and access to perform routine functions.
- embodiments as disclosed herein include accurately determining the number and location of tap points in which to place one or more packet processors 130 to handle network volume.
- Some embodiments include the use of discovery tools, which operate within network edge and provide high-value, but low volume data traffic.
- packet processor 130 uses discovery tools in addition to deep packet inspection metadata extraction operations to handle network analysis before the first hop protocols seen at the level of network edge 120 . Access to this is obtained by either configuring a router to locally SPAN to a co-located or remote SPAN through a network to setup remotely but routable (e.g., into DNS server 100 ). This approach substantially reduces the bandwidth strain imposed in network resources by typical network analysis devices.
- packet processor 130 may absorb less than 0.05% to 1% of the network traffic volume, opening up a wide bandwidth for other network resources and/or compute/storage resources. For instance, storage resources with the capacity to store a month of data collected via previous techniques may be able to store two years of data collected via the present techniques.
- the client devices 110 A- 110 N can communicate various intent to access (ITA) messages 120 A- 120 N.
- ITA messages 120 A- 120 N can be generally described as packets, records, or messages that enable devices on a network to announce information related to their configurability (e.g., services and associated parameters) and accessibility in a manner that allows the devices to discover, connect, and communicate with each other on the network.
- device discovery can be accomplished in accordance with a discovery protocol, namely mDNS.
- ITA messages 120 A- 120 N can be messages that indicate an intent to access using various other discovery protocols, such as DTP, DNS, SSDP, and the like.
- mDNS transactions can be indicative of intent to access, and thus are also referred to as ITA messages herein, and illustrated as ITA messages 120 A- 120 N in FIG. 1 .
- an mDNS transaction includes communicating mDNS records (shown in FIG. 2 ) that advertise types of services related to a particular device within a network, or that are visible more widely.
- a client device 110 A can communicate an mDNS record, such as an ITA message 120 A, when the client device 110 A becomes available to the network (e.g., after establishing connection to LAN 101 ).
- the ITA message 120 A, including an mDNS record, can allow the client device 110 A to advertise its capabilities (e.g., services) on LAN 101 .
- ITA messages can be referred to herein as service advertisement messages.
- the client device 110 A can transmit the ITA message 120 A to one or more other devices connected to the LAN 101 as part of a discovery process.
- the other client devices 110 B- 110 N upon receiving the ITA message 120 A, can discover the client device 110 A, its advertised services, and the associated parameters.
- Client devices 110 B- 110 N that may be consumers of the advertised services can use the parameters indicated in the ITA 120 A message to evaluate interoperability, connection methods, and other runtime operational compatibility to enable the services in the network.
- mDNS based service discovery can allow client device 110 A to query the network to determine services that are available (e.g., services advertised by client devices 110 B- 110 N).
- mDNS can accomplish service discovery with zero configuration (also known as “zeroconf”). It should be appreciated that although only client device 110 A is described in reference to FIG.
- any of the other client devices 110 B- 110 N on the LAN 101 are capable of communicating the ITA messages 120 A- 120 N, for instance discovery messages in accordance with the mDNS protocol.
- different mDNS records may have different configuration settings in terms of requirements and capabilities, access and privileges, based on the specification of LAN 101 , and intended purpose.
- the packet processor 130 situated at a tap point, as described above in detail, can intercept, or otherwise collect, ITA messages 120 A- 120 N that may be communicated via LAN 101 .
- the embodiments as disclosed herein require a comparatively small portion of the network traffic, namely the discovery traffic, to implement the device grouping to be further used in data analytics.
- this capacity is enhanced by implementation at the network edge.
- the packet processor 130 can transmit the collected ITA messages 120 A- 120 N, also referred to as discovery traffic, to an analyzer 140 which is a separate network device employed for analyzing the collected discovery traffic for analytics.
- the analyzer 140 implements the device grouping and interactive clustering using distance features disclosed herein.
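The collection step can be pictured with a short Python sketch that keeps only discovery traffic from a stream of packet metadata. The packet dictionaries, field names, and function names are hypothetical; the port numbers are the well-known UDP ports for DNS (53), DHCP (67/68), SSDP (1900), and mDNS (5353). Layer-2 protocols such as LLDP and CDP carry no ports and are not handled by this sketch.

```python
# Well-known UDP ports for some of the discovery protocols named above.
DISCOVERY_UDP_PORTS = {53: "DNS", 67: "DHCP", 68: "DHCP", 1900: "SSDP", 5353: "mDNS"}


def discovery_protocol(packet):
    """Return the discovery protocol name for a packet, or None if it is not one."""
    if packet.get("transport") != "udp":
        return None
    for port in (packet.get("dst_port"), packet.get("src_port")):
        if port in DISCOVERY_UDP_PORTS:
            return DISCOVERY_UDP_PORTS[port]
    return None


def collect_discovery_traffic(packets):
    """Keep only discovery packets, tagging each with its protocol name."""
    collected = []
    for packet in packets:
        protocol = discovery_protocol(packet)
        if protocol is not None:
            collected.append(dict(packet, protocol=protocol))
    return collected


# Hypothetical packet metadata, for illustration only.
packets = [
    {"src": "10.0.0.5", "transport": "tcp", "src_port": 50112, "dst_port": 443},
    {"src": "10.0.0.5", "transport": "udp", "src_port": 5353, "dst_port": 5353},
    {"src": "10.0.0.9", "transport": "udp", "src_port": 68, "dst_port": 67},
]
forwarded = collect_discovery_traffic(packets)
print([p["protocol"] for p in forwarded])
```

Only the tagged discovery packets would be forwarded for analysis, which is consistent with the small share of total traffic described above.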
- packet processor 130 inspects the discovery traffic that may be initiated by client devices 110 A- 110 N to discover the network resources with an application layer protocol (APP) or browser-based application installed on the client devices 110 A- 110 N.
- the same application that discovers the network resources may initiate hypertext transfer protocol (HTTP), or HTTP-secure (HTTPS) or other application protocol to access the network resource from client devices 110 A- 110 N.
- packet processor 130 may use mDNS to resolve host names to IP addresses.
- Other protocols that can be used by packet processor 130 can include DNS.
- a DNS server can provide a DNS to the operating system of client devices 110 A- 110 N, to map a network resource name configured in the APP to an IP address in network architecture.
- a DNS server transmits resolution requests to client devices 110 A- 110 N through DNS responses.
- SSDP tools may be used for resources co-located at the edge (e.g., plug and play devices, and the like). More specifically, some embodiments use the request part of discovery tools (e.g., protocols including memory devices storing commands and processors to execute the commands) for identification/discovery of client devices 110 A- 110 N, which are typically multicast, thereby facilitating access to at least one copy.
- the host responses (or server/protocol proxy node's responses) carry equally critical info that provide the “network view,” but may involve more network resources to track.
- the analyzer 140 can be configured with an interactive clustering module 141 that enables a network administrator to leverage discovery traffic for measuring similarities between devices on the LAN 101 in a hierarchical and configurable manner.
- the network administrator can generate a topological view of the network, namely LAN 101 , showing devices on the network that are grouped together based on the characteristics of the devices (e.g., services and parameters) rather than conventional distance measurements, such as determining a number of device groups statistically.
- FIG. 1 shows the analyzer 140 as being a device that is remotely located from LAN 101 on customer premises (e.g., “cloud” deployment).
- analyzer 140 can be located on LAN 101 rather than external to the network.
- discovery traffic in the mDNS protocol that has been collected by packet processor 130 can be communicated, via communication network 170 , to the analyzer.
- Communication network 170 can include, for example, a wide area network (WAN), the Internet, and the like.
- analyzer 140 has full access to an associated database 142 .
- Database 142 may store information related to discovery traffic and protocols for the analytics performed by the analyzer 140 .
- database 142 may be a distributed network accessible database (e.g., Hadoop-like distributed network accessible database) that can process workflows, discovery tools, and the like.
- the analyzer 140 may perform other forms of network monitoring and analytics. For instance, analyzer 140 can apply machine-learning algorithms (e.g., neural networks, artificial intelligence, and the like) to build multiple user profiles and other network patterns (e.g., identify potentially harmful IP addresses or suspicious traffic behavior) that are stored in database 142 .
- a user profile may include the type of client device 110 A- 110 N used to log into LAN 101 , the period of time that the connectivity lasted (latency), patterns of connectivity, and the like.
- database 142 may also include DPI libraries to maintain flow states including handshake states between client devices 110 A- 110 N and access points.
- at least a portion of the analyzer 140 may be deployed within network edge of the LAN 101 (e.g., “on-premises” deployment).
- the interactive clustering module 141 includes executable instructions, computer components, or a combination of both that implement the specific functions of the interactive device clustering and device grouping aspects of the embodiments.
- the interactive device clustering module 141 can include a graphical user interface (GUI) for receiving various user-configurable parameters entered by user, such as the network administrator associated with LAN 101 . Therefore, the embodiments can provide an end user with the capability to adapt the hierarchical distance algorithm to function in a manner consistent with their intended analytics application.
- a user can configure the hierarchical distance algorithm by assigning a respective value to each of the hierarchal levels of the algorithm.
- the user can set which discovery parameters (that correspond to a particular hierarchical level) serve as a greater indicator of similarity by the algorithm.
- the network administrator can enter place values, each corresponding to a respective discovery parameter, into the GUI of the interactive cluster module 141 .
- a user can assign a higher place value to a particular discovery parameter, such as setting “services” to correspond with the thousands place, while assigning a lower place value to another discovery parameter, such as setting “attributes” to the tenths place.
- the settings effectively adjust the clustering approach to utilize “service” as the discovery parameter having the highest weight in measuring similarity between devices.
- the levels of hierarchy can be even broader than the protocol specific approach discussed above (levels restricted to parameters within a certain protocol).
- the interactive clustering module 141 can be configured to include a hierarchal level for different protocols (e.g., mDNS, DNS, SSDP, etc.), thereby allowing devices that communicate in a common protocol to be considered a property for similarity.
- a hierarchical level can also be assigned to message type (e.g., discovery, advertisement).
- the interactive clustering module 141 can include other configurable parameters that are described in further detail herein, for example in reference to FIGS. 6A-6D. In some cases, this can signify that the network administrator is tuning a threshold of similarity such that the device grouping generates larger device groups (e.g., having a greater number of devices in each group). Furthermore, the interactive cluster module 141 can implement various NLP techniques that can be used in extracting text from the discovery traffic in the mDNS protocol, and then applying text-based analysis for measuring similarity between devices. Thereafter, the interactive cluster module 141 can use these degrees of similarity to calculate a distance measurement. For instance, interactive cluster module 141 can be configured to calculate a greater distance between two devices on the network that have fewer similarities with each other.
- a shorter distance between two devices on the network may be calculated by the interactive cluster module 141 , when the devices are more similar to each other.
- a threshold of similarity, which governs the degree necessary for devices to qualify as similar (or dissimilar), is also a configurable parameter of the hierarchical distance algorithm. Accordingly, the interactive cluster module 141 allows a user to either restrict or broaden the requirement for a cluster, thereby configuring the distance algorithm to be predisposed to generating larger groups of devices (e.g., more devices in a group) or smaller groups of devices (e.g., fewer devices in a group).
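- as a minimal sketch of how such a threshold could influence group size, the following example groups devices whose pairwise distance does not exceed a configurable limit. The Jaccard distance over advertised service names, the single-linkage grouping, the name `group_devices`, and the sample devices are all assumptions made for illustration and are not taken from the embodiments.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def group_devices(features, max_distance):
    """Single-linkage grouping: devices within max_distance share a group."""
    parent = {name: name for name in features}

    def find(name):
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(features, 2):
        if jaccard_distance(features[a], features[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for name in features:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical devices and the services each one advertises.
FEATURES = {
    "tv": {"airplay", "raop"},
    "speaker": {"airplay", "raop", "spotify"},
    "laptop": {"airplay"},
    "printer": {"ipp"},
}

strict = group_devices(FEATURES, max_distance=0.4)
loose = group_devices(FEATURES, max_distance=0.5)
```

- in this sketch, relaxing the limit from 0.4 to 0.5 merges the laptop into the group containing the television and the speaker, which mirrors the described effect of broadening the requirement for a cluster.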
- the interactive clustering and device grouping techniques disclosed can be adapted for use with various other discovery protocols, and thus are not limited to applications using the mDNS protocol. Embodiments that are extended for use with other discovery protocols are discussed in greater detail in reference to FIG. 3C , for example.
- FIG. 1 shows a network visualization client 150 including a network visualization module 151 .
- the analyzer 140 can be a centralized computer, such as a server, having a processing capacity that is suitable to support the data processing and analysis necessary to implement the interactive clustering and device grouping features disclosed.
- the visualization client 150 may be a client device having network analysis applications, such as the visualization interface 152 , that consumes the analytical data processed by analyzer 140 .
- the visualization client 150 can be a desktop, a laptop, or a mobile device such as a smart phone, a tablet, or any other portable computing device that can be used by a network administrator for monitoring a network, such as LAN 101 .
- the visualization client 150 and the analyzer 140 are communicatively connected via a network (not shown) allowing communication of data between the devices.
- the network visualization module 151 includes executable instructions, computer components, or a combination of both that implement a visualization of the network.
- the visualization, shown in FIG. 1 as output of a visualization interface 152 , can include graphical representations of the device groups generated by the interactive clustering module 141 .
- the visualization interface is illustrated in FIG. 1 as displaying a network graph as a result of the clustering and grouping performed by the analyzer 140 , in accordance with the embodiments.
- the network graph can be a visual representation of the topology of a network, having visual cues, such as nodes, for identifying client devices, and traffic in a network having devices that utilize the mDNS protocol for resolving host names and IP addresses, according to some embodiments.
- Nodes can represent various types of network devices, including but not limited to, a client device, a router, an AP, a host server, a database, or any network device in a network architecture as disclosed herein (e.g., client devices 110 a - 110 n ).
- the visualization interface 152 may present a graph which represents client devices 110 a - 110 n on LAN 101 that are measured as having small distances from each other, as determined by the distance algorithm, as a cluster of nodes.
- the graph displayed within visualization interface 152 can show client devices 110 a - 110 n on LAN 101 that are measured as having large distances from each other, as determined by the distance algorithm, as individual nodes separated by edges (having a length that is commensurate with the calculated distance).
- the visualization can be generated in an interactive manner.
- the visualization interface 152 can receive input from a user (e.g., merge device groups) that adds clusters to the visualization.
- the visualization client 150 can include an input device, and an output device.
- Input device may include a mouse, a keyboard, a touchscreen, and the like, that can be utilized by the user to interact with the visualization.
- An output device of the visualization client 150 may include a display, a touchscreen, a microphone, and the like, which displays a visualization.
- input device and output device of the visualization client 150 may be included in the same unit (e.g., a touchscreen).
- FIG. 2 illustrates examples of domain name services (DNS) entries 205 , 210 , and 215 that can be included in discovery traffic and analyzed in accordance with the interactive clustering and device grouping by the analyzer 140 shown in FIG. 1 .
- the records may be communicated in accordance with the mDNS protocol, for instance by a device advertising its services to the network.
- each of the DNS entries 205 , 210 , and 215 includes text, namely the entry parameters 206 , 211 , and 216 , which indicate certain network and capability attributes for a device on the network.
- the embodiments can leverage text-based analysis of these DNS records 205 , 210 , and 215 , for instance applying NLP techniques to the text of the entry parameters 206 , 211 , and 216 .
- various properties that can be identified by DNS records, such as common services, common attributes, same resource type, and the like, can be used as a measure of similarity between devices.
- a set of records can be linked based on identified commonalities to generate a graph with nodes and edges that correspond to particular entry parameters.
- the set of records 205 , 210 , and 215 can be described as a common host, shown as “Cali”, which is advertising a service, shown as “airplay.”
- a network architecture may include a DNS server having a cleanup tool and a tf-idf tool configured to operate on traffic at the network edge.
- the records can be organized in tuples 205 , 210 , and 215 (hereinafter, collectively referred to as “tuples 200 ”).
- Tuples 200 include DNS names 201 - 1 , 201 - 2 , and 201 - 3 (hereinafter, collectively referred to as “DNS names 201 ”), a resource type 202 - 1 , 202 - 2 , and 202 - 3 (hereinafter, collectively referred to as “resources 202 ”), its associated Host/IP addresses 203 - 1 , 203 - 2 , and 203 - 3 (hereinafter, collectively referred to as “Host/IP addresses 203 ”) and time to live (TTL) 204 - 1 , 204 - 2 , and 204 - 3 (hereinafter, collectively referred to as “TTL”).
- the disclosed hierarchical distance techniques can use the text-based distances between these entry parameters 206 , 211 , and 216 , and other entry parameters contained in other records, in order to measure similarities. For instance, DNS entries 205 , 210 , and 215 and other records whose text indicates the common name “airplay” (corresponding to a shared service) can be considered similar, and viewed as a graph with the name as a node.
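- for illustration, the tuple layout described above and the linking of records through a shared service name can be sketched as follows. The class `DnsRecord`, the helpers `service_label` and `link_by_service`, the underscore-prefixed label convention, and the sample record values (including the TXT content and the 192.0.2.10 address) are hypothetical and chosen only to make the sketch self-contained.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsRecord:
    name: str    # DNS name (201)
    rtype: str   # resource type (202)
    target: str  # host / IP address or record data (203)
    ttl: int     # time to live, in seconds (204)


def service_label(name):
    """Return the first label starting with '_' (e.g. '_airplay'), or None."""
    for label in name.split("."):
        if label.startswith("_"):
            return label
    return None


def link_by_service(records):
    """Map each service label (a graph node) to the record names that mention it."""
    graph = defaultdict(set)
    for record in records:
        label = service_label(record.name)
        if label is not None:
            graph[label].add(record.name)
    return dict(graph)


# Hypothetical records for a host "Cali" advertising an "airplay" service.
RECORDS = [
    DnsRecord("_airplay._tcp.local", "PTR", "Cali._airplay._tcp.local", 4500),
    DnsRecord("Cali._airplay._tcp.local", "SRV", "Cali-mm.local", 120),
    DnsRecord("Cali._airplay._tcp.local", "TXT", "model=ExampleTV", 4500),
    DnsRecord("Cali-mm.local", "A", "192.0.2.10", 120),
]

linked = link_by_service(RECORDS)
```

- here the service name acts as the node that ties the records together; the address record carries no service label and is therefore left out of this particular grouping.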
- FIG. 3A illustrates an example mapping 305 of service 310 to a network address 330 in accordance with a multicast domain name service (mDNS) protocol, according to some embodiments.
- Mapping 305 may be performed by a mapping tool in a DNS server.
- the DNS server can be at a location that is close to a tap point and the packet processor in the network (e.g., packet processor 130 shown in FIG. 1 ).
- FIG. 3A will be described in relation to FIG. 3B , which illustrates a graphical representation of a linked set of records that can be linked together based on the mapping.
- mDNS conveys a set of relationships within the advertisement of records. For each node, their advertised services and attributes can be collected as multiple records. Thus, for a single node, a set of associated records for that node would list all of its services and attributes. Also, a record indicating a service instance can be used to link a node to a service and attribute. These relationships, which can be inferred from mDNS based records, are leveraged by the embodiments to further measure similarities between nodes for clustering. Referring now to FIG. 3A , a record can include an advertised service, which is indicated by service type (shown in FIG. 3B as “_airplay_tcp_local”).
- a pointer record 311 transfers service type 310 to a service instance 315 (shown in FIG. 3B as “Cali._airplay.tcp_local”).
- a text pointer 321 associates a text record with the attributes 320 of the service (shown in FIG.
- a service record 316 transfers service instance 315 to a node 325 (shown in FIG. 3B as “Cali-mm.local”).
- the request for IP address 330 from node 325 may use two types of requests, 326 and 327 (hereinafter, collectively referred to as “requests”), or may follow an indirect request through a CNAME.
- a request 326 may include an IPv4 address record and a request 327 may include an IPv6 address record leading to IP address (shown in FIG. 3B as “26.222.290.270” through IPv4, or “fe80::92ac:3fff:fe09:735b” through IPv6).
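- the chain of relationships described above (service type to service instance via a pointer record, service instance to node via a service record, and node to address via IPv4 or IPv6 address records) can be sketched as a series of lookups. This is a simplified illustration, not the mapping tool itself: the function `resolve_service`, the triple-based record layout, the standard dotted name form, and the 192.0.2.10 placeholder address are assumptions.

```python
def resolve_service(records, service_type):
    """Follow PTR -> SRV -> A/AAAA for one service type; attach TXT attributes."""
    index = {}
    for name, rtype, value in records:
        index.setdefault((name, rtype), []).append(value)

    results = []
    for instance in index.get((service_type, "PTR"), []):
        for host in index.get((instance, "SRV"), []):
            addresses = index.get((host, "A"), []) + index.get((host, "AAAA"), [])
            results.append({
                "instance": instance,
                "host": host,
                "addresses": addresses,
                "txt": index.get((instance, "TXT"), []),
            })
    return results


# Hypothetical (name, type, value) triples modeled on the FIG. 3B example.
RECORDS = [
    ("_airplay._tcp.local", "PTR", "Cali._airplay._tcp.local"),
    ("Cali._airplay._tcp.local", "SRV", "Cali-mm.local"),
    ("Cali._airplay._tcp.local", "TXT", "model=ExampleTV"),
    ("Cali-mm.local", "A", "192.0.2.10"),
    ("Cali-mm.local", "AAAA", "fe80::92ac:3fff:fe09:735b"),
]

resolved = resolve_service(RECORDS, "_airplay._tcp.local")
```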
- the data can be prepared for further text-based analysis that is performed during interactive clustering.
- NLP techniques applied to analyzed records can include: dictionary encoding a service type extracted from a record, tokenizing text records, and dictionary encoding attributes (and attribute values) within their respective name spaces.
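- a minimal sketch of these preparation steps is given below, assuming text records of the form "key=value". The helper `tokenize_txt`, the class `DictionaryEncoder`, the name-space strings, and the sample attribute values are hypothetical; the embodiments do not prescribe this particular encoding.

```python
def tokenize_txt(txt_items):
    """Split 'key=value' strings into (key, value) pairs; keys are lower-cased."""
    pairs = []
    for item in txt_items:
        key, _, value = item.partition("=")
        pairs.append((key.strip().lower(), value.strip()))
    return pairs


class DictionaryEncoder:
    """Assigns one integer code per distinct text value within a name space."""

    def __init__(self):
        self._codes = {}

    def encode(self, namespace, text):
        key = (namespace, text)
        if key not in self._codes:
            self._codes[key] = len(self._codes)
        return self._codes[key]


encoder = DictionaryEncoder()
service_code = encoder.encode("service", "_airplay._tcp.local")

# Attribute names and attribute values are encoded in separate name spaces.
tokens = tokenize_txt(["Model=ExampleTV", "features=0x1"])
attribute_codes = [
    (encoder.encode("attribute", key), encoder.encode("value:" + key, value))
    for key, value in tokens
]
```

- because each distinct string maps to a single code, a long service name and a short one are treated alike when records are later compared.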
- FIG. 3C is a conceptual diagram illustrating examples of the hierarchical distance feature as applied for extensibility across multiple protocols.
- the concept of hierarchal distance adds a dimension to text-based analysis that extends beyond flat string distance algorithms.
- the approach can include setting multiple hierarchical levels illustrated by graph 350 , which can be described as generally having a tree structure.
- graph 350 includes multiple hierarchal levels that are arranged in a descending order from a first level 351 , a second level 352 , a third level 353 , a fourth level 354 , and a fifth level 355 .
- the levels in the hierarchy can reflect the significance of that level in determining similarities between devices.
- the first level 351 in the hierarchy may be assigned to properties that are intended to be the greatest indicator that devices are indeed similar.
- the second level 352 in the hierarchy may be assigned to properties that are slightly lower indicators of similarity, and so on down the hierarchy.
- the fifth level in the hierarchy may be assigned to properties that are considered the lowest indicator of similarity.
- FIG. 3C shows that each hierarchal level 351 - 355 can correspond to properties that are recognizable as text, namely parameters of the discovery protocol traffic. Even further, FIG. 3C serves to illustrate that this hierarchy can be applied to different protocols, and their corresponding parameters. That is, the hierarchal distance algorithm can be configured to a level of abstraction that accommodates discovery traffic having a plurality of different protocols. As such, clustering can be achieved by the embodiments whether mDNS, DNS, SSDP, or other protocols are used in the network environment. In some cases, the hierarchal distance algorithm can be configured in a manner that is protocol-specific, where the parameters in the hierarchy specifically correspond to a certain protocol.
- graph 350 shows a hierarchy in which the second hierarchal level 352 is assigned to different protocols. Even further, the lower hierarchal levels 353 - 355 of the hierarchy include parameters that are conventionally used in each of the protocols of level 352 . Due to this hierarchal arrangement, the hierarchal distance algorithm can execute a scheme that is capable of measuring distance for each of the designated protocols.
- the hierarchy of graph 350 includes a message type “Discovery” and “Advertisement” in the first hierarchal level 351 .
- Protocols are included in the next level of hierarchy, showing “mDNS”, “DNS”, “SSDP”, “LLDP”, “HTTP”, and “HTTPS” in hierarchal level 352 .
- the descending third level 353 includes: “SVC1” and “SVC” (associated with mDNS); “Resolved Enterprise Domains” (associated with DNS); “Domain” (associated with SSDP); “ATTR” (associated with LLDP); “User Agent” (associated with HTTP); and “Certificates” (associated with HTTPS).
- the further descending fourth level 354 includes: “Feature” (associated with mDNS); “Device Type” and “Service Type” (associated with SSDP); “Value” (associated with LLDP); “Attributes” (associated with HTTP); and “Issuers” (associated with HTTPS).
- the last, fifth level 355 includes: “Attributes” (associated with mDNS); and “Domains” (associated with HTTPS). As seen, it is not required for each of the protocols to have parameters that extend to each of the levels of the hierarchy.
- place values 361 - 365 correspond to the hierarchy in graph 350 .
- each of the place values 361 - 365 can be assigned to one of the hierarchal levels 351 - 355 , respectively.
- the example hierarchy in FIG. 3C has a higher place value 361 , shown as the thousands place, that corresponds to the first hierarchal level 351 .
- a comparatively smaller place value 362 , shown as the hundreds place, corresponds to the second hierarchal level 352 for the protocols.
- a place value 363 , shown as the tens place, corresponds to the third hierarchal level 353 .
- a place value 364 shown as the ones place, corresponds to the fourth hierarchal level 354 .
- a place value 365 corresponds to the fifth hierarchal level 355 .
- each hierarchical level 351 - 355 corresponds to both a parameter and a place value 361 - 365 relating to the particular parameter, which serves as a link for subsequently measuring distances based on these parameters.
- the levels (top-down) in the hierarchy reflect a decreasing significance of a parameter in determining similarities between devices
- each of the place values 361 - 365 for a lower level also decreases (with respect to the place value at the higher level).
- each hierarchal level 351 - 355 contributes a value, vis-à-vis its place value 361 - 365 to the distance value that reflects the significance of that level.
- devices having a complete overlap in protocols in hierarchal level 352 will contribute a value that has a more significant impact on the total distance, in accordance with the hierarchal distance algorithm, than devices having a complete overlap of attributes in hierarchal level 355 (e.g., the tenths place).
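- one possible reading of the place-value scheme is a weighted sum in which each hierarchal level contributes its per-level dissimilarity multiplied by its place value. The sketch below follows that reading under stated assumptions: the numeric weights (1000, 100, 10, 1, and 0.1), the level names, the use of Jaccard distance per level, and the function `hierarchical_distance` are all illustrative and are not specified by the embodiments.

```python
# Assumed place values for five levels, most significant first.
PLACE_VALUES = {
    "message_type": 1000.0,
    "protocol": 100.0,
    "service": 10.0,
    "feature": 1.0,
    "attribute": 0.1,
}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def hierarchical_distance(dev_a, dev_b, place_values=PLACE_VALUES):
    """Sum of per-level dissimilarities, each scaled by the level's place value."""
    total = 0.0
    for level, weight in place_values.items():
        total += weight * jaccard_distance(dev_a.get(level, set()),
                                           dev_b.get(level, set()))
    return total


base = {
    "message_type": {"advertisement"},
    "protocol": {"mDNS"},
    "service": {"_airplay"},
    "feature": {"video"},
    "attribute": {"model"},
}
other_attribute = dict(base, attribute={"vendor"})
other_protocol = dict(base, protocol={"SSDP"})

d_attribute = hierarchical_distance(base, other_attribute)
d_protocol = hierarchical_distance(base, other_protocol)
```

- with these assumed weights, a complete mismatch at the attribute level moves two devices apart by only 0.1, whereas a complete mismatch at the protocol level moves them apart by 100, so higher levels dominate the total distance.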
- the hierarchy of FIG. 3C is an example for purposes of illustration, and is not intended to limit the scope of the embodiments.
- the variables relating to the disclosed hierarchical distance aspects are intended to be configurable as deemed necessary or appropriate.
- the hierarchy in FIG. 3C can be implemented based on user inputs, such as settings (e.g., GUI shown in FIG. 6D ) that are received by the system to configure, and subsequently implement, the hierarchal distance algorithm.
- protocols and associated parameters are not limited to advertisements.
- DNS attributes can be seen in transactions during discovery of internal servers by end-users.
- the communication can be considered to include the text data necessary for the text-based distance measuring techniques disclosed herein.
- the hierarchal approach can be extended to other protocols that include device attributes or parameters in their headers, such as HTTP and/or HTTPS.
- preparation of the data specific to each application can be done specific to the application itself.
- for HTTP and HTTPS, the approach can use a filtered set of user agents and certificates, respectively, based on the higher-level application transported by HTTP and/or HTTPS.
- configurable weights based on the application can provide additional flexibility in changing the relative contribution of each application to distances. With collection of data from various networks over a period of time, with enough labels for device types, all of these configurable settings can be trained through a neural net, making the approach extensible even further.
- the hierarchal aspects can be extended to scenarios involving devices searching for services (e.g., the discovery of services by consumers).
- the hierarchy can be configured to assign a hierarchal level for measuring a similarity regarding the number of resource advertisements and discoveries attributed to the device.
- FIG. 4 is a graph 400 depicting calculated distances in relation to similarities between groups of devices.
- the hierarchal distance algorithm is configured to calculate a distance between two devices, by measuring similarities (or dissimilarities) based on various properties related to the devices, such as common services, common attributes, same resource type, and the like.
- the calculated distance reflects this similarity (or dissimilarity), which is further utilized by the clustering approach to form device groups.
- the graphical representation 400 serves to illustrate this relationship between the calculated distance and a degree of similarity that can be used for clustering.
- the graphical representation 400 can be described as multiple Venn Diagrams including circles, each representing a set, or group, of devices. The common elements of the sets are represented by the areas of overlap among the circles, where the overlap can represent a degree of similarity.
- FIG. 4 illustrates that a calculated distance can have an inversely proportional relationship to similarity, as the degree of similarity between sets in each of the Venn Diagrams steadily decreases as the distance value increases (indicated by the arrow from
- in diagram 405 in FIG. 4 , the relationship between a small distance and its associated similarity between sets is shown. For example, a distance value that is approximately zero may reflect a high degree of similarity.
- This is shown in diagram 405 as there is a complete overlapping of the two sets (or subsets) of devices. Thus, diagram 405 appears as one circle, as the two sets share all common features. Furthermore, as the sets are the same size (e.g., same number of devices), there is substantially no symmetric difference (e.g., negligible difference outside of the area of overlap).
- the scenario represented by diagram 405 can be considered as one device in some clustering implementations.
- Diagram 410 a represents a relationship between a slightly larger calculated distance (with respect to the distance for diagram 405 ) and the associated degree of similarity.
- set 411 a (shaded circle) is of a smaller size than set 412 a.
- although the circular area of set 411 a is completely contained within set 412 a, there is some difference between the sets, which is the area of 412 a that is outside of its overlap with 411 a.
- set 411 a can be described as a subset of set 412 a. This can indicate there are some features of a device (e.g., set 412 a ) that are not present in the compared device (e.g., set 411 a ), as opposed to being different.
- Diagram 410 b illustrates a similar scenario, however there is a greater difference between the sizes of sets 411 b (shaded circle) and 412 b.
- the size of set 411 b is smaller as compared to 411 a in diagram 410 a. Consequently, set 412 b has a larger area that exists outside of its overlap with 411 b.
- Diagram 415 a represents a relationship between a substantially large calculated distance, and the associated degree of similarity.
- Diagram 415 a shows a set 416 a (shaded circle), having substantially the same size as set 417 a. Again, there is partial overlap between the sets 416 a and 417 a.
- Set 416 a and set 417 a primarily overlap with each other, indicating that there is a high degree of similarity. However, there is a portion of set 416 a that is outside of the area of overlap, and portion of set 417 a that is outside of the area of overlap. Restated, both set 416 a and set 417 a have some dissimilarities with respect to each other (as opposed to being a subset).
- in diagram 415 b, a scenario that is similar to 415 a is illustrated. However, in diagram 415 b, there is a greater symmetric difference between set 416 b (shaded circle) and 417 b, as compared to diagram 415 a. Specifically, the area of overlap in 415 b is smaller than the areas that are not common in the independent sets 416 b and 417 b. Diagram 415 b represents a case where there is more dissimilarity than similarity present between the devices. Therefore, diagrams 415 a and 415 b illustrate an even smaller degree of similarity as the distance value increases.
- clustering can involve adding weights that can reflect a ratio between the area of overlap versus the area of non-overlap. For instance, in reference to diagrams 415 a and 415 b, there may be a larger weight added to similarity in the scenario of diagram 415 a, as there is more overlap than difference present. In some cases, the sizes of the sets are also weighted in determining similarity for clustering.
- Diagram 420 a can be generally described as the converse of diagram 405 .
- the distance value is the largest shown in the graph 400 .
- Such a large distance can indicate that there is substantially no similarity that can be measured between the devices, namely the devices have no common features.
- This relationship is illustrated in diagram 420 a as sets 421 a and 422 a are completely disjointed, having no area of overlap.
- diagram 420 b illustrates disjointed sets 421 b and 422 b. Nonetheless, the sizes of sets 421 b, 422 b in diagram 420 b are larger than the sizes of the sets 421 a, 422 a in diagram 420 a.
- diagram 420 b may be considered to show less similarity between its sets 421 b, 422 b than diagram 420 a.
- a disjointed set with 20 elements has a higher degree of dissimilarity than a disjointed set of only two elements.
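- the set relationships depicted in FIG. 4 can be expressed with ordinary set operations. The following sketch classifies a pair of feature sets as identical, subset, partially overlapping, or disjoint, and reports the sizes of the overlap and of the symmetric difference. The function `compare_feature_sets` and the relation labels are hypothetical, and the sketch does not reproduce the actual distance values of graph 400.

```python
def compare_feature_sets(a, b):
    """Classify the relationship between two feature sets, as in a Venn diagram."""
    overlap = a & b
    symmetric_difference = a ^ b
    if not symmetric_difference:
        relation = "identical"        # complete overlap, as in diagram 405
    elif not overlap:
        relation = "disjoint"         # no common features, as in 420 a / 420 b
    elif a <= b or b <= a:
        relation = "subset"           # one set contained in the other, as in 410 a / 410 b
    else:
        relation = "partial overlap"  # both sets have unique features, as in 415 a / 415 b
    return {
        "relation": relation,
        "overlap": len(overlap),
        "symmetric_difference": len(symmetric_difference),
    }


same = compare_feature_sets({"airplay", "raop"}, {"airplay", "raop"})
subset = compare_feature_sets({"airplay"}, {"airplay", "raop"})
partial = compare_feature_sets({"airplay", "raop"}, {"raop", "ipp"})
small_disjoint = compare_feature_sets({"airplay"}, {"ipp"})
large_disjoint = compare_feature_sets(set(range(10)), set(range(10, 20)))
```

- both disjoint pairs share nothing, yet the larger pair has a symmetric difference of 20 elements rather than 2, which is one way the set sizes could be weighted when treating the larger pair as more dissimilar.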
- various other weights, parameters, and factors not described in reference to FIG. 4 may be implemented by the embodiments for generating clusters.
- the actual measure of the distance could be modified from using a simple encoded set comparison to embedding based measure or other NLP techniques like Latent Dirichlet Allocation (LDA).
- the extracted topic across various hierarchies could itself provide enough information about the network.
- the topic measure for mDNS could be the service proxies in the network, and for HTTP it could be device type identifiers such as iPad, iPod, or types of Android devices.
- FIG. 5 is an operation flow diagram illustrating an example of a process 500 for executing device grouping with interactive clustering using hierarchical distance, according to some embodiments.
- Process 500 is illustrated as a series of executable operations performed by processor 501 , which can be the analyzer (shown in FIG. 1 ), as described above.
- processor 501 executes the operations of process 500 , thereby implementing the disclosed interactive clustering and device grouping techniques described herein.
- a plurality of ITA messages that are being communicated by a plurality of devices on a network can be collected.
- ITA messages can be communicated during device discovery and/or advertisement, and collected in a manner that is passive (e.g., listening, intercepting).
- the ITA messages are formatted in accordance with the mDNS discovery protocol.
- the embodiments can be configured such that the interactive clustering approach is applicable to various other discovery protocols, such as DNS, SSDP, LLDP, HTTP, HTTPS, and the like.
- ITA messages may be considered consumer messages, for example, in the case of a device that is a consumer of a particular service. Here, a consumer can use the parameters of the consumer messages to evaluate interoperability and connection methods with other devices on the network, so as to be able to utilize the service.
- ITA messages can include text, or records (as shown in FIG. 2 ), that convey discovery parameters that are specific to the particular protocol of the ITA message.
- discovery parameters can be used to communicate the capabilities of the device.
- a message can include text that indicates a service, attributes, and attribute values related to a certain device.
- for HTTP, the parameters can include text that indicates a user agent, and attributes. It should be appreciated that which parameters are analyzed in operation 510 can be a user configurable feature that can be tailored for the network environment.
- a system administrator can designate that the system can examine ITA messages to extract (and subsequently analyze) text for domain, device type, and service type parameters, when the network is known to primarily utilize SSDP, for instance.
- the discovery protocols and discovery parameters that are analyzed in the process 500 can be automatically determined by the system, in a manner that does not require user input.
- operation 510 can involve extracting text that corresponds to certain discovery parameters that are specific to the protocol.
- operation 510 can include analyzing parameters of ITA messages in a generic manner (e.g., protocol independent), that can be easily applied across different discovery protocols.
- operation 510 involves NLP aspects that can remove biases that may be inherent in text-based analysis, thereby improving the measurement of degrees of similarity between devices.
- text extracted from the ITA messages can be dictionary encoded for each of the respective discovery parameters. Therefore, biases associated with a length of a string for the text may be negligible for the purposes of measuring a degree of similarity between devices.
- text for each of the identified protocols, attributes, and attribute values can be separately encoded, in a manner that allows commonalities in their respective text to be treated similarly (as a single entity) irrespective of the length.
- a cardinality threshold can be used to decide whether text for a discovery parameter is dictionary encoded. In some instances, the embodiments apply a cardinality threshold, and text with high cardinality, such as encrypted values and text strings having the same length, is not encoded. Conversely, text with lower cardinality, with respect to the cardinality threshold, is encoded.
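- the cardinality check can be sketched as counting the distinct values observed for each discovery parameter and dictionary encoding only those parameters that stay within the threshold. The function `encode_low_cardinality`, the parameter names, the sample values, and the choice of threshold are assumptions for illustration only.

```python
def encode_low_cardinality(values_by_parameter, max_distinct):
    """Dictionary encode parameters with at most max_distinct distinct values."""
    encoded = {}
    for parameter, values in values_by_parameter.items():
        distinct = sorted(set(values))
        if len(distinct) > max_distinct:
            # High cardinality (e.g. keys or encrypted values): left unencoded.
            continue
        codes = {text: code for code, text in enumerate(distinct)}
        encoded[parameter] = [codes[value] for value in values]
    return encoded


# Hypothetical observations: one value per device for each parameter.
OBSERVED = {
    "model": ["ExampleTV", "ExampleTV", "ExampleBook"],
    "pk": ["a1f3", "77c0", "09be"],
}

encoded = encode_low_cardinality(OBSERVED, max_distinct=2)
```

- in this sketch, the "model" values repeat across devices and are encoded, while the "pk" values are unique per device and are skipped.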
- NLP techniques in operation 510 can address keys that may be included in text.
- Some records use pre-shared keys (“pk”) to communicate a security code.
- a projecting device can communicate a security code, via a key, to verify its proximity to a device that has advertised an “Airplay” service.
- measuring a distance between two keys using text-based distance may result in a disproportionate measure of dissimilarity.
- the same device communicating two different keys has a potential of being measured at the same distance as two separate devices due to the bias.
- NLP can employ an approach that splits the text for other parameters, such as attributes, from text that particularly corresponds to keys. Thereafter, the comparisons between the keys can be performed separately from the remaining text. In some cases, keys are compared first. Then, when keys are found to overlap, a comparison of the other parameters is performed. This key approach can be done on the per-service level, for example, for a common set of services that may be identified by the analysis.
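- a simplified sketch of separating keys from the remaining text is shown below. It compares keys by exact match rather than by string distance, and it scores the other attributes independently; this differs from the sequential variant described above, in which the other parameters are compared only after keys are found to overlap. The functions `split_keys` and `compare_devices`, the attribute names, and the sample values are hypothetical.

```python
def split_keys(attributes, key_names=("pk",)):
    """Separate key-like attributes from the rest of a device's attributes."""
    keys = {k: v for k, v in attributes.items() if k in key_names}
    rest = {k: v for k, v in attributes.items() if k not in key_names}
    return keys, rest


def compare_devices(attrs_a, attrs_b):
    """Return (keys_match, similarity of the non-key attributes in [0, 1])."""
    keys_a, rest_a = split_keys(attrs_a)
    keys_b, rest_b = split_keys(attrs_b)
    # Keys are compared as whole values, never by character-level distance.
    keys_match = bool(keys_a) and keys_a == keys_b
    pairs_a, pairs_b = set(rest_a.items()), set(rest_b.items())
    union = pairs_a | pairs_b
    similarity = len(pairs_a & pairs_b) / len(union) if union else 1.0
    return keys_match, similarity


# The same hypothetical device observed with two different keys.
first = {"pk": "a1f3", "model": "ExampleTV", "features": "0x1"}
second = {"pk": "77c0", "model": "ExampleTV", "features": "0x1"}

keys_match, similarity = compare_devices(first, second)
```

- because the key is kept out of the attribute comparison, the two observations remain fully similar on their other attributes even though their keys differ.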
- configurable hierarchal parameters for the distance algorithm may be received.
- a hierarchal approach is used to designate which discovery parameters (extracted from ITA messages) are considered to be greater indicators of similarities between devices.
- a particular parameter at a higher level in the hierarchy is more indicative of similarity, and thus has a heavier (or weighted) contribution in calculating the distance value.
- the hierarchy implemented by the system can be tuned, or otherwise configured, by a user, thereby providing greater flexibility of the clustering.
- operation 515 can include receiving a first hierarchal level corresponding to services, receiving a second hierarchal configurable level corresponding to attributes, and a third configurable hierarchal level corresponding to attribute values. Accordingly, in this case, services can be considered the highest level in the hierarchy, or the most significant property for determining similarities.
- a hierarchal level can be assigned to protocols.
- a number of hierarchal levels that are used can also be a configurable parameter of the system. For instance, a hierarchal distance algorithm can be set to consider three levels, and then adjusted to use five levels in a hierarchy. Accordingly, ITA messages of the same discovery protocol can be a measure of similarity.
- a hierarchal level can even be assigned to a type of message. The system can have the capability to automatically set the abovementioned hierarchal levels itself, based on a known discovery protocol that may be primarily used in the network environment.
- a user, such as a network administrator, can provide user input to the system (e.g., GUI shown in FIG. 6D ) that assigns the hierarchal levels to particular discovery parameters.
- the parameters, protocols, and messages that are described above as aspects of the hierarchal approach should not be considered exhaustive. It should be appreciated that other text-identifiable characteristics related to network traffic can be used to form a hierarchy used by the hierarchal distance algorithm.
- configurable parameters of the distance algorithm can include place values.
- Operation 515 can involve receiving a specified place value that corresponds to each of the abovementioned hierarchal levels.
- the place values increase in ascending order as the significance of the hierarchal levels increases.
- each place value contributes a value to the total distance that is consistent with the hierarchal level's significance in indicating similarity.
- the first (e.g., most significant for determining similarity) hierarchal level, which corresponds to service in the mDNS based embodiment, can be set to a power of 10 at a first decimal place value, such as 1000.
- a second (e.g., less significant in determining similarity) hierarchal level, which corresponds to attribute in the mDNS based embodiment, can be set to a descending power of 10 at a second decimal place value, such as 100.
- a third (e.g., least significant in determining similarity) hierarchal level, which corresponds to attribute values in the mDNS based embodiment, can be set to a further descending power of 10 at a third decimal place value, such as 10.
- operation 515 can include receiving a threshold of similarity.
- the threshold of similarity can be a level that must be met (or exceeded) to satisfy a degree necessary for devices to qualify as similar for the intended purposes of clustering.
- a user can input a value for the threshold of similarity at operation 515 .
- a distance value calculated between two devices can be compared to the threshold of similarity, as entered, and used to determine whether the devices have a degree of similarity to be clustered together.
- clustering can be tuned as deemed optimal for the particular application, based on the threshold of similarity.
- the threshold of similarity is a variable that can be configured based on various factors related to the devices or the clustering, such as services advertised by the devices, or the visualization technique.
- a distance separating each of the plurality of devices according to a calculated distance value is determined.
- the disclosed hierarchal distance algorithm is implemented at operation 520 .
- the hierarchal distance algorithm leverages text-based properties, and incorporates a hierarchal scheme, in order to measure distance between devices on a network. Calculating a distance between two devices, for example, can involve generating a distance value for each of the hierarchal levels used by the algorithm. Restated, a degree of similarity for each level in the hierarchy can be determined, by applying text-based measurements to the parameters within each of the hierarchal levels.
- each distance value at the respective hierarchal level is placed in its assigned place value for a total composite of distance.
- the total, comprised of each of the distance values at each hierarchal level, is considered the distance between the two devices.
- the distance value at each hierarchal level is a value between 0 and 1, with a distance of 1 being the largest disjoint set, and 0 being complete overlap (e.g., same device).
- a distance value at the hierarchal level for service can be 0.3
- a distance value for the hierarchal level for attribute can be 0.1
- a distance value at the hierarchal level for attribute value can be 0.2.
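The composition of per-level distances into a total distance can be sketched as below. The sketch assumes place values of 1000, 100, and 10 for service, attribute, and attribute value, and it assumes a Jaccard distance as the per-level text-based measurement, since the text only states that each level yields a value between 0 (complete overlap) and 1 (disjoint sets). All function and field names are hypothetical.

```python
def jaccard_distance(first, second):
    """Return 0.0 for complete overlap and 1.0 for disjoint sets."""
    first, second = set(first), set(second)
    if not first and not second:
        return 0.0
    return 1.0 - len(first & second) / len(first | second)


def hierarchical_distance(level_distances, place_values):
    """Put each level's distance (0..1) in its place value and sum."""
    if len(level_distances) != len(place_values):
        raise ValueError("one place value is required per level")
    for distance in level_distances:
        if not 0.0 <= distance <= 1.0:
            raise ValueError("level distances must lie between 0 and 1")
    return sum(d * p for d, p in zip(level_distances, place_values))


# Assumed hierarchy: (field name in a device record, place value).
LEVELS = (("services", 1000), ("attributes", 100), ("attribute_values", 10))


def device_distance(device_a, device_b, levels=LEVELS):
    """Total distance between two devices described by sets of text."""
    distances = [jaccard_distance(device_a[n], device_b[n]) for n, _ in levels]
    return hierarchical_distance(distances, [p for _, p in levels])


# Worked example using the per-level values quoted in the text.
example_total = hierarchical_distance([0.3, 0.1, 0.2], [1000, 100, 10])
```

With these assumed place values, the quoted level distances of 0.3, 0.1, and 0.2 contribute 300, 10, and 2 respectively, so a difference in services always outweighs any difference in attributes or attribute values.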
- the process 500 can generate clusters of similar devices based on the determined distance.
- Each of the clusters can comprise a subset of devices having small distances between them, as calculated by the hierarchical distance algorithm.
- devices that are clustered together serve as an indication that the devices possess similarities of some form.
- operation 525 can involve employing a clustering approach, that can be used to group clusters flexibly.
- the clustering approach can generate two merged clusters with the distance determined in operation 520 between them.
- the clustering approach can also provide a total number of devices among the two groups. This can be used to generate the clusters in an iterative manner. At each iteration, there can be a decision on whether the groups are clustered. Also, based on the number of devices in the merged group, it can be further determined whether to merge leaf nodes (e.g., base data) or aggregate clusters.
- the clustering approach implemented at operation 525 may begin with forming basic cluster groups, which include the same devices, or perfect overlaps as discussed in greater detail in reference to FIG. 4 .
- the basic cluster groups are formed by devices with zero distance between them.
- similar devices with minimal distances can be merged within a threshold, depending on the final number of groups formed.
- This threshold can be used to form other heterogeneous clusters to generate an initial set of clusters that are easy to visualize.
- Additional clusters can be formed by a combination of configuration and visualization, similar to a feedback loop. For instance, the initial set of clusters can be generated at half of the height of a linkage tree. Recursive visualization models could be used to zoom in, and view sub-clusters or leaf nodes to determine merge decisions, thus making for an interactive approach in determining groups.
- a K-Means approach is utilized for clustering.
- the K-Means approach can be generally described as one where each device is initially assigned to a cluster at random, and an iterative best effort is then employed to regroup the devices among the existing clusters.
- with this K-Means approach, it is typically expected to see some number of large clusters and a very long tail of devices that would form a cluster of ungroupable devices. Such devices remain separate under the bottom-up clustering model described in this embodiment, rather than being sprinkled around various clusters as in a top-down approach.
- the approach described may provide a better overall user experience compared to competing approaches.
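The bottom-up grouping described above, which starts from zero-distance base groups and then merges within a threshold, can be sketched as a simple agglomerative loop. This is an illustration only: the single-linkage rule (closest pair of members), the function name `cluster_devices`, and the one-dimensional stand-in distance in the example are assumptions, not details from the disclosure.

```python
def cluster_devices(names, distance, threshold):
    """Merge the closest clusters until the gap exceeds the threshold.

    names: device identifiers.
    distance: callable returning the distance between two identifiers.
    threshold: largest distance at which two clusters are still merged;
    a threshold of 0 yields only the basic (perfect overlap) groups.
    """
    clusters = [[name] for name in names]

    def linkage(first, second):
        # Single linkage: distance between the closest pair of members.
        return min(distance(a, b) for a in first for b in second)

    while len(clusters) > 1:
        gap, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if gap > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Stand-in distances: positions on a line instead of real device records.
positions = {"a": 0, "b": 0, "c": 5, "d": 100}


def line_distance(first, second):
    return abs(positions[first] - positions[second])


basic_groups = cluster_devices(list(positions), line_distance, 0)
merged_groups = cluster_devices(list(positions), line_distance, 10)
```

Devices that are far from everything else, such as `d` here, stay in their own clusters rather than being forced into an existing group, which matches the long-tail behavior the text attributes to the bottom-up model.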
- a visualization of the network including clusters of devices therein can be generated.
- the visualizations can be displayed to a user, for instance within an interface (e.g., visualization interface shown in FIG. 1 ) on a display device of a computer device.
- Visualizations can be displayed having various graph topologies that can convey the clustering and distance results to a user in an intelligible and visually discernible way.
- the visualization aspects can enhance the user experience, as well as improve the ease of use.
- the visualization can be presented to the user in an interactive manner.
- User interactions with the visualization can allow the user to provide input that impacts the clusters that are rendered in the visualization. For instance, as described in operation 525 , interactions with the visualization can cause additional clusters to be formed.
- Various examples of visualizations that may be presented to a user, in accordance with the embodiments, are depicted in FIGS. 6A-6C .
- in FIGS. 6A-6C , examples of visualizations that can be generated as a result of the hierarchal distance techniques are shown.
- FIG. 6A illustrates an example of a visualization 600 that shows individual devices 601 and clusters 602 as close to each other. The visualization 600 also displays arrows 603 that indicate the direction of aggregation.
- FIG. 6B shows another example of a visualization 620 .
- the visualization 620 places devices at the periphery of a circle 621 , with the center being the group that contains all of the devices.
- the radius corresponds to the height of the tree.
- FIG. 6C shows the groupings as a tree 631 . As seen, the tree has a top 632 .
- the top 632 of the tree 631 can be a group with all of the devices on a network included therein.
- FIG. 6D illustrates an example of a GUI 650 that can be implemented for receiving user-configurable settings for the hierarchal distance algorithm (and the clustering approach).
- the GUI 650 may be implemented as an element of the visualization interface (shown in FIG. 1 ), in some instances.
- discovery parameters can be assigned to a hierarchal level as a configurable feature.
- the GUI 650 can include a window 652 for selecting which parameter is being assigned to a first hierarchal level.
- a user has selected “service” as the parameter corresponding to the first hierarchal level.
- a second window 653 shows “attribute” selected as the parameter that corresponds to the second hierarchal level.
- the window 652 includes an input for a place value to be assigned to the first hierarchal level.
- the first hierarchal level is set to the thousands place.
- a place value assigned to the second hierarchal level is set in window 653 .
- the second hierarchal level is set to the hundreds place.
- the hierarchal distance settings can be received from the user, by interacting with the respective window 652 , 653 (or other elements) using a form of input deemed appropriate, such as keyboard entry, pull down menu, radio button, and the like.
- the hierarchal distance algorithm has been configured by the user to consider service as the predominant property in measuring similarities.
- any distance value that is measured based on common services will be placed in the thousands place in the total distance.
- attribute has been selected as a less significant property in measuring similarities.
- Distance values that are measured based on common attributes will be set in the hundreds place in the total distance.
- the number of hierarchal levels can be configured, which may result in additional windows being displayed to receive the corresponding settings.
- settings that are related to extending the embodiments to multiple discovery protocols may be used.
- the GUI 650 can include an input mechanism for entering one or more protocols that may be applicable.
- FIG. 6D shows a setting for the threshold of similarity.
- FIG. 6D shows the threshold of similarity as a sliding bar input.
- the GUI 650 is particularly designed for a user to easily configure the clustering and hierarchal distance functions. That is, the GUI 650 allows a user to enter settings via simple input mechanisms, which do not require complicated user interactions or a deep knowledge of the algorithms applied. Having a general understanding of the hierarchy approach and some knowledge of the network environment, a user can, by and large, appropriately configure the system as desired.
- some of the configurable settings can be automatically populated, or the user can be allowed to select from a group of provided settings, as a smart feature that further simplifies configuring the system.
- the examples of hierarchical distance settings shown in FIG. 6D are not meant to be exhaustive, and can include other configurable features of the techniques disclosed herein.
- the configurable settings may be automatically set by the system either in whole or in part.
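The settings collected through a GUI such as the one in FIG. 6D could be held in a small configuration object like the sketch below. The class names, field names, validation rules, and the example values (place values of 1000 and 100, a threshold of 150) are assumptions made for illustration; the text does not define a data model for these settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HierarchyLevel:
    parameter: str    # e.g. "service" or "attribute"
    place_value: int  # contribution scale of this level


@dataclass(frozen=True)
class DistanceSettings:
    levels: tuple                # ordered from most to least significant
    similarity_threshold: float  # value entered via the sliding bar

    def validate(self):
        """Check that place values strictly descend with significance."""
        places = [level.place_value for level in self.levels]
        if not places:
            raise ValueError("at least one hierarchal level is required")
        if any(a <= b for a, b in zip(places, places[1:])):
            raise ValueError("place values must strictly descend")
        if self.similarity_threshold < 0:
            raise ValueError("threshold of similarity must be non-negative")
        return self


# Example mirroring the selections described for windows 652 and 653.
example_settings = DistanceSettings(
    levels=(HierarchyLevel("service", 1000), HierarchyLevel("attribute", 100)),
    similarity_threshold=150.0,
).validate()
```

Validating the ordering up front keeps a misconfigured hierarchy, such as an attribute level that outweighs the service level, from silently changing how clusters are formed.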
- FIG. 7 depicts a block diagram of an example computer system 700 which may be used in implementing various device grouping with interactive clustering using hierarchical distance features relating to the embodiments of the disclosed technology.
- the computer system 700 includes a bus 702 or other communication mechanism for communicating information, and one or more hardware processors 704 coupled with bus 702 for processing information.
- Hardware processor(s) 704 may be, for example, one or more general purpose microprocessors.
- the computer system 700 also includes a main memory 708 , such as a random-access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 702 for storing information and instructions to be executed by processor 704 .
- Main memory 708 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 704 .
- Such instructions when stored in storage media accessible to processor 704 , render computer system 700 into a special-purpose machine that is customized to perform the operations specified in the instructions.
- the computer system 700 further includes storage devices 710 such as a read only memory (ROM) or other static storage device coupled to bus 702 for storing static information and instructions for processor 704 .
- a storage device 710 such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 702 for storing information and instructions.
- the computer system 700 may be coupled via bus 702 to a display 712 , such as a liquid crystal display (LCD) (or touch screen), for displaying information to a computer user.
- An input device 714 is coupled to bus 702 for communicating information and command selections to processor 704 .
- Another type of user input device is cursor control 716 , such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 704 and for controlling cursor movement on display 712 .
- the same direction information and command selections as cursor control may be implemented via receiving touches on a touch screen without a cursor.
- the computing system 700 may include a user interface module to implement a GUI that may be stored in a mass storage device as executable software codes that are executed by the computing device(s).
- This and other modules may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
- the words “component,” “engine,” “system,” “database,” “data store,” and the like, as used herein, can refer to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, C or C++.
- a software component may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software components may be callable from other components or from themselves, and/or may be invoked in response to detected events or interrupts.
- Software components configured for execution on computing devices may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression or decryption prior to execution).
- Such software code may be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device.
- Software instructions may be embedded in firmware, such as an EPROM.
- hardware components may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors.
- the computer system 700 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 700 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 700 in response to processor(s) 704 executing one or more sequences of one or more instructions contained in main memory 708 . Such instructions may be read into main memory 708 from another storage medium, such as storage device 710 . Execution of the sequences of instructions contained in main memory 708 causes processor(s) 704 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
- non-transitory media refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media.
- Non-volatile media includes, for example, optical or magnetic disks, such as storage device 710 .
- Volatile media includes dynamic memory, such as main memory 708 .
- non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.
- Non-transitory media is distinct from but may be used in conjunction with transmission media.
- Transmission media participates in transferring information between non-transitory media.
- transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 702 .
- transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
- the computer system 700 also includes a communication interface 718 coupled to bus 702 .
- Network interface 718 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks.
- communication interface 718 may be an integrated service digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line.
- network interface 718 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or WAN component to communicate with a WAN).
- Wireless links may also be implemented.
- network interface 718 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
- a network link typically provides data communication through one or more networks to other data devices.
- a network link may provide a connection through local network to a host computer or to data equipment operated by an Internet Service Provider (ISP).
- the ISP in turn provides data communication services through the world-wide packet data communication network now commonly referred to as the “Internet.”
- Local network and Internet both use electrical, electromagnetic or optical signals that carry digital data streams.
- the signals through the various networks and the signals on network link and through communication interface 718 which carry the digital data to and from computer system 700 , are example forms of transmission media.
- the computer system 700 can send messages and receive data, including program code, through the network(s), network link and communication interface 718 .
- a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the communication interface 718 .
- the received code may be executed by processor 704 as it is received, and/or stored in storage device 710 , or other non-volatile storage for later execution.
- Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code components executed by one or more computer systems or computer processors comprising computer hardware.
- the one or more computer systems or computer processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS).
- the processes and algorithms may be implemented partially or wholly in application-specific circuitry.
- the various features and processes described above may be used independently of one another or may be combined in various ways. Different combinations and sub-combinations are intended to fall within the scope of this disclosure, and certain method or process blocks may be omitted in some implementations.
- a circuit might be implemented utilizing any form of hardware, software, or a combination thereof.
- processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logical components, software routines or other mechanisms might be implemented to make up a circuit.
- the various circuits described herein might be implemented as discrete circuits or the functions and features described can be shared in part or in total among one or more circuits. Even though various features or elements of functionality may be individually described or claimed as separate circuits, these features and functionality can be shared among one or more common circuits, and such description shall not require or imply that separate circuits are required to implement such features or functionality.
- where a circuit is implemented in whole or in part using software, such software can be implemented to operate with a computing or processing system capable of carrying out the functionality described with respect thereto, such as computer system 700 .
Abstract
Description
- Clustering can be described as assigning a set of objects to groups, such that the objects within the same cluster are more similar (according to a property) to each other than to those objects in other clusters. The concept of creating, or otherwise identifying, clusters of nodes is applied in many fields, including computer networking, statistics, data analysis, and bioinformatics, for example. Particularly in the realm of computer networking, clustering nodes using the concept of “similarity” is often based on the physical topology of the network. Some network clustering algorithms capture the intuitive notion that nodes may be clustered with other nodes that are proximally located, such as clustering devices sharing a local area network (LAN). Accordingly, distance is a property that often governs the clustering of nodes in computer networking technologies.
- There has been extensive work relating to distance measuring in the area of computer networks. Many existing distance measurement mechanisms are designed for obtaining distance related metrics that may be primarily dictated by the network topology, such as path delay, number of hops, and the like. As an example, some anycasting services employ a set of anycast resolvers that can measure the response times of replicated servers on behalf of clients to determine a distance therebetween (e.g., longer response time indicates larger distance between nodes). Thus, distance frequently serves as an anchor for determining “similar” nodes, and further for forming clusters of nodes that are present in a network. Nonetheless, it may be desirable to use clustering techniques driven by properties other than distance, that may be less tied to the physicality of the network.
- The present disclosure, in accordance with one or more various embodiments, is described in detail with reference to the following figures. The figures are provided for purposes of illustration only and merely depict typical or example embodiments.
- FIG. 1 illustrates an example of a system distributed across a communications network and including a network device implementing techniques for device grouping with interactive clustering using hierarchical distance, according to some embodiments.
- FIG. 2 illustrates examples of domain name services (DNS) entries that can be collected and analyzed by the network device shown in FIG. 1 to identify host associations utilized by the device group techniques (e.g., host name-to-node association for visualization of clustering results), according to some embodiments.
- FIG. 3A illustrates an example of mapping of services to network addresses in accordance with a multicast domain name service (mDNS) protocol, according to some embodiments.
- FIG. 3B illustrates a graphical representation of a linked set of records associated with in accordance with a mDNS protocol, according to some embodiments.
- FIG. 3C is a conceptual diagram illustrating examples of the hierarchal distance features as applied for extensibility across multiple protocols, according to some embodiments.
- FIG. 4 is a conceptual diagram depicting examples of relationships between distances and similarities between device groups, according to some embodiments.
- FIG. 5 is an operation flow diagram illustrating an example of a process for executing device grouping with interactive clustering using hierarchical distance, according to some embodiments.
- FIGS. 6A-6C depict examples of network graphs generated using visualization aspects of the device grouping system disclosed herein, according to some embodiments.
- FIG. 6D depicts an example of a user interface for configuring the interactive clustering using hierarchal distance features disclosed herein for adaptability and extensibility to multiple protocols, according to some embodiments.
- FIG. 7 illustrates an example computing device that may be used in implementing various device grouping with interactive clustering using hierarchical distance features relating to the embodiments of the disclosed technology.
- The figures are not exhaustive and do not limit the present disclosure to the precise form disclosed.
- Various embodiments described herein are directed to techniques and systems for device grouping with interactive clustering using hierarchical distance. As alluded to above, it may be desirable to use a clustering technique that is not driven solely by distance measurements. For example, an administrator may request network analytics that require devices of a common type, such as Apple Macintosh® (Mac®) computers, to be grouped in the same cluster. The clustering techniques disclosed herein can be configured to measure similarities (or dissimilarities) based on various properties, such as common services, common attributes, same resource type, and the like. Accordingly, the clustering techniques disclosed herein may provide advantages of flexibility and configurability over conventional network clustering mechanisms, which are often limited to analysis based on the physical topology and/or performance characteristics of the network. Furthermore, these properties can be retrieved from metadata associated with discovery protocol traffic. In other words, text-based analysis, such as natural language processing (NLP), can be used to analyze metadata and ultimately measure similarities between devices for clustering. In some embodiments, NLP techniques involve using dictionary encoding to remove biases that may be inherent in text-based distance approaches (e.g., length of the parameter names). For example, comparing a first parameter name set: Deviceid, pk, pn to a second parameter name set: Deviceid, pk, mn using the hierarchal distance algorithm may result in a substantially close (e.g., two out of three) distance measurement. In some embodiments, the hierarchal distance algorithm allows a lexical analysis to be done with NLP to segment parameter names with a value. That is, analysis using the hierarchal distance approach can determine that two parameters (out of the set containing three parameters), namely Deviceid and pk, are common amongst both parameter sets.
- In contrast, a text-based measurement may calculate a closer distance because the text string for a matching parameter, Deviceid (having eight characters), is longer than the text strings for the non-matching parameters, pn vs. mn (having two characters each). Thus, NLP analysis can establish a way to normalize text that is analyzed in the hierarchal distance algorithm. Because of this normalization, similarities are measured from the context of the text, for instance determining that the same type of service is conveyed by text, rather than measuring the text strings themselves. The examples can achieve improved distance measurements and optimal clustering, by adapting the hierarchal distance algorithm to remove biases related to text.
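The length bias in the Deviceid, pk, pn versus Deviceid, pk, mn example can be made concrete with the sketch below. Both similarity functions are assumptions introduced for illustration: one counts whole parameter names (each name treated as a single entity, as dictionary encoding would), the other weights each name by its character count, as a string-level measure effectively does.

```python
def name_level_similarity(first, second):
    """Share of parameter names in common, each name counted once."""
    first, second = set(first), set(second)
    return len(first & second) / max(len(first), len(second))


def character_weighted_similarity(first, second):
    """Same comparison, but each name is weighted by its length."""
    first, second = set(first), set(second)
    shared = sum(len(name) for name in first & second)
    largest = max(sum(len(n) for n in first), sum(len(n) for n in second))
    return shared / largest


first_set = ("Deviceid", "pk", "pn")
second_set = ("Deviceid", "pk", "mn")

by_name = name_level_similarity(first_set, second_set)
by_character = character_weighted_similarity(first_set, second_set)
```

Under these assumptions the name-level measure gives two out of three, while the character-weighted measure gives ten out of twelve, because the eight characters of Deviceid dominate the comparison. That gap is the bias that the normalization is meant to remove.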
- Tools like language models, whose use is typically reserved for NLP-based applications, can be applied to discovery protocol traffic in a manner that ascertains similarities between network devices based on properties that are recognizable as text. Thus, the disclosed techniques may realize improved performance and efficiency over mechanisms that primarily use statistical (or mathematical) measurements, which may also require greater computational complexity.
- Furthermore, the systems and techniques disclosed herein implement interactive clustering features. For example, the embodiments include a graphical user interface (GUI) that allows a user to interact with and configure aspects of the clustering techniques. According to the embodiments, user interactions can include configuring parameters related to a distance algorithm, which in turn impacts how clustering is performed. The interactive clustering aspects of the embodiments provide flexibility, such that a user can adapt clustering to be performed as deemed appropriate (or optimal) for the prevailing application (e.g., network environment, analytics, etc.). For example, a user can set a more restrictive threshold for identifying devices as “similar” (e.g., decreasing the potential of finding similarities), or conversely a less restrictive threshold for identifying devices as “similar” (e.g., increasing the potential of finding similarities). The configurable parameters may be adjusted based on multiple factors, such as a desire for a larger number of groups in a cluster (e.g., larger clusters), and the like. This approach is similar to looking at fractals, where the self-similarity can be displayed by zooming in and out. Distance can be approximated, as customized by a user, for a number of groups.
- Furthermore, the disclosed techniques involve a hierarchical distance approach. As a general description, the hierarchical distance approach determines a quantitative measurement of distance between nodes that is governed by a prioritization (e.g., hierarchical order) of characteristics that may be used to ascertain similarities between the nodes in a more qualitative manner. According to the embodiments, the hierarchical distance approach is the underlying concept which allows for a set of parameters (e.g., retrieved from discovery protocol traffic) to be used as a measure of similarity. As alluded to above, text-based analysis, such as NLP, can be utilized for measuring similarities between devices. The hierarchical distance approach adds a dimension to text-based analysis that extends beyond flat string distance algorithms. In the embodiments, multiple hierarchical levels can be assigned to properties that are recognizable as text, namely parameters of the discovery protocol traffic. Thus, each hierarchical level corresponds to both a parameter and a distance value (or a degree of similarity) relating to the particular parameter, which serves as a link for subsequently measuring distances based on these parameters. By employing text-based analysis in part, rather than relying fully on text, the hierarchical approach mitigates some drawbacks associated with biases that can be implicit in text-based analysis. For example, the length of a text string can largely impact the flat string distance algorithm. Text strings that are longer and may change frequently can indicate a greater distance. However, in some networking scenarios, the same device may communicate multiple pre-shared keys that have these characteristics. In this example, a fully text-based analysis using a flat string distance algorithm, for instance, may determine a large distance based on the pre-shared keys, even in the case of a complete similarity (e.g., same device). 
Thus, supplementing NLP techniques with the hierarchical distance approach in the manner of the disclosed embodiments may optimize the trade-offs between limitations and advantages associated with text-based analysis.
- The embodiments include mechanisms for passively collecting and analyzing discovery traffic. For example, the device grouping system disclosed herein leverages edge devices to listen to discovery traffic within the network, rather than employing mechanisms that inject additional traffic into the network solely for the purpose of analysis. Additionally, the system provides a minimal footprint by deploying fewer packet processing devices at strategic points in the network architecture (e.g., edge devices). As discussed herein, metadata from collected packets can be analyzed such that the information can be used to derive network analytics, namely the disclosed device grouping techniques. Discovery protocols consistent with the present disclosure may include a dynamic host configuration protocol (DHCP), a domain name service (DNS), a multicast DNS (mDNS) protocol, a link layer discovery protocol (LLDP), a CISCO discovery protocol (CDP), and many more that are low in volume, but high in information content about the network. Discovery protocols include information that allows devices to operate in the network. Furthermore, the information included in a service advertisement message has analytical value. For instance, an mDNS message includes text that particularly corresponds to network and device characteristics, such as domain names and services, that can be used as a measure of similarity to perform clustering. Applying text-based analysis, namely NLP, to network traffic that has high informational content about the network and the devices thereon is the underlying concept for the distance algorithm.
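As a rough illustration of passively selecting discovery traffic, the sketch below labels already-captured packets by their well-known identifiers (DHCP on UDP 67/68, DNS on port 53, mDNS on UDP 5353, LLDP by EtherType 0x88CC). The packet dictionary layout and the function names are hypothetical; a real packet processor would parse frames from a tap or SPAN port, and CDP is omitted for brevity.

```python
# Well-known transport ports for discovery protocols named in the text.
DISCOVERY_PORTS = {
    ("udp", 67): "DHCP",
    ("udp", 68): "DHCP",
    ("udp", 53): "DNS",
    ("tcp", 53): "DNS",
    ("udp", 5353): "mDNS",
}

# LLDP is a link-layer protocol, identified by EtherType rather than by port.
LLDP_ETHERTYPE = 0x88CC

def classify(packet):
    """Return the discovery protocol name for a packet, or None if it is not discovery traffic."""
    if packet.get("ethertype") == LLDP_ETHERTYPE:
        return "LLDP"
    transport = packet.get("transport")
    for port in (packet.get("dst_port"), packet.get("src_port")):
        name = DISCOVERY_PORTS.get((transport, port))
        if name is not None:
            return name
    return None

def collect_discovery(packets):
    """Keep only discovery packets, paired with their protocol label."""
    labeled = ((classify(p), p) for p in packets)
    return [(name, p) for name, p in labeled if name is not None]

# Hypothetical captured packets (only the fields needed for classification).
captured = [
    {"ethertype": 0x0800, "transport": "udp", "src_port": 5353, "dst_port": 5353},
    {"ethertype": 0x0800, "transport": "tcp", "src_port": 50123, "dst_port": 443},
    {"ethertype": 0x88CC},
    {"ethertype": 0x0800, "transport": "udp", "src_port": 68, "dst_port": 67},
]

discovery = collect_discovery(captured)
print([name for name, _ in discovery])
```

Only three of the four sample packets are retained, which mirrors the point that discovery traffic is a small fraction of overall volume.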
-
FIG. 1 illustrates an example of a system 100 distributed across a communications network 110 and including a network device, shown as analyzer 140. The analyzer 140 can be configured for implementing techniques for device grouping with interactive clustering using hierarchical distance, according to some embodiments. In FIG. 1, an example network architecture includes client devices 110A-110N and packet processor 130 that can be proximately located, for instance within the same customer premises. Additionally, the client devices 110A-110N and the packet processor 130 can be communicatively connected to each other as part of a local area network (LAN) 101 (indicated by dashed lines). LAN 101 may be installed at the customer premises, such as in a retail store, a business (e.g., restaurants, shopping malls, and the like), a factory, an office building, and the like. In that regard, LAN 101 may include one or more of the client devices 110A-110N. -
Client devices 110A-110N may include a desktop, a laptop, or a mobile device such as a smart phone, a tablet, or any other portable computing device capable of communicating through LAN 101. In that regard, client device 110 may include several types of devices, which, even in the case that client device 110 is mobile, may be loosely or less often associated or co-located with a user. Another type of client device 110 may be more often or almost always associated or co-located with a user (e.g., a smart phone or another wearable device). - The network of
FIG. 1 may include tap points, which can be points on the network to monitor, by a packet processor 130, network devices and data between the tap points and client devices 110A-110N. In some cases, tap points can be at a “network edge.” Tap points can be described as locations that can have a visibility of local multicast based discovery that may not be routed beyond the network segment. Also, in some cases, tap points can be locations where a unique address can be obtained in all network layers, such as Media Access Control (MAC) and Internet Protocol (IP) in their respective layers. Network edges can provide predictable endpoints, e.g., tap points, from where to extract sample packets with a packet processor 130. The network edge can be a sensitive area where the pulse of the network, which is the LAN in FIG. 1, may be accurately registered and diagnosed. Alternatively, one or more tap points may be placed at any point in LAN 101 (e.g., not at the network edge). In some examples, tap points may be placed in the LAN 101 in order to provide visibility (e.g., to packet processor 130 via SPAN tunnels) of source IPs and/or discovery protocol traffic. As an example, a tap point can be placed between a router (not shown) and the LAN 101 to monitor discovery protocol traffic that is not routed beyond the router due to the nature of the discovery protocol or due to the broadcast discovery technique. In some examples, placing tap points between the router and the LAN 101 may enable snapshotting of packets prior to any network address translation performed by the router, thereby preserving the client IP and the frequency of resolution by each client. In some examples, such SPAN tunnels (e.g., SPAN tunnel 170) may connect the router to tap points, such as when LAN 101 is switched. 
Network topology is highly dynamic (e.g., transient) on network edge 120; accordingly, placing the packet processor 130 at the edge enables packet processor 130 to determine how edge devices (e.g., APs and client devices 110A-110N) continue to connect, authenticate, and access to perform routine functions. - Accordingly, embodiments as disclosed herein include accurately determining the number and location of tap points in which to place one or
more packet processors 130 to handle network volume. Some embodiments include the use of discovery tools, which operate within the network edge and provide high-value, but low-volume data traffic. Thus, in some embodiments, packet processor 130 uses discovery tools in addition to deep packet inspection metadata extraction operations to handle network analysis before the first hop protocols seen at the level of network edge 120. Access to this is obtained by either configuring a router to locally SPAN to a co-located or remote SPAN through a network to setup remotely but routable (e.g., into DNS server 100). This approach substantially reduces the bandwidth strain imposed on network resources by typical network analysis devices. In some embodiments, packet processor 130 may absorb less than 0.05% to 1% of the network traffic volume, opening up a wide bandwidth for other network resources and/or compute/storage resources. For instance, storage resources with the capacity to store a month of data collected via previous techniques may be able to store two years of data collected via the present techniques. - As seen in
FIG. 1, the client devices 110A-110N can communicate various intent to access (ITA) messages 120A-120N. For purposes of discussion, ITA messages 120A-120N can be generally described as packets, records, or messages that enable devices on a network to announce information related to their configurability (e.g., services and associated parameters) and accessibility in a manner that allows the devices to discover, connect, and communicate with each other on the network. In the example of FIG. 1, device discovery can be accomplished in accordance with a discovery protocol, namely mDNS. However, it should be understood that ITA messages 120A-120N can be messages that indicate an intent to access using various other discovery protocols, such as DTP, DNS, SSDP, and the like. mDNS transactions can be indicative of intent to access, and thus are also referred to as ITA messages herein, and are illustrated as ITA messages 120A-120N in FIG. 1. In some examples, an mDNS transaction includes communicating mDNS records (shown in FIG. 2) that advertise types of services related to a particular device within a network, or that are visible more widely. - As an example of a service discovery protocol using mDNS in
FIG. 1, a client device 110A can communicate an mDNS record, such as an ITA message 120A, when the client device 110A becomes available to the network (e.g., after establishing connection to LAN 101). The ITA message 120A, including an mDNS record, can allow the client device 110A to advertise its capabilities (e.g., services) on LAN 101. In the case of a device advertising, ITA messages can be referred to herein as service advertisement messages. The client device 110A can transmit the ITA message 120A to one or more other devices connected to the LAN 101 as part of a discovery process. Thus, the other client devices 110B-110N, upon receiving the ITA message 120A, can discover the client device 110A, its advertised services, and the associated parameters. Client devices 110B-110N that may be consumers of the advertised services can use the parameters indicated in the ITA message 120A to evaluate interoperability, connection methods, and other runtime operational compatibility to enable the services in the network. Alternatively, mDNS based service discovery can allow client device 110A to query the network to determine services that are available (e.g., services advertised by client devices 110B-110N). In some cases, mDNS can accomplish service discovery with zero configuration (also known as “zeroconf”). It should be appreciated that although only client device 110A is described in reference to FIG. 1, any of the other client devices 110B-110N on the LAN 101 are capable of communicating the ITA messages 120A-120N, for instance discovery messages in accordance with the mDNS protocol. Moreover, different mDNS records may have different configuration settings in terms of requirements and capabilities, access and privileges, based on the specification of LAN 101, and intended purpose. - The
packet processor 130 situated at a tap point, as described above in detail, can intercept, or otherwise collect, ITA messages 120A-120N that may be communicated via LAN 101. Thus, the embodiments as disclosed herein require a comparatively small portion of the network traffic, namely the discovery traffic, to implement the device grouping to be further used in data analytics. In addition, in some embodiments this capacity is enhanced by implementation at the network edge. In the illustrated example, the packet processor 130 can transmit the collected ITA messages 120A-120N, also referred to as discovery traffic, to an analyzer 140, which is a separate network device employed for analyzing the collected discovery traffic for analytics. In accordance with the embodiments, the analyzer 140 implements the device grouping and interactive clustering using distance features disclosed herein. - In some embodiments,
packet processor 130 inspects the discovery traffic that may be initiated by client devices 110A-110N to discover the network resources with an application layer protocol (APP) or browser-based application installed on the client devices 110A-110N. The same application that discovers the network resources may initiate hypertext transfer protocol (HTTP), or HTTP-secure (HTTPS) or other application protocol to access the network resource from client devices 110A-110N. In some embodiments, packet processor 130 may use mDNS to resolve host names to IP addresses. Other protocols that can be used by packet processor 130 can include DNS. In the case of DNS, a DNS server can provide a DNS to the operating system of client devices 110A-110N, to map a network resource name configured in the APP to an IP address in network architecture. In some embodiments, a DNS server transmits resolution requests to client devices 110A-110N through DNS responses. SSDP tools may be used for resources co-located at the edge (e.g., plug and play devices, and the like). More specifically, some embodiments use the request part of discovery tools (e.g., protocols including memory devices storing commands and processors to execute the commands) for identification/discovery of client devices 110A-110N, which are typically multicast, thereby facilitating access to at least one copy. The host responses (or server/protocol proxy node's responses) carry equally critical info that provide the “network view,” but may involve more network resources to track. - Network administrators monitor traffic to identify anomalies and deficiencies before major problems arise, e.g., loss of connectivity or network services for a client device in a wireless network (e.g., Wi-Fi and the like), or a local area network (LAN), or the spread of malware, data theft, security breaches, and the like. In embodiments as disclosed herein, an
analyzer 140 can be configured with an interactive clustering module 141 that enables a network administrator to leverage discovery traffic for measuring similarities between devices on the LAN 101 in a hierarchical and configurable manner. Accordingly, as part of network analysis, the network administrator can generate a topological view of the network, namely LAN 101, showing devices on the network that are grouped together based on the characteristics of the devices (e.g., services and parameters) rather than conventional distance measurements, such as determining a number of device groups statistically. -
FIG. 1 shows the analyzer 140 as being a device that is remotely located from LAN 101 on customer premises (e.g., “cloud” deployment). In some embodiments, analyzer 140 can be located on LAN 101 rather than external to the network. As seen in FIG. 1, discovery traffic in the mDNS protocol that has been collected by packet processor 130 can be communicated, via communication network 170, to the analyzer. Communication network 170 can include, for example, a wide area network (WAN), the Internet, and the like. In some embodiments, analyzer 140 has full access to an associated database 142. Database 142 may store information related to discovery traffic and protocols for the analytics performed by the analyzer 140. In some examples, database 142 may be a distributed network accessible database (e.g., Hadoop-like distributed network accessible database) that can process workflows, discovery tools, and the like. In some embodiments, in addition to the interactive clustering and device grouping features, the analyzer 140 may perform other forms of network monitoring and analytics. For instance, analyzer 140 can apply machine-learning algorithms (e.g., neural networks, artificial intelligence, and the like) to build multiple user profiles and other network patterns (e.g., identify potentially harmful IP addresses or suspicious traffic behavior) that are stored in database 142. A user profile may include the type of client device 110A-110N used to log into LAN 101, the period of time that the connectivity lasted (latency), patterns of connectivity, and the like. In that regard, database 142 may also include DPI libraries to maintain flow states including handshake states between client devices 110A-110N and access points. In some embodiments, at least a portion of the analyzer 140 may be deployed within the network edge of the LAN 101 (e.g., “on-premises” deployment). - In the embodiments, the
interactive clustering module 141 includes executable instructions, computer components, or a combination of both that implement the specific functions of the interactive device clustering and device grouping aspects of the embodiments. For example, the interactive clustering module 141 can include a graphical user interface (GUI) for receiving various user-configurable parameters entered by a user, such as the network administrator associated with LAN 101. Therefore, the embodiments can provide an end user with the capability to adapt the hierarchical distance algorithm to function in a manner consistent with their intended analytics application. In some embodiments, a user can configure the hierarchical distance algorithm by assigning a respective value to each of the hierarchical levels of the algorithm. Consequently, by adjusting the values, the user can set which discovery parameters (that correspond to a particular hierarchical level) serve as a greater indicator of similarity by the algorithm. As an example, the network administrator can enter place values that respectively correspond to a discovery parameter into the GUI of the interactive clustering module 141. In the case of the mDNS protocol, a user can assign a higher place value to a particular discovery parameter, such as setting “services” to correspond with the thousands place, while assigning a lower place value to another discovery parameter, such as setting “attributes” to the tens place. Thus, the settings effectively adjust the clustering approach to utilize “service” as the discovery parameter having the highest weight in measuring similarity between devices. These abovementioned values used in configuring the hierarchical distance algorithm can range from place values (also referred to herein as decimal values), various orders of magnitude, or other mathematically related groupings as deemed appropriate. 
In some embodiments, the levels of hierarchy can be even broader than the protocol specific approach discussed above (levels restricted to parameters within a certain protocol). For instance, the interactive clustering module 141 can be configured to include a hierarchical level for different protocols (e.g., mDNS, DNS, SSDP, etc.), thereby allowing devices that communicate in a common protocol to be considered a property for similarity. In some cases, a hierarchical level for message type (e.g., discovery, advertisement) can be used. Details regarding the association between hierarchical levels and values, as applied by the hierarchical distance algorithm, are discussed further in reference to FIG. 3C. - The
interactive clustering module 141 can include other configurable parameters that are described in further detail herein, for example in reference to FIGS. 6A-6D. In some cases, this can signify that the network administrator is tuning a threshold of similarity such that the device grouping generates larger device groups (e.g., having a greater number of devices in each group). Furthermore, the interactive clustering module 141 can implement various NLP techniques that can be used in extracting text from the discovery traffic in the mDNS protocol, and then applying text-based analysis for measuring similarity between devices. Thereafter, the interactive clustering module 141 can use these degrees of similarity to calculate a distance measurement. For instance, interactive clustering module 141 can be configured to calculate a greater distance between two devices on the network that have fewer similarities with each other. Conversely, a shorter distance between two devices on the network may be calculated by the interactive clustering module 141 when the devices are more similar to each other. Furthermore, a threshold of similarity, which governs the degree necessary for devices to qualify as similar (or dissimilar), is also a configurable parameter of the hierarchical distance algorithm. Accordingly, the interactive clustering module 141 provides the adjustability for a user to either restrict or broaden the requirement for a cluster, thereby configuring the distance algorithm to be predisposed for generating larger groups of devices (e.g., more devices in a group) or smaller groups of devices (e.g., fewer devices in a group). It should be appreciated that the interactive clustering and device grouping techniques disclosed can be adaptable for use with various other discovery protocols, and thus are not limited to applications using the mDNS protocol. Embodiments that are extended for use with other discovery protocols are discussed in greater detail in reference to FIG. 3C, for example. 
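The effect of the configurable similarity threshold can be sketched as follows. This is only one plausible grouping rule (devices are merged whenever a pairwise distance falls at or below the threshold, using a small union-find); the device names and distance values are invented for illustration, and the text does not specify this particular procedure.

```python
def group_devices(distances, threshold):
    """Group devices linked by any pairwise distance at or below `threshold`.

    `distances` maps (device_a, device_b) pairs to a precomputed distance.
    """
    parent = {}

    def find(x):
        # Find the group representative, compressing the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        root_a, root_b = find(a), find(b)
        if d <= threshold and root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for device in parent:
        groups.setdefault(find(device), set()).add(device)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical pairwise distances between four devices.
distances = {
    ("tv-1", "tv-2"): 0.2,
    ("printer", "laptop"): 10.5,
    ("tv-1", "laptop"): 100.0,
    ("tv-2", "laptop"): 100.1,
    ("tv-1", "printer"): 110.0,
    ("tv-2", "printer"): 110.2,
}

strict = group_devices(distances, threshold=1.0)
loose = group_devices(distances, threshold=50.0)
print(strict)
print(loose)
```

A restrictive threshold leaves the printer and laptop as singletons, while a broader threshold merges them, producing fewer and larger groups.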
- Additionally,
FIG. 1 shows a network visualization client 150 including a network visualization module 151. For example, the analyzer 140 can be a centralized computer, such as a server, having a processing capacity that is suitable to support the data processing and analysis necessary to implement the interactive clustering and device grouping features disclosed. The visualization client 150 may be a client device having network analysis applications, such as the visualization interface 152, that consume the analytical data processed by analyzer 140. As an example, the visualization client 150 can be a desktop, a laptop, or a mobile device such as a smart phone, a tablet, or any other portable computing device that can be used by a network administrator for monitoring a network, such as LAN 101. In some instances, the visualization client 150 and the analyzer 140 are communicatively connected via a network (not shown) allowing communication of data between the devices. In the embodiments, the network visualization module 151 includes executable instructions, computer components, or a combination of both that implement a visualization of the network. The visualization, shown in FIG. 1 as output of a visualization interface 152, can include graphical representations of the device groups generated by the interactive clustering module 141. - The visualization interface is illustrated in
FIG. 1 as displaying a network graph as a result of the clustering and grouping performed by the analyzer 140, in accordance with the embodiments. The network graph can be a visual representation of the topology of a network, having visual cues, such as nodes, for identifying client devices, and traffic in a network having devices that utilize the mDNS protocol for resolving host names and IP addresses, according to some embodiments. Nodes can represent various types of network devices, including but not limited to, a client device, a router, an AP, a host server, a database, or any network device in a network architecture as disclosed herein (e.g., client devices 110A-110N). For example, the visualization interface 152 may present a graph which represents client devices 110A-110N on LAN 101 that are measured as having small distances from each other, as determined by the distance algorithm, as a cluster of nodes. Alternatively, the graph displayed within visualization interface 152 can show client devices 110A-110N on LAN 101 that are measured as having large distances from each other, as determined by the distance algorithm, as individual nodes separated by edges (having a length that is commensurate with the calculated distance). Furthermore, as described in greater detail in reference to FIG. 5, the visualization can be generated in an interactive manner. For instance, the visualization interface 152 can receive input from a user (e.g., merge device groups) that adds clusters to the visualization. The visualization client 150 can include an input device and an output device. The input device may include a mouse, a keyboard, a touchscreen, and the like, that can be utilized by the user to interact with the visualization. An output device of the visualization client 150 may include a display, a touchscreen, a microphone, and the like, which displays a visualization. 
In some embodiments, the input device and output device of the visualization client 150 may be included in the same unit (e.g., a touchscreen). -
FIG. 2 illustrates examples of domain name services (DNS) entries analyzer 140 shown in FIG. 1. In some embodiments, the records may be communicated in accordance with the mDNS protocol, for instance by a device advertising its services to the network. As seen in FIG. 4, each of the DNS entries entry parameters DNS records entry parameters - In the illustrated example of
FIG. 2, the set of records tuples tuples 200”). Tuples 200 include DNS names 201-1, 201-2, and 201-3 (hereinafter, collectively referred to as “DNS names 201”), a resource type 202-1, 202-2, and 202-3 (hereinafter, collectively referred to as “resources 202”), its associated Host/IP addresses 203-1, 203-2, and 203-3 (hereinafter, collectively referred to as “Host/IP addresses 203”) and time to live (TTL) 204-1, 204-2, and 204-3 (hereinafter, collectively referred to as “TTL”). In particular, FIG. 2 illustrates that by inspecting the text of entry parameters of DNS names 201 and DNS Host/IP addresses 203, it can be determined that the service corresponds to “airplay” and the host corresponds to “Cali.” Accordingly, the disclosed hierarchical distance techniques can use the text based distances between these entry parameters DNS entries -
FIG. 3A illustrates an example mapping 305 of service 310 to a network address 330 in accordance with a multicast domain name service (mDNS) protocol, according to some embodiments. Mapping 305 may be performed by a mapping tool in a DNS server. For example, the DNS server can be at a location that is close to a tap point and the packet processor in the network (e.g., packet processor 130 shown in FIG. 1). For purposes of discussion, FIG. 3A will be described in relation to FIG. 3B, which illustrates a graphical representation of a linked set of records that can be linked together based on the mapping. - As a general description, mDNS conveys a set of relationships within the advertisement of records. For each node, their advertised services and attributes can be collected as multiple records. Thus, for a single node, a set of associated records for that node would list all of its services and attributes. Also, a record indicating a service instance can be used to link a node to a service and attribute. These relationships, which can be inferred from mDNS based records, are leveraged by the embodiments to further measure similarities between nodes for clustering. Referring now to
FIG. 3A, a record can include an advertised service, which is indicated by service type (shown in FIG. 3B as “_airplay_tcp_local”). A pointer record 311 transfers service type 310 to a service instance 321 (shown in FIG. 3B as “Cali._airplay.tcp_local”). A text pointer 321 associates a text record with the attributes 320 of the service (shown in FIG. 3B as “deviceid=a8:60:b6:12:ef:f5;features=0x4a7ffff7,0xe;flags=0xc;model=appletv5,3;pin=1;pk=271 e7ccc629ee96a1eeeb2a12f7cc7203c1ea1dc5dd80d27c91c03127f762987;srcvers=220.68;vv=2”). A service record 316 transfers service instance 315 to a node 325 (shown in FIG. 3B as “Cali-mm.local”). The request for IP address 330 from node 325 may use two types of requests, 326 and 327 (hereinafter, collectively referred to as “requests”), or may follow an indirect request through a CNAME. A request 326 may include an IPv4 address record and a request 327 may include an IPv6 address record leading to IP address (shown in FIG. 3B as “26.222.290.270” through IPv4, or “fe80::92ac:3fff:fe09:735b” through IPv6). As a result of analyzing the mDNS records, and the relationships conveyed therein, the data can be prepared for further text-based analysis that is performed during interactive clustering. For instance, NLP techniques applied to analyzed records can include: dictionary encoding a service type extracted from a record, tokenizing text records, and dictionary encoding attributes (and attribute values) within their respective name spaces. -
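The preparation steps listed at the end of this paragraph (dictionary encoding a service type, tokenizing a text record, and dictionary encoding attributes and values within their own name spaces) might look roughly like the sketch below. The class and function names are hypothetical, the text record is a shortened form of the FIG. 3B example, and the semicolon-delimited layout follows how the record is printed here rather than the on-the-wire mDNS format.

```python
class NamespacedDictionary:
    """Assigns integer codes to strings, with an independent code space per namespace."""

    def __init__(self):
        self._codes = {}

    def encode(self, namespace, text):
        codes = self._codes.setdefault(namespace, {})
        return codes.setdefault(text, len(codes))

def tokenize_txt(record):
    """Split a 'key=value;key=value' text record into a dict of attribute pairs."""
    pairs = {}
    for token in record.split(";"):
        key, _, value = token.partition("=")
        if key.strip():
            pairs[key.strip()] = value.strip()
    return pairs

def encode_record(service_type, txt_record, dictionary):
    """Encode the service type, attribute names, and attribute values separately."""
    attributes = {}
    for name, value in tokenize_txt(txt_record).items():
        name_code = dictionary.encode("attribute", name)
        # Values are encoded per attribute so that, e.g., pin=1 and vv=1 never collide.
        attributes[name_code] = dictionary.encode("value:" + name, value)
    return {
        "service": dictionary.encode("service", service_type),
        "attributes": attributes,
    }

dictionary = NamespacedDictionary()
rec_a = encode_record(
    "_airplay._tcp.local",
    "deviceid=a8:60:b6:12:ef:f5;flags=0xc;model=appletv5,3;pin=1;vv=2",
    dictionary,
)
# A second, invented device of the same model advertising the same service.
rec_b = encode_record(
    "_airplay._tcp.local",
    "deviceid=a8:60:b6:00:00:01;flags=0xc;model=appletv5,3;pin=1;vv=2",
    dictionary,
)
print(rec_a)
print(rec_b)
```

After encoding, the two records differ only in the code for the `deviceid` value, so a later comparison sees one differing attribute instead of a long differing string.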
FIG. 3C is a conceptual diagram illustrating examples of the hierarchical distance feature as applied for extensibility across multiple protocol. As alluded to above, the concept of hierarchal distance adds a dimension to text-based analysis that extends beyond flat string distance algorithms. As seen inFIG. 3C , the approach can include setting multiple hierarchical levels illustrated bygraph 350, which can be described as generally having a tree structure. In the example,graph 350 includes multiple hierarchal levels that are arranged in a descending order from afirst level 351, asecond level 352, athird level 353, afourth level 354, and a fifth level. The levels in the hierarchy can reflect the significance of that level in determining similarities between devices. For example, thefirst level 351 in the hierarchy may be assigned to properties that are intended to be the greatest indicator that devices are indeed similar. Thesecond level 352 in the hierarchy may be assigned to properties that are slightly lower indicators of similarity, and so on down the hierarchy. Finally, the fifth level in the hierarchy may be assigned to properties that are considered the lowest indicator of similarity. - Also,
FIG. 3C shows that each hierarchal level 351-355 can correspond to properties that are recognizable as text, namely parameters of the discovery protocol traffic. Even further,FIG. 3C serves to illustrate that this hierarchy can be applied to different protocols, and their corresponding parameters. That is, the hierarchal distance algorithm can be configured to a level of abstraction that accommodates discovery traffic having a plurality of different protocols. As such, clustering can be achieved by the embodiments whether mDNS, DNS, SSDP, or other protocols are used in the network environment. In some cases, the hierarchal distance algorithm can be configured in a manner that is protocol-specific, where the parameters in the hierarchy specifically correspond to a certain protocol. In accordance with the adaptability features of the system, the number and/or type of protocols that can be analyzed for clustering and device grouping is intended to be dynamically tunable as deemed necessary or appropriate. In the illustrated example,graph 350 shows a hierarchy in which the secondhierarchal level 352 is assigned to different protocols. Even further, the lower hierarchal levels 353-355 of the hierarchy include parameters that are conventionally used in each of the protocols oflevel 352. Due to this hierarchal arrangement, the hierarchal distance algorithm can execute a scheme that is capable of measuring distance for each of the designated protocols. The hierarchy ofgraph 350 includes a message type “Discovery” and “Advertisement” in the firsthierarchal level 351. Protocols are included in the next level of hierarchy, showing “mDNS”, “DNS”, “SSDP”, “LLDP”, “HTTP”, and “HTTPS” inhierarchal level 352. 
The descending third level 353 includes: “SVC1” and “SVC” (associated with mDNS); “Resolved Enterprise Domains” (associated with DNS); “Domain” (associated with SSDP); “ATTR” (associated with LLDP); “User Agent” (associated with HTTP); and “Certificates” (associated with HTTPS). The further descending fourth level 354 includes: “Feature” (associated with mDNS); “Device Type” and “Service Type” (associated with SSDP); “Value” (associated with LLDP); “Attributes” (associated with HTTP); and “Issuers” (associated with HTTPS). The last, fifth level 355 includes: “Attributes” (associated with mDNS); and “Domains” (associated with HTTPS). As seen, it is not required for each of the protocols to have parameters that extend to each of the levels of the hierarchy. - Furthermore, it is illustrated that place values 361-365 correspond to the hierarchy in
graph 350. For example, each of the place values 361-365 can be assigned to one of the hierarchal levels 351-355, respectively. The example hierarchy in FIG. 3C has the highest place value 361, shown as the thousands place, that corresponds to the first hierarchal level 351. A comparatively smaller place value 362, shown as the hundreds place, corresponds to the second hierarchal level 352 for the protocols. A place value 363, shown as the tens place, corresponds to the third hierarchal level 353. A place value 364, shown as the ones place, corresponds to the fourth hierarchal level 354. A place value 365, shown as the tenths place, corresponds to the fifth hierarchal level 355. Thus, each hierarchical level 351-355 corresponds to both a parameter and a place value 361-365 relating to the particular parameter, which serves as a link for subsequently measuring distances based on these parameters. As the levels (top-down) in the hierarchy reflect a decreasing significance of a parameter in determining similarities between devices, each of the place values 361-365 for a lower level also decreases (with respect to the place value at the higher level). In other words, each hierarchal level 351-355 contributes a value, vis-à-vis its place value 361-365, to the distance value that reflects the significance of that level. As an example with respect to the hierarchy in FIG. 3C, devices having a complete overlap in protocols in hierarchal level 352 (e.g., a value of 0 in the hundreds place) will contribute a value that has a more significant impact on the total distance, in accordance with the hierarchal distance algorithm, than devices having a complete overlap of attributes in hierarchal level 355 (e.g., a value of 0 in the tenths place). The hierarchy of FIG. 3C is an example for purposes of illustration, and is not intended to limit the scope of the embodiments. 
The variables relating to the disclosed hierarchical distance aspects are intended to be configurable as deemed necessary or appropriate. In some embodiments, the hierarchy in FIG. 3C can be implemented based on user inputs, such as settings (e.g., GUI shown in FIG. 6D) that are received by the system to configure, and subsequently implement, the hierarchal distance algorithm. - It should be appreciated that the protocols and associated parameters are not limited to advertisements. For instance, DNS attributes can be seen in transactions during discovery of internal servers by end-users. While not an advertisement, the communication can be considered to include the text data necessary for the text-based distance measuring techniques disclosed herein. Additionally, the hierarchal approach can be extended to other protocols that include device attributes or parameters in their headers, such as HTTP and/or HTTPS. In some cases, the data can be prepared in a manner that is specific to each application. For the case of HTTP and HTTPS, the approach can use a filtered set of user agents and certificates, respectively, based on the higher-level application transported by HTTP and/or HTTPS.
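By way of illustration only, the example hierarchy of FIG. 3C could be held in a simple lookup structure such as the one sketched below. The names `PARAMETERS_BY_PROTOCOL`, `deepest_level`, and `protocols_reaching` are hypothetical and do not appear in the disclosure; the dictionary keys reuse the reference numerals 353-355 purely as labels, and levels 351 (message type) and 352 (protocol) are assumed to sit above the entries shown.

```python
# Hypothetical encoding of the parameters named for FIG. 3C.
# Keys 353-355 reuse the figure's reference numerals as level labels.
# Level 351 (message type) and level 352 (protocol) sit above these entries.
PARAMETERS_BY_PROTOCOL = {
    "mDNS": {353: ["SVC1", "SVC"], 354: ["Feature"], 355: ["Attributes"]},
    "DNS": {353: ["Resolved Enterprise Domains"]},
    "SSDP": {353: ["Domain"], 354: ["Device Type", "Service Type"]},
    "LLDP": {353: ["ATTR"], 354: ["Value"]},
    "HTTP": {353: ["User Agent"], 354: ["Attributes"]},
    "HTTPS": {353: ["Certificates"], 354: ["Issuers"], 355: ["Domains"]},
}


def deepest_level(protocol):
    """Return the lowest hierarchal level that has a parameter for the protocol."""
    return max(PARAMETERS_BY_PROTOCOL[protocol])


def protocols_reaching(level):
    """Return, in sorted order, the protocols that have a parameter at the level."""
    return sorted(
        name for name, levels in PARAMETERS_BY_PROTOCOL.items() if level in levels
    )


if __name__ == "__main__":
    print(deepest_level("DNS"))
    print(protocols_reaching(355))
```

A structure of this kind makes it visible that not every protocol needs a parameter at every level: in this sketch only mDNS and HTTPS extend to the fifth level.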
- In addition, configurable weights based on the application can provide additional flexibility in changing the relative contribution of each application to distances. With collection of data from various networks over a period of time, and with enough labels for device types, all of these configurable settings can be trained through a neural net, making the approach extensible even further.
- In some cases, the hierarchal aspects can be extended to scenarios involving devices searching for services (e.g., the discovery of services by consumers). In this case, the hierarchy can be configured to assign a hierarchal level for measuring a similarity regarding the number of resource advertisements and discoveries that are attributed to the device.
-
FIG. 4 is a graph 400 depicting calculated distances in relation to similarities between groups of devices. As previously described, the hierarchal distance algorithm is configured to calculate a distance between two devices, by measuring similarities (or dissimilarities) based on various properties related to the devices, such as common services, common attributes, same resource type, and the like. Thus, the calculated distance reflects this similarity (or dissimilarity), which is further utilized by the clustering approach to form device groups. The graphical representation 400 serves to illustrate this relationship between the calculated distance and a degree of similarity that can be used for clustering. The graphical representation 400 can be described as multiple Venn diagrams including circles, each representing a set, or group, of devices. The common elements of the sets are represented by the areas of overlap among the circles, where the overlap can represent a degree of similarity. FIG. 4 illustrates that a calculated distance can have an inversely proportional relationship to similarity, as the degree of similarity between sets in each of the Venn diagrams steadily decreases as the distance value increases (indicated by the arrow from left to right). - Referring now to diagram 405 in
FIG. 4, the relationship between a small distance and its associated similarity between sets is shown. For example, a distance value that is approximately zero may reflect a high degree of similarity. This is shown in diagram 405, as there is a complete overlapping of the two sets (or subsets) of devices. Thus, diagram 405 appears as one circle, as the two sets share all common features. Furthermore, as the sets are the same size (e.g., same number of devices), there is substantially no symmetric difference (e.g., negligible difference outside of the area of overlap). The scenario represented by diagram 405 can be considered as one device in some clustering implementations. - Diagram 410a represents a relationship between a slightly larger calculated distance (with respect to the distance for diagram 405) and the associated degree of similarity. As seen, set 411a (shaded circle) is of a smaller size than set 412a. Although the circular area of set 411a is completely contained within set 412a, there is some difference between the sets, which is the area of 412a that is outside of its overlap with 411a. In this scenario, set 411a can be described as a subset of set 412a. This can indicate there are some features of a device (e.g., set 412a) that are not present in the compared device (e.g., set 411a), as opposed to being different. Diagram 410b illustrates a similar scenario; however, there is a greater difference between the sizes of sets 411b (shaded circle) and 412b. The size of set 411b is smaller as compared to 411a in diagram 410a. Consequently, set 412b has a larger area that exists outside of its overlap with 411b. There is some dissimilarity between the sets 411a, 412a and 411b, 412b, thereby illustrating that there is a smaller degree of similarity represented in diagrams 410a, 410b (distance is substantially small) than in diagram 405 (distance approximately 0), as the distance value has increased.
- Diagram 415a represents a relationship between a substantially large calculated distance and the associated degree of similarity. Diagram 415a shows a set 416a (shaded circle) having primarily the same size as set 417a. Again, there is partial overlap between the sets 416a and 417a. Set 416a and set 417a primarily overlap with each other, indicating that there is a high degree of similarity. However, there is a portion of set 416a that is outside of the area of overlap, and a portion of set 417a that is outside of the area of overlap. Restated, both set 416a and set 417a have some dissimilarities with respect to each other (as opposed to one being a subset of the other). Now referring to diagram 415b, a scenario that is similar to 415a is illustrated. But, in diagram 415b, there is a greater symmetric difference between set 416b (shaded circle) and set 417b, as compared to diagram 415a. Specifically, the area of overlap in 415b is smaller than the areas that are not common in the independent sets 416b and 417b. Diagram 415b represents a case where there is more dissimilarity than similarity present between the devices. Therefore, diagrams 415a and 415b illustrate an even smaller degree of similarity, as the distance value has increased. In some cases, clustering can involve adding weights that can reflect a ratio between the area of overlap versus the area of non-overlap. For instance, in reference to diagrams 415a and 415b, there may be a larger weight added to similarity in the scenario of diagram 415a, as there is more overlap than difference present. In some cases, the sizes of the sets are also weighted in determining similarity for clustering.
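The relationship between overlap and distance described for diagrams 405 through 415b can be pictured with a conventional set-overlap measure. The sketch below uses the Jaccard distance, which is an assumption made here for illustration; the disclosure does not name a particular set measure, and the function name `set_distance` and the sample feature names are hypothetical. This plain measure also does not apply the additional weighting of overlap ratio or set size mentioned above.

```python
def set_distance(features_a, features_b):
    """Jaccard distance between two feature sets, in the range 0 to 1.

    0 means the sets overlap completely and 1 means they share nothing.
    Two empty sets are treated as identical (an assumption of this sketch).
    """
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    # Complete overlap, as in diagram 405.
    print(set_distance({"airplay", "raop"}, {"airplay", "raop"}))
    # One set contained in a larger set, as in diagrams 410a and 410b.
    print(set_distance({"airplay"}, {"airplay", "raop", "ipp", "ssh"}))
    # Partial overlap with differences on both sides, as in diagrams 415a and 415b.
    print(set_distance({"a", "b", "c"}, {"b", "c", "d"}))
```

Under this measure, a smaller contained set or a smaller area of overlap yields a larger distance, which matches the left-to-right trend of FIG. 4.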
- Diagram 420a can be generally described as the converse of diagram 405. Here, the distance value is the largest shown in the graph 400. Such a large distance can indicate that there is substantially no similarity that can be measured between the devices, namely the devices have no common features. This relationship is illustrated in diagram 420a, as sets 421a and 422a are completely disjoint, having no area of overlap. Also, diagram 420b illustrates disjoint sets 421b and 422b. Nonetheless, the sizes of sets 421b, 422b in diagram 420b are larger than the sizes of the sets 421a, 422a in diagram 420a. As a result of weighting the set sizes, diagram 420b may be considered to show less similarity between its sets 421b, 422b than diagram 420a. For instance, a disjoint set with 20 elements has a higher degree of dissimilarity than a disjoint set of only two elements. It should be understood that various other weights, parameters, and factors not described in reference to FIG. 4 may be implemented by the embodiments for generating clusters. Furthermore, while maintaining the similarity measure with reference to the overlaps, the actual measure of the distance could be modified from using a simple encoded set comparison to an embedding-based measure or other NLP techniques like Latent Dirichlet Allocation (LDA). With respect to LDA, the extracted topics across various hierarchies could themselves provide enough information about the network. For example, similar servers resolved through DNS can indicate a departmental workflow, the topic measure for mDNS could be the service proxies in the network, and for HTTP it could be device type identifiers like iPad, iPod, or types of Android devices. - 
FIG. 5 is an operation flow diagram illustrating an example of a process 500 for executing device grouping with interactive clustering using hierarchical distance, according to some embodiments. Process 500 is illustrated as a series of executable operations performed by processor 501, which can be the analyzer (shown in FIG. 1), as described above. Processor 501 executes the operations of process 500, thereby implementing the disclosed interactive clustering and device grouping techniques described herein. - In an
operation 505, a plurality of ITA messages (or service advertisement messages) that are being communicated by a plurality of devices on a network can be collected. As previously described, ITA messages can be communicated during device discovery and/or advertisement, and collected in a manner that is passive (e.g., listening, intercepting). In some embodiments, the ITA messages are formatted in accordance with the mDNS discovery protocol. However, it should be appreciated that the embodiments can be configured such that the interactive clustering approach is applicable to various other discovery protocols, such as DNS, SSDP, LLDP, HTTP, HTTPS, and the like. In some instances, ITA messages may be considered consumer messages, for example, in the case of a device that is a consumer of a particular service. Here, a consumer can use the parameters of the consumer messages to evaluate interoperability and connection methods with other devices on the network, so as to be able to utilize the service. - Next, at an
operation 510, the collected ITA messages are analyzed using various text-based analysis techniques. ITA messages can include text, or records (as shown in FIG. 2), that convey discovery parameters that are specific to the particular protocol of the ITA message. Furthermore, the discovery parameters can be used to communicate the capabilities of the device. For instance, in the case of mDNS, a message can include text that indicates a service, attributes, and attribute values related to a certain device. Alternatively, in the case of HTTP, the parameters can include text that indicates a user agent and attributes. It should be appreciated that which parameters are analyzed in operation 510 can be a user-configurable feature that can be tailored for the network environment. As an example, a system administrator can designate that the system examine ITA messages to extract (and subsequently analyze) text for domain, device type, and service type parameters when the network is known to primarily utilize SSDP, for instance. Alternatively, the discovery protocols and discovery parameters that are analyzed in the process 500 can be automatically determined by the system, in a manner that does not require user input. Accordingly, based on the particular discovery protocol, operation 510 can involve extracting text that corresponds to certain discovery parameters that are specific to the protocol. In some embodiments, operation 510 can include analyzing parameters of ITA messages in a generic manner (e.g., protocol independent) that can be easily applied across different discovery protocols. - In some embodiments,
operation 510 involves NLP aspects that can remove biases that may be inherent in text-based analysis, thereby improving the measurement of degrees of similarity between devices. For example, text extracted from the ITA messages can be dictionary encoded for each of the respective discovery parameters. Therefore, biases associated with a length of a string for the text may be negligible for the purposes of measuring a degree of similarity between devices. For example, text for each of the identified protocols, attributes, and attribute values can be separately encoded, in a manner that allows commonalities in their respective text to be treated similarly (as a single entity) irrespective of the length. In some embodiments, a cardinality threshold can be used to decide whether text for a discovery parameter is dictionary encoded. In some instances, the embodiments apply a cardinality threshold, and text with high cardinality, such as encrypted values and text strings having the same length, is not encoded. Conversely, text with lower cardinality, with respect to the cardinality threshold, is encoded. - Furthermore, NLP techniques in
operation 510 can address keys that may be included in text. Some records use pre-shared keys (“pk”) to communicate a security code. For instance, a projecting device can communicate a security code, via a key, to verify its proximity to a device that has advertised an “Airplay” service. Given the large length of the string and the temporal volatility (e.g., keys change often) associated with keys, as compared to other parameters, measuring a distance between two keys using text-based distance may result in a disproportionate measure of dissimilarity. Using text-based distance, the same device communicating two different keys has the potential of being measured at the same distance as two separate devices due to the bias. NLP can employ an approach that splits the text for other parameters, such as attributes, from text that particularly corresponds to keys. Thereafter, the comparisons between the keys can be performed separately from the remaining text. In some cases, keys are compared first. Then, when keys are found to overlap, a comparison of the other parameters is performed. This key approach can be done on the per-service level, for example, for a common set of services that may be identified by the analysis. - Thereafter, at operation 515, configurable hierarchal parameters for the distance algorithm may be received. A hierarchal approach, as alluded to above, is used to designate which discovery parameters (extracted from ITA messages) are considered to be greater indicators of similarities between devices. As part of the hierarchal approach, a particular parameter at a higher level in the hierarchy is more indicative of similarity, and thus has a heavier (or weighted) contribution in calculating the distance value. In some instances, the hierarchy implemented by the system can be tuned, or otherwise configured, by a user, thereby providing greater flexibility of the clustering. 
In an embodiment that may be based on an mDNS environment, operation 515 can include receiving a first configurable hierarchal level corresponding to services, a second configurable hierarchal level corresponding to attributes, and a third configurable hierarchal level corresponding to attribute values. Accordingly, in this case, services can be considered the highest level in the hierarchy, or the most significant property for determining similarities.
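Before any hierarchal level is compared, the text preparation described for operation 510 (dictionary encoding subject to a cardinality threshold, and separating key text from the remaining parameters) could look roughly like the sketch below. The function names, the use of the count of distinct values as the cardinality measure, and the sample record fields are assumptions made for illustration; only the “pk” key name comes from the description above.

```python
def dictionary_encode(values, cardinality_threshold):
    """Replace each distinct string with a small integer code.

    Returns None when the number of distinct values exceeds the threshold,
    meaning the field is treated as high cardinality and left unencoded.
    Using the distinct-value count as "cardinality" is an assumption here.
    """
    distinct = sorted(set(values))
    if len(distinct) > cardinality_threshold:
        return None
    codes = {text: code for code, text in enumerate(distinct)}
    return [codes[value] for value in values]


def split_keys(record, key_fields=("pk",)):
    """Separate key-like fields from the other parameters of a record."""
    keys = {name: value for name, value in record.items() if name in key_fields}
    rest = {name: value for name, value in record.items() if name not in key_fields}
    return keys, rest


if __name__ == "__main__":
    services = ["_airplay._tcp", "_raop._tcp", "_airplay._tcp"]
    # Long and short strings become equally sized codes.
    print(dictionary_encode(services, cardinality_threshold=10))
    # Hypothetical record: the key is set aside and compared separately.
    print(split_keys({"pk": "3f9a0c", "model": "X1", "features": "0x5A7F"}))
```

Once encoded, two long strings that are identical contribute exactly as much as two short strings that are identical, which is the length bias the encoding is meant to remove.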
- In other embodiments, for instance implementations that are extended to multiple different discovery protocols, a hierarchal level can be assigned to protocols. Even further, the number of hierarchal levels that are used can also be a configurable parameter of the system. For instance, a hierarchal distance algorithm can be set to consider three levels, and then adjusted to use five levels in a hierarchy. Accordingly, ITA messages of the same discovery protocol can be a measure of similarity. In some cases, a hierarchal level can even be assigned to the type of message. The system can have the capability to automatically set the abovementioned hierarchal levels itself, based on a known discovery protocol that may be primarily used in the network environment. In other embodiments, a user, such as a network administrator, can provide user input to the system (e.g., GUI shown in
FIG. 6D) that assigns the hierarchal levels to particular discovery parameters. The parameters, protocols, and messages that are described above as aspects of the hierarchal approach should not be considered exhaustive. It should be appreciated that other text-identifiable characteristics related to network traffic can be used to form a hierarchy used by the hierarchal distance algorithm. - Furthermore, configurable parameters of the distance algorithm can include place values. Operation 515 can involve receiving a specified place value that corresponds to each of the abovementioned hierarchal levels. As a general description, the place values increase as the significance of the hierarchal levels increases. Thus, each place value contributes a value to the total distance that is consistent with the hierarchal level's significance in indicating similarity. As an example, the first (e.g., most significant for determining similarity) hierarchal level, which corresponds to service in the mDNS based embodiment, can be set to a power of 10 at a first decimal place value, such as 1000. A second (e.g., less significant in determining similarity) hierarchal level, which corresponds to attribute in the mDNS based embodiment, can be set to a descending power of 10 at a second decimal place, such as 100. A third (e.g., least significant in determining similarity) hierarchal level, which corresponds to attribute values in the mDNS based embodiment, can be set to a further descending power of 10 at a third decimal place, such as 10.
- Additionally, operation 515 can include receiving a threshold of similarity. The threshold of similarity can be a level that must be met (or exceeded) to satisfy a degree necessary for devices to qualify as similar for the intended purposes of clustering. For example, a user can input a value for the threshold of similarity at operation 515. A distance value calculated between two devices can be compared to the threshold of similarity, as entered, and used to determine whether the devices have a degree of similarity to be clustered together. Thus, as alluded to above, clustering can be tuned as deemed optimal for the particular application, based on the threshold of similarity. In some cases, the threshold of similarity is a variable that can be configured based on various factors related to the devices or the clustering, such as services advertised by the devices, or the visualization technique.
- Subsequently, at
operation 520, a distance separating each of the plurality of devices according to a calculated distance value is determined. In the embodiments, the disclosed hierarchal distance algorithm is implemented at operation 520. As alluded to above, the hierarchal distance algorithm leverages text-based properties, and incorporates a hierarchal scheme, in order to measure distance between devices on a network. Calculating a distance between two devices, for example, can involve generating a distance value for each of the hierarchal levels used by the algorithm. Restated, a degree of similarity for each level in the hierarchy can be determined by applying text-based measurements to the parameters within each of the hierarchal levels. - Then, each distance value at the respective hierarchal level is placed in its assigned place value for a total composite of distance. The total, comprised of each of the distance values at each hierarchal level, is considered the distance between the two devices. In some cases, the distance value at each hierarchal level is a value between 0 and 1, with a distance of 1 representing fully disjoint sets and 0 representing complete overlap (e.g., same device). As an example, a distance value at the hierarchal level for service can be 0.3, a distance value at the hierarchal level for attribute can be 0.1, and a distance value at the hierarchal level for attribute value can be 0.2. Placing each of the aforementioned distance values at the respective place value assigned to the hierarchal level results in a total distance of 3120 between the two devices.
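The arithmetic of the example above can be sketched as follows. This is one possible reading rather than the definitive method: it assumes that each per-level distance is rounded to a single digit (its tenths digit) and that the digit is then written at the place value of its level, with place values of 1000, 100, and 10 assumed for service, attribute, and attribute value because those values reproduce the stated total of 3120. The function name `composite_distance` is hypothetical.

```python
def composite_distance(level_distances, place_values):
    """Combine per-level distances (each 0..1) into one composite distance.

    Assumption of this sketch: each distance is reduced to one digit,
    round(distance * 10), and that digit is written at the level's place value.
    A per-level distance of exactly 1 yields the value 10 and would carry into
    the next higher place; the source does not state how that case is handled.
    """
    if len(level_distances) != len(place_values):
        raise ValueError("one place value is required per hierarchal level")
    total = 0
    for distance, place in zip(level_distances, place_values):
        if not 0.0 <= distance <= 1.0:
            raise ValueError("per-level distances must lie between 0 and 1")
        total += round(distance * 10) * place
    return total


if __name__ == "__main__":
    # Service 0.3, attribute 0.1, attribute value 0.2.
    print(composite_distance([0.3, 0.1, 0.2], [1000, 100, 10]))  # 3120
```

Under these assumptions a difference at a higher level dominates any difference at the lower levels: a pair with distances (0.1, 0.9, 0.9) totals 1990, which is still closer than a pair with distances (0.2, 0.0, 0.0) at 2000.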
- At an
operation 525, the process 500 can generate clusters of similar devices based on the determined distance. Each of the clusters can comprise a subset of devices having small distances between them, as calculated by the hierarchical distance algorithm. Thus, devices that are clustered together serve as an indication that the devices possess similarities of some form. In some embodiments, operation 525 can involve employing a clustering approach that can be used to group clusters flexibly. The clustering approach can generate two merged clusters with the distance determined in operation 520 between them. In some cases, the clustering approach can also provide a total number of devices among the two groups. This can be used to generate the clusters in an iterative manner. At each iteration, there can be a decision on whether the groups are clustered. Also, based on the number of devices in the merged group, it can be further determined whether to merge leaf nodes (e.g., base data) or aggregate clusters. - In some embodiments, the clustering approach implemented at
operation 525 may begin with forming basic cluster groups, which include the same devices, or perfect overlaps as discussed in greater detail in reference to FIG. 4. In other words, the basic cluster groups are formed by devices with zero distance between them. Then, similar devices with minimal distances can be merged within a threshold, depending on the final number of groups formed. This threshold can be used to form other heterogeneous clusters to generate an initial set of clusters that are easy to visualize. Additional clusters can be formed by a combination of configuration and visualization, similar to a feedback loop. For instance, the initial set of clusters can be generated at half of the height of a linkage tree. Recursive visualization models could be used to zoom in and view sub-clusters or leaf nodes to determine merge decisions, thus making for an interactive approach in determining groups. - In some embodiments, a K-Means approach is utilized for clustering. A K-Means approach can be generally described as one where each device is assigned to a cluster randomly and an iterative best effort is employed to regroup the devices among the existing clusters. In this K-Means approach, it is typically expected to see some number of large clusters and a very long tail of devices that would form a cluster of ungroupable devices. With a bottom-up clustering model as described in this embodiment, such devices remain separate, as compared to being sprinkled around various clusters in a top-down approach. The approach described provides a better user experience overall, as compared to competing approaches.
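The bottom-up behavior described above, in which zero-distance devices are grouped first and the closest groups are then merged while they remain within a threshold, can be sketched in a few lines. The sketch assumes single-linkage merging and a simple overlap distance on toy data; the linkage choice, the names `cluster`, `toy_distance`, and `DEVICES`, and the sample devices are all hypothetical and are not taken from the disclosure.

```python
# Toy data: each hypothetical device is described by the services it advertises.
DEVICES = {
    "tv-1": {"airplay", "raop"},
    "tv-2": {"airplay", "raop"},
    "speaker": {"raop"},
    "printer": {"ipp"},
}


def toy_distance(name_a, name_b):
    """Stand-in distance (set overlap) used only to exercise the clustering."""
    a, b = DEVICES[name_a], DEVICES[name_b]
    return 1.0 - len(a & b) / len(a | b)


def cluster(devices, distance, threshold):
    """Bottom-up clustering: repeatedly merge the two closest groups.

    Groups are compared by their closest pair of members (single linkage, an
    assumption of this sketch). Merging stops when the closest pair of groups
    is farther apart than the threshold.
    """
    groups = [[device] for device in devices]
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = min(distance(a, b) for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


if __name__ == "__main__":
    print(cluster(list(DEVICES), toy_distance, threshold=0.0))
    print(cluster(list(DEVICES), toy_distance, threshold=0.5))
```

With a threshold of zero only the two identical devices are grouped, which corresponds to the basic cluster groups; raising the threshold to 0.5 also pulls in the speaker, while the printer, which shares nothing with the others, stays in its own group instead of being spread into an existing cluster.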
- Subsequently, at an operation 530, a visualization of the network including clusters of devices therein can be generated. The visualizations can be displayed to a user, for instance within an interface (e.g., visualization interface shown in FIG. 1) on a display device of a computer device. Visualizations can be displayed having various graph topologies that can convey the clustering and distance results to a user in an intelligible and visually discernable way. Thus, the visualization aspects can enhance the user experience, as well as improve the ease of use. As previously described, the visualization can be presented to the user in an interactive manner. User interactions with the visualization can allow the user to provide input that impacts the clusters that are rendered in the visualization. For instance, as described in operation 525, interactions with the visualization can cause additional clusters to be formed. Various examples of visualizations that may be presented to a user, in accordance with the embodiments, are depicted in FIGS. 6A-6C. - Now referring to
FIGS. 6A-6C, examples of visualizations that can be generated as a result of the hierarchal distance techniques are shown. FIG. 6A illustrates an example of a visualization 600 that shows individual devices 601 and clusters 602 as close to each other. Visualization 600 also displays arrows 603 that indicate the direction of aggregation. FIG. 6B shows another example of a visualization 620. The visualization 620 places devices at the periphery of a circle 621, with the center being the group that includes all of the devices. In the example of FIG. 6B, the radius determines the height of the tree. In yet another example of a visualization 630, FIG. 6C shows the groupings as a tree 631. As seen, the tree has a top 632. The top 632 of the tree 631 can be a group with all of the devices on a network included therein. - 
FIG. 6D illustrates an example of a GUI 650 that can be implemented for receiving user-configurable settings for the hierarchal distance algorithm (and the clustering approach). The GUI 650 may be implemented as an element of the visualization interface (shown in FIG. 1), in some instances. As discussed above in reference to FIG. 5, discovery parameters can be assigned to a hierarchal level as a configurable feature. Accordingly, as seen in FIG. 6D, the GUI 650 can include a window 652 for selecting which parameter is being assigned to a first hierarchal level. In the example, a user has selected “service” as the parameter corresponding to the first hierarchal level. Also, a second window 653 shows “attribute” selected as the parameter that corresponds to the second hierarchal level. Furthermore, the window 652 includes an input for a place value to be assigned to the first hierarchal level. In the example, the first hierarchal level is set to the thousands place. Similarly, a place value assigned to the second hierarchal level is set in window 653. The second hierarchal level is set to the hundreds place. - The hierarchal distance settings can be received from the user, by interacting with the
respective window 652, 653 (or other elements) using a form of input deemed appropriate, such as keyboard entry, pull down menu, radio button, and the like. By entering the particular settings shown in FIG. 6D, the hierarchal distance algorithm, in this case, has been configured by the user to consider service as the predominant property in measuring similarities. Furthermore, any distance value that is measured based on common services will be placed in the thousands place in the total distance. Additionally, attributes have been selected as a less significant property in measuring similarities. Distance values that are measured based on common attributes will be placed in the hundreds place in the total distance. In some cases, the number of hierarchal levels can be configured, which may result in additional windows being displayed to receive the corresponding settings. Additionally, settings that are related to extending the embodiments to multiple discovery protocols may be used. For example, the GUI 650 can include an input mechanism for entering one or more protocols that may be applicable. - Also,
FIG. 6D shows a setting for the threshold of similarity. FIG. 6D shows the threshold of similarity as a sliding bar input. This serves to illustrate that the GUI 650 is particularly designed for a user to easily configure the clustering and hierarchal distance functions. That is, the GUI 650 allows a user to enter settings via simple input mechanisms, which do not require complicated user interactions or a deep knowledge of the algorithms applied. Having a general understanding of the hierarchy approach and some knowledge of the network environment, a user can, by and large, appropriately configure the system as desired. In an embodiment, some of the configurable settings can be automatically populated, or the user can be allowed to select from a group of provided settings, as a smart feature that further simplifies configuring the system. It should be appreciated that the examples of hierarchical distance settings shown in FIG. 6D are not meant to be exhaustive, and can include other configurable features of the techniques disclosed herein. Moreover, in some embodiments, the configurable settings may be automatically set by the system either in whole or in part. - 
FIG. 7 depicts a block diagram of an example computer system 700 which may be used in implementing various device grouping with interactive clustering using hierarchical distance features relating to the embodiments of the disclosed technology. The computer system 700 includes a bus 702 or other communication mechanism for communicating information, and one or more hardware processors 704 coupled with bus 702 for processing information. Hardware processor(s) 704 may be, for example, one or more general purpose microprocessors. - The
computer system 700 also includes a main memory 708, such as a random-access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 702 for storing information and instructions to be executed by processor 704. Main memory 708 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 704. Such instructions, when stored in storage media accessible to processor 704, render computer system 700 into a special-purpose machine that is customized to perform the operations specified in the instructions. - The
computer system 700 further includes storage devices 710 such as a read only memory (ROM) or other static storage device coupled to bus 702 for storing static information and instructions for processor 704. A storage device 710, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 702 for storing information and instructions.

The
computer system 700 may be coupled via bus 702 to a display 712, such as a liquid crystal display (LCD) (or touch screen), for displaying information to a computer user. An input device 714, including alphanumeric and other keys, is coupled to bus 702 for communicating information and command selections to processor 704. Another type of user input device is cursor control 716, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 704 and for controlling cursor movement on display 712. In some embodiments, the same direction information and command selections as cursor control may be implemented via receiving touches on a touch screen without a cursor.

The
computer system 700 may include a user interface module to implement a GUI that may be stored in a mass storage device as executable software codes that are executed by the computing device(s). This and other modules may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

In general, the words “component,” “engine,” “system,” “database,” “data store,” and the like, as used herein, can refer to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, C or C++. A software component may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software components may be callable from other components or from themselves, and/or may be invoked in response to detected events or interrupts. Software components configured for execution on computing devices may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression or decryption prior to execution). Such software code may be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM.
It will be further appreciated that hardware components may comprise connected logic units, such as gates and flip-flops, and/or may comprise programmable units, such as programmable gate arrays or processors.

The
computer system 700 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 700 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 700 in response to processor(s) 704 executing one or more sequences of one or more instructions contained in main memory 708. Such instructions may be read into main memory 708 from another storage medium, such as storage device 710. Execution of the sequences of instructions contained in main memory 708 causes processor(s) 704 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

The term “non-transitory media,” and similar terms, as used herein refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as
storage device 710. Volatile media includes dynamic memory, such as main memory 708. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.

Non-transitory media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between non-transitory media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise
bus 702. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

The
computer system 700 also includes a communication interface 718 coupled to bus 702. Communication interface 718 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks. For example, communication interface 718 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 718 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or a WAN component to communicate with a WAN). Wireless links may also be implemented. In any such implementation, communication interface 718 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

A network link typically provides data communication through one or more networks to other data devices. For example, a network link may provide a connection through a local network to a host computer or to data equipment operated by an Internet Service Provider (ISP). The ISP in turn provides data communication services through the world-wide packet data communication network now commonly referred to as the “Internet.” The local network and the Internet both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on the network link and through communication interface 718, which carry the digital data to and from
computer system 700, are example forms of transmission media.

The
computer system 700 can send messages and receive data, including program code, through the network(s), network link and communication interface 718. In the Internet example, a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the communication interface 718.

The received code may be executed by
processor 704 as it is received, and/or stored in storage device 710, or other non-volatile storage for later execution.

Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code components executed by one or more computer systems or computer processors comprising computer hardware. The one or more computer systems or computer processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). The processes and algorithms may be implemented partially or wholly in application-specific circuitry. The various features and processes described above may be used independently of one another or may be combined in various ways. Different combinations and sub-combinations are intended to fall within the scope of this disclosure, and certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate, or may be performed in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The performance of certain of the operations or processes may be distributed among computer systems or computer processors, not only residing within a single machine, but deployed across a number of machines.
As used herein, a circuit might be implemented utilizing any form of hardware, software, or a combination thereof. For example, one or more processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logical components, software routines or other mechanisms might be implemented to make up a circuit. In implementation, the various circuits described herein might be implemented as discrete circuits or the functions and features described can be shared in part or in total among one or more circuits. Even though various features or elements of functionality may be individually described or claimed as separate circuits, these features and functionality can be shared among one or more common circuits, and such description shall not require or imply that separate circuits are required to implement such features or functionality. Where a circuit is implemented in whole or in part using software, such software can be implemented to operate with a computing or processing system capable of carrying out the functionality described with respect thereto, such as
computer system 700.

As used herein, the term “or” may be construed in either an inclusive or exclusive sense. Moreover, the description of resources, operations, or structures in the singular shall not be read to exclude the plural. Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps.
Terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. Adjectives such as “conventional,” “traditional,” “normal,” “standard,” “known,” and terms of similar meaning should not be construed as limiting the item described to a given time period or to an item available as of a given time, but instead should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known now or at any time in the future. The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to” or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/374,728 US10805173B1 (en) | 2019-04-03 | 2019-04-03 | Methods and systems for device grouping with interactive clustering using hierarchical distance across protocols |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200322227A1 (en) | 2020-10-08 |
US10805173B1 (en) | 2020-10-13 |
Family ID: 72662497
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/374,728 Active US10805173B1 (en) | 2019-04-03 | 2019-04-03 | Methods and systems for device grouping with interactive clustering using hierarchical distance across protocols |
Country Status (1)
Country | Link |
---|---|
US (1) | US10805173B1 (en) |
Family Cites Families (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7975035B2 (en) * | 2003-12-01 | 2011-07-05 | International Business Machines Corporation | Method and apparatus to support application and network awareness of collaborative applications using multi-attribute clustering |
US8117486B2 (en) | 2007-04-10 | 2012-02-14 | Xerox Corporation | Method and system for detecting an anomalous networked device |
US20100031156A1 (en) | 2008-07-31 | 2010-02-04 | Mazu Networks, Inc. | User Interface For Network Events and Tuning |
US9077644B2 (en) * | 2010-12-08 | 2015-07-07 | At&T Intellectual Property I, L.P. | Methods and apparatus for communicating with groups of devices sharing an attribute |
US9886321B2 (en) * | 2012-04-03 | 2018-02-06 | Microsoft Technology Licensing, Llc | Managing distributed analytics on device groups |
US10003642B2 (en) | 2013-06-28 | 2018-06-19 | Apple Inc. | Operating a cluster of peer-to-peer devices |
US9647897B2 (en) * | 2014-08-20 | 2017-05-09 | Jamf Software, Llc | Dynamic grouping of managed devices |
US10305758B1 (en) * | 2014-10-09 | 2019-05-28 | Splunk Inc. | Service monitoring interface reflecting by-service mode |
US10069883B2 (en) * | 2015-06-22 | 2018-09-04 | Intel IP Corporation | Apparatus, system and method of communicating in a multicast group |
US10248718B2 (en) | 2015-07-04 | 2019-04-02 | Accenture Global Solutions Limited | Generating a domain ontology using word embeddings |
WO2017015231A1 (en) | 2015-07-17 | 2017-01-26 | Fido Labs, Inc. | Natural language processing system and method |
WO2017037444A1 (en) | 2015-08-28 | 2017-03-09 | Statustoday Ltd | Malicious activity detection on a computer network and network metadata normalisation |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10769189B2 (en) | 2015-11-13 | 2020-09-08 | Microsoft Technology Licensing, Llc | Computer speech recognition and semantic understanding from activity patterns |
US10354201B1 (en) * | 2016-01-07 | 2019-07-16 | Amazon Technologies, Inc. | Scalable clustering for mixed machine learning data |
US20170262523A1 (en) * | 2016-03-14 | 2017-09-14 | Cisco Technology, Inc. | Device discovery system |
US10218726B2 (en) | 2016-03-25 | 2019-02-26 | Cisco Technology, Inc. | Dynamic device clustering using device profile information |
US10372910B2 (en) | 2016-06-20 | 2019-08-06 | Jask Labs Inc. | Method for predicting and characterizing cyber attacks |
US10404794B2 (en) * | 2016-06-21 | 2019-09-03 | Orion Labs | Discovery and formation of local communication group |
US20190180141A1 (en) * | 2017-12-08 | 2019-06-13 | Nicira, Inc. | Unsupervised machine learning for clustering datacenter nodes on the basis of network traffic patterns |
US20190273510A1 (en) * | 2018-03-01 | 2019-09-05 | Crowdstrike, Inc. | Classification of source data by neural network processing |
US11216502B2 (en) * | 2018-06-05 | 2022-01-04 | LogsHero Ltd. | Clustering of log messages |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11979328B1 (en) * | 2020-04-28 | 2024-05-07 | Cable Television Laboratories, Inc. | Traffic flow classifiers and associated methods |
US20210344769A1 (en) * | 2020-04-30 | 2021-11-04 | Perygee Inc. | Network security layer |
US20210359984A1 (en) * | 2020-05-14 | 2021-11-18 | Nokia Technologies Oy | Device monitoring in accessing network |
US11943211B2 (en) * | 2020-05-14 | 2024-03-26 | Nokia Technologies Oy | Device monitoring in accessing network |
US11509715B2 (en) * | 2020-10-08 | 2022-11-22 | Dell Products L.P. | Proactive replication of software containers using geographic location affinity to predicted clusters in a distributed computing environment |
US11902333B2 (en) * | 2020-12-02 | 2024-02-13 | Wiz, Inc. | Static analysis techniques for determining reachability properties of network and computing objects |
US11431786B1 (en) * | 2020-12-02 | 2022-08-30 | Wiz, Inc. | System and method for analyzing network objects in a cloud environment |
US20220394082A1 (en) * | 2020-12-02 | 2022-12-08 | Wiz, Inc. | System and method for analyzing network objects in a cloud environment |
US11374982B1 (en) * | 2020-12-02 | 2022-06-28 | Wiz, Inc. | Static analysis techniques for determining reachability properties of network and computing objects |
US20220286479A1 (en) * | 2020-12-02 | 2022-09-08 | Wiz, Inc. | Determining reachability of objects deployed in a cloud environment from to external network |
US11671460B2 (en) * | 2020-12-02 | 2023-06-06 | Wiz, Inc. | Determining reachability of objects deployed in a cloud environment from to external network |
US11722554B2 (en) * | 2020-12-02 | 2023-08-08 | Wiz, Inc. | System and method for analyzing network objects in a cloud environment |
US11962623B2 (en) * | 2020-12-02 | 2024-04-16 | Wiz, Inc. | Static analysis techniques for determining reachability properties of network and computing objects |
US20230275929A1 (en) * | 2020-12-02 | 2023-08-31 | Wiz, Inc. | Static analysis techniques for determining reachability properties of network and computing objects |
US11985185B2 (en) | 2020-12-02 | 2024-05-14 | Wiz, Inc. | System and method for analyzing network objects in a cloud environment |
US20240056487A1 (en) * | 2020-12-02 | 2024-02-15 | Wiz, Inc. | Static analysis techniques for determining reachability properties of network and computing objects |
US11929896B1 (en) | 2021-01-28 | 2024-03-12 | Wiz, Inc. | System and method for generation of unified graph models for network entities |
US11595222B2 (en) * | 2021-02-23 | 2023-02-28 | Universal Electronics Inc. | System and method for using a multicast service to configure a controlling device |
US20220271961A1 (en) * | 2021-02-23 | 2022-08-25 | Universal Electronics Inc. | System and method for using a multicast service to configure a controlling device |
US11996951B2 (en) | 2021-02-23 | 2024-05-28 | Universal Electronics Inc. | System and method for using a multicast service to configure a controlling device |
US20230127149A1 (en) * | 2021-10-25 | 2023-04-27 | Dell Products L.P. | Cluster-based data compression for ai training on the cloud for an edge network |
US11728825B2 (en) * | 2021-10-25 | 2023-08-15 | Dell Products L.P. | Cluster-based data compression for AI training on the cloud for an edge network |
CN114938402A (en) * | 2022-04-11 | 2022-08-23 | 清华大学 | Unknown protocol frame structure identification method and device based on dictionary tree |
Also Published As
Publication number | Publication date |
---|---|
US10805173B1 (en) | 2020-10-13 |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: JANAKIRAMAN, RAMSUNDAR; REEL/FRAME: 048787/0784. Effective date: 20190403 |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |