CN116455758A - Application session specific network topology generation for application session failover - Google Patents


Info

Publication number
CN116455758A
Authority
CN
China
Prior art keywords
network
application session
application
session
network devices
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211526051.9A
Other languages
Chinese (zh)
Inventor
王继生
吴小英
夜杜南段·西达林家帕-维塔拉普尔
阿布希拉姆·马杜吉里·沙姆桑达尔
罗伯特·J·弗里达
苏提尔·马塔
苏纳里尼·桑哈瓦拉姆
库什·沙阿
兰德尔·弗赖
苏尧伊·哈杰拉
雅各布·托马斯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Juniper Networks Inc
Original Assignee
Juniper Networks Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/935,704 (US11968075B2)
Application filed by Juniper Networks Inc
Publication of CN116455758A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H04L43/045 Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses application session specific network topology generation for application session failover. A network management system provides an application session level granularity troubleshooting workflow using an application session specific topology from a client device to a cloud-based application server. During an application session of a cloud-based application, a client device running the application exchanges data through an access point device, a switch at the edge of a wired network, and a network node to reach a cloud-based application server. For a particular application session, the NMS generates a topology based on network data received from a subset of network devices, e.g., client devices, AP devices, switches, routers, and/or gateways, involved in the particular application session for the duration of the particular application session. In this way, the NMS enables retrospective troubleshooting of a particular application session.

Description

Application session specific network topology generation for application session failover
The present application claims priority from U.S. patent application Ser. No. 17/935,704, filed on September 27, 2022, and U.S. provisional patent application Ser. No. 63/299,733, filed on January 14, 2022, the entire contents of each of which are incorporated herein by reference.
Technical Field
The present disclosure relates generally to computer networks, and more particularly to monitoring and troubleshooting of computer networks.
Background
A business or other site, such as an office, hospital, airport, stadium, or retail store, typically installs a complex wireless network system throughout the site, including a network of wireless Access Points (APs), to provide wireless network services to one or more wireless client devices (or simply "clients"). An AP is a physical electronic device that enables other devices to wirelessly connect to a wired network using various wireless network protocols and technologies (e.g., wireless local area network protocols conforming to one or more IEEE 802.11 standards (i.e., "Wi-Fi"), Bluetooth/Bluetooth Low Energy (BLE), mesh network protocols (e.g., ZigBee), or other wireless network technologies). Many different types of wireless client devices, such as notebook computers, smartphones, tablets, wearable devices, appliances, and internet of things (IoT) devices, incorporate wireless communication technologies and may be configured to connect to a wireless access point when the device is within range of a compatible wireless access point in order to access a wired network. In the case of a client device running a cloud-based application, such as a Voice Over Internet Protocol (VOIP) application, streaming video application, gaming application, or video conferencing application, during an application session, data is exchanged from the client device through one or more APs and one or more wired network devices (e.g., switches, routers, and/or gateway devices) to reach a cloud-based application server.
Disclosure of Invention
In general, this disclosure describes one or more techniques for a Network Management System (NMS) that provides a granular troubleshooting workflow at an application session level using an application session specific topology from a client device to a cloud-based application server. During an application session of a cloud-based application, such as a VOIP or video conference call, a streaming video viewing session, or a gaming session, a client device running the application exchanges data through one or more Access Point (AP) devices, one or more switches at the edge of a wired network, and one or more network nodes (e.g., switches, routers, and/or gateway devices) to reach a cloud-based application server.
In accordance with the techniques of this disclosure, for a particular application session, the NMS generates a topology based on data received from a subset of network devices (e.g., client devices, AP devices, switches, routers, and/or gateways) that are involved in the particular application session for the duration of the particular application session. More specifically, the NMS may construct the application session specific topology based on data of the particular application session retrieved from a temporal graph database of the network. The temporal graph database is configured to store entity and connection information extracted from historical data collected from the network devices at an application session level granularity over an extended period of time (e.g., weeks or months). In this way, the techniques of this disclosure may enable retrospective troubleshooting of a particular application session even if the current network topology has changed since the particular application session ended or the current application session does not experience the same problems as the particular application session.
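The retrieval described above can be sketched as follows. This is a minimal illustration only, not the implementation of the disclosure: the in-memory `TemporalGraph` class, the directed "uplink" edge model, the epoch-second validity intervals, and the device names are all assumptions introduced for the example. The sketch stores each connection with the time window during which it was valid and, for a given session window, walks uplinks from the client device over the edges that were active at that time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical uplink from src to dst, valid over [start, end) in epoch seconds."""
    src: str
    dst: str
    start: float
    end: float


class TemporalGraph:
    """Toy in-memory stand-in for a temporal graph database (assumption)."""

    def __init__(self):
        self.edges = []

    def add_edge(self, src, dst, start, end):
        self.edges.append(Edge(src, dst, start, end))

    def edges_during(self, start, end):
        """Return edges whose validity interval overlaps [start, end)."""
        return [e for e in self.edges if e.start < end and e.end > start]


def session_topology(graph, client, start, end):
    """Follow uplinks from the client over edges active during the session window."""
    uplinks = {}
    for e in graph.edges_during(start, end):
        uplinks.setdefault(e.src, []).append(e)
    nodes, links, stack = {client}, [], [client]
    while stack:
        node = stack.pop()
        for e in uplinks.get(node, []):
            links.append((e.src, e.dst))
            if e.dst not in nodes:
                nodes.add(e.dst)
                stack.append(e.dst)
    return nodes, links


# Example: the client roams from ap-1 to ap-2 at t=100.
g = TemporalGraph()
g.add_edge("client-1", "ap-1", 0, 100)
g.add_edge("client-1", "ap-2", 100, 200)
g.add_edge("ap-1", "sw-1", 0, 1000)
g.add_edge("ap-2", "sw-1", 0, 1000)
g.add_edge("sw-1", "gw-1", 0, 1000)
nodes, links = session_topology(g, "client-1", 10, 50)
```

Because the query is keyed by the session's time window rather than by the current state of the network, a session that ran between t=10 and t=50 still resolves to ap-1 even though the client is later attached to ap-2.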
The techniques of this disclosure enable troubleshooting a particular application session by identifying connectivity issues at one or more network devices in a subset of network devices related to the particular application session. For example, the NMS may generate data representing a user interface to provide a user (e.g., a site or network administrator) with a visualization of an application session specific topology, including color coding, icons, or other indicia of connection problems within the topology for the duration of a particular application session. Responsive to user input selecting an icon representing a network device identified as having a connectivity problem during a particular application session, the NMS may further generate a troubleshooting user interface for the network device, or may redirect the user to a customer insight or recommended action user interface specific to the network device.
Furthermore, the techniques of this disclosure enable the identification of third party application servers and other third party network devices that are involved in a particular application session to provide a complete topology from a client device to a cloud-based application server. For example, the NMS may determine which switches, routers, and/or gateways are connected to a third party application server, a third party service provider server, or other third party network device based on uplink data (e.g., LLDP advertisements) contained in data received from the switches, routers, and/or gateways during a particular application session. The NMS may then determine entity ID data (e.g., IP address or interface address) for the third party network device. In some examples, the NMS may perform some integration with third party application/service performance monitoring (APM) providers to retrieve insight data for application services and/or service providers via Application Programming Interfaces (APIs) to determine whether the application services and/or service providers are malfunctioning or experiencing problems.
In one example, the present disclosure describes a network management system comprising a memory storing network data received from a plurality of network devices configured to provide client-to-cloud connectivity in a network; and one or more processors coupled with the memory and configured to: receive a query identifying an application session of an application running on a client device, wherein the client device comprises one of the plurality of network devices; retrieve entity information and connection information for the application session from a temporal graph database, wherein the entity information represents a subset of network devices from the plurality of network devices that are involved in the application session for the duration of the application session and is stored as nodes of the temporal graph database, wherein the connection information represents connections between the subset of network devices for the duration of the application session and is stored as edges of the temporal graph database, wherein the temporal graph database represents a history of at least a portion of the network at an application session level granularity over a period of time; generate an application session specific topology for the application session based on the entity information and the connection information of the application session; identify at least one connection problem within the subset of network devices during the application session based at least on network data received from the subset of network devices during the application session; and generate data representing a user interface for presentation on an administrator device, the user interface including a visualization of the application session specific topology over the duration of the application session, including an indication of the at least one connection problem.
In another example, the disclosure describes a method comprising receiving, by a network management system, a query identifying an application session of an application running on a client device, wherein the client device comprises one of a plurality of network devices configured to provide client-to-cloud connectivity in a network; retrieving, by the network management system, entity information and connection information for the application session from a temporal graph database, wherein the entity information represents a subset of network devices from the plurality of network devices that are involved in the application session for the duration of the application session and is stored as nodes of the temporal graph database, wherein the connection information represents connections between the subset of network devices for the duration of the application session and is stored as edges of the temporal graph database, wherein the temporal graph database represents a history of at least a portion of the network at an application session level granularity; generating, by the network management system, an application session specific topology for the application session based on the entity information and the connection information of the application session; identifying, by the network management system, at least one connection problem within the subset of network devices during the application session based at least on network data received from the subset of network devices during the application session; and generating, by the network management system, data representing a user interface for presentation on an administrator device, the user interface including a visualization of the application session specific topology over the duration of the application session, including an indication of the at least one connection problem.
In another example, the disclosure describes a computer-readable storage medium comprising instructions that, when executed, cause one or more processors of a network management system to: receive a query identifying an application session of an application running on a client device, wherein the client device comprises one of a plurality of network devices configured to provide client-to-cloud connectivity in a network; retrieve entity information and connection information for the application session from a temporal graph database, wherein the entity information represents a subset of network devices from the plurality of network devices that are involved in the application session for the duration of the application session and is stored as nodes of the temporal graph database, wherein the connection information represents connections between the subset of network devices for the duration of the application session and is stored as edges of the temporal graph database, wherein the temporal graph database represents a history of at least a portion of the network over a period of time at an application session level granularity; generate an application session specific topology for the application session based on the entity information and the connection information of the application session; identify at least one connection problem within the subset of network devices during the application session based at least on network data received from the subset of network devices during the application session; and generate data representing a user interface for presentation on an administrator device, the user interface including a visualization of the application session specific topology over the duration of the application session, including an indication of the at least one connection problem.
The details of one or more examples of the disclosed technology are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the technology will be apparent from the description and drawings, and from the claims.
Drawings
FIG. 1A is a block diagram of an example network system in which a network management system provides granular troubleshooting workflows at an application session level using an application session specific topology from a client device to a cloud-based application server, based on one or more techniques of the present disclosure.
Fig. 1B is a block diagram illustrating further example details of the network system of fig. 1A.
Fig. 2 is a block diagram of an example access point device based on one or more techniques of this disclosure.
FIG. 3 is a block diagram of an example network management system configured to provide granular troubleshooting workflows at an application session level using an application session specific topology from a client device to a cloud-based application server, based on one or more techniques of the present disclosure.
Fig. 4 is a block diagram of an example user equipment device based on one or more techniques of this disclosure.
Fig. 5 is a block diagram of an example network node (e.g., a router or switch) based on one or more techniques of this disclosure.
Fig. 6A-6C illustrate example user interfaces of a network management system for visualizing application session specific topologies and related troubleshooting workflows for network gateway devices involved in an application session.
Fig. 7A-7B illustrate example user interfaces of a network management system for visualizing application session specific topologies and related troubleshooting workflows for AP devices involved in an application session.
FIG. 8 illustrates an example user interface of a network management system for visualizing an application session specific topology including a service provider server.
FIG. 9 illustrates an example user interface of a network management system for visualizing a troubleshooting workflow for a particular set of application sessions.
FIG. 10 is a flow diagram illustrating example operations for providing a granular troubleshooting workflow at an application session level using an application session specific topology from a client device to a cloud-based application server, based on one or more techniques of the present disclosure.
Detailed Description
Fig. 1A is a block diagram of an example network system 100 in which a Network Management System (NMS) 130 provides a granular troubleshooting workflow at an application session level using an application session specific topology from a client device to a cloud-based application server, based on one or more techniques of the present disclosure. The example network system 100 includes a plurality of sites 102A-102N at which a network service provider manages one or more wireless networks 106A-106N, respectively. Although in fig. 1A each site 102A-102N is shown as including a single wireless network 106A-106N, respectively, in some examples each site 102A-102N may include multiple wireless networks, and the disclosure is not limited in this respect.
Each site 102A-102N includes a plurality of Network Access Server (NAS) devices, such as Access Points (APs) 142, switches 146, or routers (not shown) at the edge of a wired network. For example, site 102A includes a plurality of APs 142A-1 through 142A-M. Likewise, site 102N includes a plurality of APs 142N-1 through 142N-M. Each AP 142 may be any type of wireless access point including, but not limited to, a business or enterprise AP, a router, or any other device connected to a wired network and capable of providing wireless network access to client devices within a site.
Each site 102A-102N also includes a plurality of client devices, or user equipment devices (UEs), referred to generally as UEs or client devices 148, representing various wireless-enabled devices within each site. For example, a plurality of UEs 148A-1 through 148A-N are currently located at site 102A. Likewise, a plurality of UEs 148N-1 through 148N-N are currently located at site 102N. Each UE 148 may be any type of wireless client device including, but not limited to, a mobile device such as a smart phone, tablet or notebook computer, Personal Digital Assistant (PDA), wireless terminal, smart watch, smart ring, or other wearable device. UE 148 may also include wired client-side devices, such as an internet of things device (e.g., printer, security device, environmental sensor) or any other device connected to a wired network and configured to communicate over one or more wireless networks 106.
To provide wireless network services to UEs 148 and/or communicate over wireless networks 106, APs 142 and other wired client-side devices at sites 102 are connected, directly or indirectly, to one or more network devices (e.g., switches, routers, etc.) via physical cables (e.g., Ethernet cables). In the example of FIG. 1A, site 102A includes a switch 146A to which each of APs 142A-1 through 142A-M of site 102A is connected. Similarly, site 102N includes a switch 146N to which each of APs 142N-1 through 142N-M of site 102N is connected. Although FIG. 1A shows each site 102 as including a single switch 146, with all APs 142 of a given site 102 connected to that switch, in other examples each site 102 may include more or fewer switches and/or routers. Further, APs and other wired client-side devices of a given site may be connected to two or more switches and/or routers. Furthermore, two or more switches of a site may be interconnected and/or connected to two or more routers, e.g., through a mesh or partial mesh topology in a hub-and-spoke architecture. In some examples, the interconnected switches and routers form a wired Local Area Network (LAN) at site 102 that hosts wireless network 106.
Example network system 100 also includes various networking components for providing networking services within a wired network, including, for example, Authentication, Authorization, and Accounting (AAA) server 110 for authenticating users and/or UEs 148, Dynamic Host Configuration Protocol (DHCP) server 116 for dynamically assigning network addresses (e.g., IP addresses) to UEs 148 upon authentication, Domain Name System (DNS) server 122 for resolving domain names to network addresses, a plurality of servers 128A-128X (collectively "servers 128") (e.g., web servers, database servers, file servers, etc.), and Network Management System (NMS) 130. As shown in fig. 1A, various devices and systems of network system 100 are coupled together by one or more networks (e.g., the internet and/or an intranet) 134.
In the example of fig. 1A, NMS 130 is a cloud-based computing platform that manages wireless networks 106A-106N at one or more sites 102A-102N. As further described herein, NMS 130 provides an integrated suite of management tools and implements various techniques of the present disclosure. In general, NMS 130 may provide a cloud-based platform for wireless network data acquisition, monitoring, activity logging, reporting, predictive analysis, network anomaly identification, and alarm generation. In some examples, NMS 130 outputs notifications (e.g., alarms, warnings, graphical indicators on the dashboard, log messages, text/SMS messages, email messages, etc.) and/or suggestions regarding wireless network problems to a site or network administrator ("admin") interacting with and/or operating management device 111. Further, in some examples, NMS 130 operates in response to configuration inputs received from an administrator interacting with and/or operating management device 111.
The administrator and management device 111 may include IT personnel and administrator computing devices associated with site 102 and/or switch 146 at the edge of the wired network. The management device 111 may be implemented as any device suitable for presenting output and/or accepting user input. For example, the management device 111 may include a display. The management device 111 may be a computing system, such as a mobile or non-mobile computing device operated by a user and/or administrator. For example, management device 111 may represent a workstation, a laptop or notebook, a desktop, a tablet, or any other computing device operable by a user and/or presenting a user interface, in accordance with one or more aspects of the present disclosure. The management device 111 may be physically separate from and/or located in a different location than NMS 130, such that the management device 111 communicates with NMS 130 via the network 134 or other communication means.
In some examples, one or more NAS devices (e.g., AP 142, switch 146, or router) may be connected to edge devices 150A-150N via a physical cable (e.g., Ethernet cable). Edge device 150 includes a cloud-managed wireless Local Area Network (LAN) controller. Each of the edge devices 150 may be a locally deployed device at site 102 that communicates with NMS 130 to extend certain microservices from NMS 130 to the locally deployed NAS devices while using NMS 130 and its distributed software architecture for scalable and resilient operation, management, troubleshooting, and analysis.
Each network device of network system 100 (e.g., servers 110, 116, 122 and/or 128, AP 142, UE 148, switch 146) and any other servers or devices attached to or forming part of network system 100 may contain a system log or error log module in which each of these network devices records the status of the network device, including normal operating status and error conditions. In this disclosure, one or more network devices of network system 100 (e.g., servers 110, 116, 122 and/or 128, AP 142, UE 148, and switch 146) may be considered "third party" network devices when they belong to and/or are associated with an entity other than NMS 130, and thus NMS 130 does not receive, collect, or otherwise access the record status and other data of the third party network devices. In some examples, edge device 150 may provide an agent through which the record status and other data of the third party network device may be reported to NMS 130.
In some examples, NMS 130 monitors network data 137 (e.g., one or more Service Level Expectation (SLE) metrics received from wireless networks 106A through 106N of each site 102A through 102N, respectively) and manages network resources (e.g., APs 142 of each site) to provide high quality wireless experiences to end users, IoT devices, and clients of the sites. For example, NMS 130 may include a Virtual Network Assistant (VNA) 133 that implements an event processing platform for providing real-time insight and simplified troubleshooting for IT operations, and that automatically takes corrective action or provides recommendations to proactively address wireless network issues. For example, VNA 133 may include an event processing platform configured to process hundreds or thousands of concurrent streams of network data 137 from sensors and/or agents associated with nodes and/or APs 142 within network 134. For example, in accordance with the various examples described in this disclosure, VNA 133 of NMS 130 may include an underlying analysis and network error identification engine and an alarm system. The underlying analysis engine of VNA 133 may apply historical data and models to the inbound event stream to compute assertions, such as identified anomalies or predicted occurrences of events that constitute a network error condition. In addition, VNA 133 may provide real-time alarms and reports to notify a site or network administrator, via management device 111, of any predicted events, anomalies, or trends, and may perform root cause analysis and automatic or assisted error remediation. In some examples, VNA 133 of NMS 130 may apply machine learning techniques to identify the root cause of an error condition detected or predicted from a stream of network data 137.
If the root cause can be automatically resolved, the VNA 133 can invoke one or more corrective actions to correct the root cause of the error condition, thereby automatically improving the underlying SLE metrics and also automatically improving the user experience.
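As a rough illustration of applying historical data to an inbound stream to flag anomalies, the sketch below uses a simple z-score test against a historical baseline. The disclosure does not specify which models VNA 133 uses, so the z-score rule, the `flag_anomalies` function name, the threshold of 3.0, and the sample latency values are all assumptions made for this example.

```python
from statistics import mean, stdev


def flag_anomalies(history, stream, threshold=3.0):
    """Return (index, value) pairs for stream samples that deviate from the
    historical mean by more than `threshold` sample standard deviations.

    A z-score test is an illustrative stand-in, not the model of the disclosure.
    """
    mu = mean(history)
    sigma = stdev(history)
    flagged = []
    for i, value in enumerate(stream):
        # Skip the test when the baseline has no variance to compare against.
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append((i, value))
    return flagged


# Hypothetical latency samples in milliseconds for one SLE metric.
history_ms = [20, 22, 19, 21, 20, 23, 21, 20]
inbound_ms = [21, 22, 95, 20]
anomalies = flag_anomalies(history_ms, inbound_ms)
```

In this toy data only the 95 ms sample stands far outside the baseline, so it is the only one flagged; a production system would likely use per-device or per-site baselines and a richer model.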
Further example details of the operations implemented by VNA 133 of NMS 130 are described in: U.S. patent No. 9,832,082, entitled "Monitoring Wireless Access Point Events", issued November 28, 2017; U.S. patent publication No. US 2021/0306201, entitled "Network System Fault Resolution Using a Machine Learning Model", published September 30, 2021; U.S. patent No. 10,985,969, entitled "Systems and Methods for a Virtual Network Assistant", issued April 20, 2021; U.S. patent No. 10,958,585, entitled "Methods and Apparatus for Facilitating Fault Detection and/or Predictive Fault Detection", issued March 23, 2021; U.S. patent No. 10,958,537, entitled "Method for Spatio-Temporal Modeling", issued March 23, 2021; and U.S. patent No. 10,862,742, entitled "Method for Conveying AP Error Codes Over BLE Advertisements", issued December 8, 2020, each of which is incorporated herein by reference in its entirety.
In operation, NMS 130 observes, collects, and/or receives network data 137, which may take the form of data extracted from messages, counters, and statistics, for example. In one particular embodiment, a computing device is part of NMS 130. In other embodiments, NMS 130 may include one or more computing devices, dedicated servers, virtual machines, containers, services, or other forms of environments for performing the techniques described in this disclosure. Similarly, the computing resources and components implementing VNA 133 may be part of NMS 130, may execute on other servers or execution environments, or may be distributed to nodes (e.g., routers, switches, controllers, gateways, etc.) within network 134.
Based on one or more techniques of this disclosure, NMS 130 is configured to provide a granular troubleshooting workflow at an application session level using an application session specific topology from a client device (e.g., one of UEs 148) to a cloud-based application server. During an application session (e.g., VOIP or video conference call, streaming video viewing session, or gaming session) of a cloud-based application, a client device 148 running the application exchanges data through one or more AP devices 142, one or more switches 146 at the edge of the wired network, and one or more nodes (e.g., routers, switches, controllers, gateways, etc.) within the network 134 to reach the cloud-based application server.
Based on the techniques of this disclosure, for a particular application session, application session troubleshooting engine 135 of VNA 133 generates a topology based on network data 137 received from a subset of network devices (e.g., client device 148, AP device 142, switch 146, and/or network nodes) that are involved in the particular application session for the duration of the particular application session. For example, the temporal graph database 138 is configured to store connection and entity information for the network system 100 extracted from historical telemetry data collected by the client devices 148, the APs 142, the switches 146, and/or other network nodes within the network 134 at an application session level granularity over an extended period of time (e.g., weeks or months).
The application session troubleshooting engine 135 of the VNA 133 may construct an application session-specific topology for a particular application session based on the entity and connection information of the particular application session retrieved from the temporal graph database 138. In this way, the techniques of this disclosure may enable retrospective troubleshooting of a particular application session even if the current network topology has changed since the particular application session ended or the current application session does not experience the same problems as the particular application session.
The disclosed technology enables troubleshooting of a particular application session by identifying connection problems for one or more of a subset of network devices involved in the particular application session for the duration of the particular application session. For example, the application session troubleshooting engine 135 of the VNA 133 may generate data representing a user interface to provide a user (e.g., a site or network administrator using the management device 111) with a visualization of an application session-specific topology, including color coding, icons, or other indicia of connection problems within the topology for the duration of a particular application session. In response to user input selecting an icon representing a network device identified as having a connection problem during a particular application session, the application session troubleshooting engine 135 of the VNA 133 may further generate a troubleshooting user interface for the network device, or may redirect the user to a customer insight or recommended action user interface specific to the network device.
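One way to picture the "data representing a user interface" is as a plain structure in which each device of the session topology carries a status and a color. The sketch below is an assumption-laden illustration: the `annotate_topology` function, the event record fields (`device`, `time`, `issue`), and the red/green color scheme are hypothetical and are not taken from the disclosure.

```python
def annotate_topology(nodes, links, events, start, end):
    """Build UI data for a session topology.

    A device is marked as a problem if any event for it falls inside the
    session window [start, end). Field names and colors are hypothetical.
    """
    problems = {}
    for ev in events:
        if start <= ev["time"] < end and ev["device"] in nodes:
            problems.setdefault(ev["device"], []).append(ev["issue"])
    return {
        "nodes": [
            {
                "id": n,
                "status": "problem" if n in problems else "ok",
                "color": "red" if n in problems else "green",
                "issues": problems.get(n, []),
            }
            for n in sorted(nodes)
        ],
        "links": [{"src": s, "dst": d} for s, d in links],
    }


# Example: only the AP had an issue while the session (t=10..50) was active.
ui_data = annotate_topology(
    nodes={"client-1", "ap-1", "sw-1"},
    links=[("client-1", "ap-1"), ("ap-1", "sw-1")],
    events=[
        {"device": "ap-1", "time": 20, "issue": "high retry rate"},
        {"device": "sw-1", "time": 500, "issue": "port flap"},
    ],
    start=10,
    end=50,
)
```

Restricting events to the session window is the point of the example: the switch event at t=500 occurred after the session ended, so the switch is still shown as healthy for this session.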
Furthermore, the techniques of this disclosure enable the identification of third party application servers, third party service provider servers, and other third party network devices that are involved in a particular application session to provide a complete topology from client devices to cloud-based application servers. For example, the application session troubleshooting engine 135 of the VNA 133 may determine which switches, routers, and/or gateways are connected to a third party application server, a third party service provider server, or other third party network device based on uplink data (e.g., LLDP advertisements) contained in the network data 137 received from the switches, routers, and/or gateways during a particular application session. The application session troubleshooting engine 135 of the VNA 133 may then determine entity ID data (e.g., an IP address or an interface address) of the third party network device.
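As an illustration of the uplink-based identification described above, the following is a minimal sketch, not the disclosed implementation. It assumes a hypothetical record schema (`LldpNeighbor`) and assumes that a neighbor is treated as third party when its advertised ID is absent from the managed inventory; all names and addresses are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LldpNeighbor:
    """One LLDP neighbor record reported by a managed device (hypothetical schema)."""
    local_device: str   # ID of the managed switch, router, or gateway
    local_port: str     # uplink port on which the advertisement was seen
    neighbor_id: str    # ID advertised by the neighbor
    neighbor_addr: str  # IP or interface address advertised by the neighbor


def find_third_party_neighbors(managed_ids, lldp_records):
    """Group neighbors that are not in the managed inventory by the managed device that saw them."""
    third_party = {}
    for rec in lldp_records:
        if rec.neighbor_id not in managed_ids:
            third_party.setdefault(rec.local_device, []).append(
                {"entity_id": rec.neighbor_addr, "via_port": rec.local_port}
            )
    return third_party


# Illustrative data: router-1 has an uplink to a device outside the managed inventory.
managed = {"switch-1", "router-1"}
records = [
    LldpNeighbor("switch-1", "port-1", "router-1", "10.0.0.1"),
    LldpNeighbor("router-1", "port-0", "isp-edge", "203.0.113.1"),
]
result = find_third_party_neighbors(managed, records)
```

Here `result` maps `router-1` to the address of the unmanaged neighbor, which could then serve as the entity ID for the third party device in the session topology.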
In some examples, the application session troubleshooting engine 135 of the VNA 133 may integrate with third party application/service performance monitoring (APM) providers to retrieve insight data related to third party network devices to help determine root causes of network problems affecting users. For example, when a user encounters a problem with an online application or service (e.g., Microsoft Teams), it may be the service itself (e.g., Teams) or the service provider (e.g., Comcast) that is malfunctioning or experiencing a problem. As described above, NMS 130 does not receive, collect, or otherwise access the recorded status and other data of the third party network device. Instead, the NMS 130 may utilize insight data from third party APM providers to perform troubleshooting and determine the root cause of the network problem.
NMS 130 may handle third party integration in two different ways: on demand or proactive. For on-demand third party integration, the application session troubleshooting engine 135 may query a third party APM provider through an API for insight data of an online application service and/or service provider in response to a troubleshooting request for a particular application session experiencing a problem. For proactive third party integration, the application session troubleshooting engine 135 may proactively query the third party APM provider for insight data of the online application service and/or service provider in order to monitor the online application service and/or service provider and detect problems.
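The difference between the two integration modes can be sketched as follows. This is a hedged illustration only: `ThirdPartyInsights`, the `fetch` callable, and the `"status"` field are hypothetical stand-ins for an APM provider API, which the disclosure does not specify.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for an APM provider API client: takes a service name
# and returns insight data such as {"status": "degraded"}.
InsightFetcher = Callable[[str], Dict[str, str]]


class ThirdPartyInsights:
    def __init__(self, fetch: InsightFetcher):
        self._fetch = fetch
        self._cache: Dict[str, Dict[str, str]] = {}

    def on_demand(self, service: str) -> Dict[str, str]:
        """Query the provider only when a troubleshooting request arrives."""
        return self._fetch(service)

    def poll_once(self, services: Iterable[str]) -> None:
        """One proactive polling pass; a scheduler would call this periodically."""
        for service in services:
            self._cache[service] = self._fetch(service)

    def degraded_services(self) -> List[str]:
        """Services whose most recently polled status is anything other than "ok"."""
        return sorted(s for s, d in self._cache.items() if d.get("status") != "ok")


def fake_fetch(service: str) -> Dict[str, str]:
    # Illustrative data only; a real provider would be queried over its API.
    return {"status": "degraded" if service == "video-conferencing" else "ok"}


insights = ThirdPartyInsights(fake_fetch)
on_demand_result = insights.on_demand("video-conferencing")
insights.poll_once(["video-conferencing", "isp-uplink"])
flagged = insights.degraded_services()
```

In this sketch the on-demand path does no caching, while the proactive path keeps the latest result per service so that problems can be detected before a troubleshooting request is made.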
In some examples, a site or network administrator using, for example, management device 111 may initiate topology visualization and troubleshooting of a particular application session through session assistant engine 136 of VNA 133. The session assistant engine 136 can be configured to process user input (e.g., text strings) and generate responses. In some examples, the session assistant engine 136 can include one or more natural language processors configured to process user input. The session assistant engine 136 can be configured to simulate the way a human would behave as a conversation partner in a chat conversation, which can help simplify network monitoring and control and/or improve administrator satisfaction with it.
Based on one or more techniques of this disclosure, the session assistant engine 136 can generate a session assistant configured to receive user input. In particular use cases, an administrator may input queries about particular network devices and/or particular application sessions into the session assistant via the management device 111. The session assistant engine 136 may provide a platform in which the application session specific topology is presented to the administrator and with which the administrator may interact.
For example, the session assistant may receive a string indicating an application, a duration, and/or a device identifier (e.g., "troubleshoot teams call from client device A," where "teams call" indicates the application and "client device A" contains a client device identifier, or "how was DC84AP544 in the past 7 days," where "DC84AP544" contains an AP device identifier and "7 days" indicates the duration). In some cases, the session assistant may receive a string indicating the application, a duration, and/or a user identifier (e.g., "troubleshoot user B teams call," where "user B" is a user of a client device and "teams call" indicates the application). The session assistant engine 136 may determine a particular network device of the plurality of network devices based on the user input and determine one or more application sessions in which the particular network device is involved. For example, if the application and/or the duration are indicated in the user input provided to the session assistant, the session assistant engine 136 may automatically filter the application sessions of the particular network device accordingly. If no additional session identification information is included in the user input, the session assistant engine 136 may identify all application sessions of the particular network device within a default duration (e.g., today or the past 7 days). Alternatively, if no additional session identification information is included in the user input, the session assistant engine 136 may filter out high quality application sessions to identify one or more application sessions of the particular network device that experienced a connection problem recently or within the default duration.
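The session selection logic described above might be sketched as follows. The `AppSession` fields, the quality score, and the 3.5 cutoff are assumptions made for illustration; the disclosure does not define how session quality is represented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AppSession:
    device_id: str
    application: str
    start: datetime
    quality: float  # hypothetical per-session quality score, 1.0 (worst) to 5.0 (best)


def select_sessions(sessions, device_id, now, application=None, duration=None,
                    only_problem_sessions=False,
                    default_duration=timedelta(days=7), low_quality_below=3.5):
    """Filter a device's sessions by the hints found in the user input."""
    window = default_duration if duration is None else duration
    selected = [s for s in sessions
                if s.device_id == device_id and now - window <= s.start <= now]
    if application is not None:
        selected = [s for s in selected if s.application == application]
    if only_problem_sessions:
        # Drop high quality sessions so only likely problem sessions remain.
        selected = [s for s in selected if s.quality < low_quality_below]
    return selected


now = datetime(2023, 1, 10)
sessions = [
    AppSession("client-a", "teams", datetime(2023, 1, 9), 2.1),
    AppSession("client-a", "teams", datetime(2023, 1, 8), 4.4),
    AppSession("client-a", "zoom", datetime(2023, 1, 7), 3.0),
    AppSession("client-a", "teams", datetime(2022, 12, 1), 1.5),  # outside default window
    AppSession("client-b", "teams", datetime(2023, 1, 9), 2.0),   # different device
]
teams_calls = select_sessions(sessions, "client-a", now, application="teams")
all_recent = select_sessions(sessions, "client-a", now)
problem_only = select_sessions(sessions, "client-a", now, only_problem_sessions=True)
```

With this sample data, a query naming the application returns the two recent Teams sessions, a query with no hints returns all three sessions in the default window, and the problem-only variant returns the two sessions scored below the cutoff.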
Once a particular application session is identified, the application session troubleshooting engine 135 constructs an application session specific topology for the particular application session based on the data of the particular application session retrieved from the temporal graph database 138. The application session troubleshooting engine 135 generates data representing a visualization of the application session-specific topology for presentation, within the session assistant, to an administrator using the management device 111. The visualization includes color coding, icons, or other indicia of connection problems within the topology during the particular application session, as determined by the application session troubleshooting engine 135 based on temporal data stored as network data 137 and/or in temporal graph database 138. In this example, an administrator using management device 111 can interact with the application session specific topology presented in the session assistant to select an icon indicating a network device in the topology that is identified as having a connection problem during the particular application session. In response to selection of a network device, the application session troubleshooting engine 135 can further generate a troubleshooting user interface for the network device for presentation within the session assistant. Alternatively, the application session troubleshooting engine 135 can redirect the user to a customer insight or recommended action user interface specific to the network device. Additional information about session assistants is described in U.S. patent application Ser. No. 17/647,954, filed on January 13, 2022, entitled "CONVERSATIONAL ASSISTANT FOR OBTAINING NETWORK INFORMATION" (attorney docket No. JNP3538-US/2014-515US01), the entire contents of which are incorporated herein by reference.
The disclosed technology provides one or more technical advantages and practical applications. For example, the techniques can determine an application session specific topology from a client to a cloud, thereby enabling troubleshooting of a particular application session based on topology and connection issues related to the particular application session for the duration of the particular application session. These techniques allow retrospective troubleshooting to determine what caused a low quality session even if the current network topology has changed or the problem has since been resolved. In addition, these techniques enable troubleshooting (including root cause analysis) of connection problems that affect the user and are experienced by any network device in the application session specific topology for the duration of a particular application session. This includes third party network devices that may be owned by and/or associated with an entity other than NMS 130, such that NMS 130 does not receive, collect, or otherwise access network data for the third party network devices.
Although the techniques of this disclosure are described in this example as being performed by NMS 130, the techniques described herein may be performed by any other computing device, system, and/or server, and this disclosure is not limited in this respect. For example, one or more computing devices configured to perform the functions of the disclosed techniques may reside in a dedicated server, may be included in any other server in addition to or other than NMS 130, or may be distributed throughout network system 100, and may or may not form part of NMS 130.
Fig. 1B is a block diagram illustrating further example details of the network system of fig. 1A. In this example, fig. 1B illustrates that NMS 130 is configured to operate as an artificial intelligence/machine learning-based computing platform providing comprehensive automation, insight, and assurance (Wi-Fi assurance, wired assurance, and WAN assurance) spanning from "clients" (e.g., user devices 148 (leftmost in fig. 1B) connected to wireless network 106 and wired LAN 175 at the network edge) to "clouds" (e.g., cloud-based application services 181 (rightmost in fig. 1B) that may be hosted by computing resources within data center 179).
As described herein, NMS 130 provides an integrated management tool suite and implements the various techniques of the disclosure. In general, NMS 130 may provide a cloud-based platform for wireless network data acquisition, monitoring, activity logging, reporting, predictive analysis, network anomaly identification, and alarm generation. For example, the network management system 130 may be configured to actively monitor and adaptively configure the network 100 to provide self-driving capabilities. In addition, the VNA 133 also includes a natural language processing engine to provide AI-driven support and troubleshooting, anomaly detection, AI-driven location services, and AI-driven Radio Frequency (RF) optimization with reinforcement learning.
As shown in the example of fig. 1B, AI-driven NMS 130 also provides configuration management, monitoring, and automated oversight of a software-defined wide area network (SD-WAN) 177 that operates as an intermediate network communicatively coupling wireless network 106 and wired LAN 175 to data center 179 and application services 181 (e.g., multi-cloud applications). In general, SD-WAN 177 provides seamless, secure, traffic-engineered connectivity between "spoke" routers 187A of the edge wired network 175 hosting wireless network 106 (e.g., a branch or campus network) and "hub" routers 187B further up the cloud stack toward cloud-based application service 181. SD-WAN 177 typically operates and manages an overlay network on an underlying physical wide area network (WAN) that provides connectivity to geographically separate customer networks. In other words, SD-WAN 177 extends software-defined networking (SDN) capabilities to the WAN and allows networks to decouple the underlying physical network infrastructure from the virtualized network infrastructure and applications so that configuration and management can be done in a flexible and scalable manner.
In some examples, the underlying routers of SD-WAN 177 may implement a stateful, session-based routing scheme in which routers 187A, 187B dynamically modify the contents of the original packet headers originating from client device 148 to direct traffic along a selected path (e.g., path 189) to application service 181 without the use of tunnels and/or additional labels. In this way, the routers 187A, 187B may be more efficient and scalable for large networks, because tunnel-free, session-based routing may enable the routers 187A, 187B to conserve considerable network resources by eliminating the need to perform encapsulation and decapsulation at tunnel endpoints. Further, in some examples, each router 187A, 187B may perform path selection and traffic engineering independently to control the flow of data packets associated with each session without requiring the use of a centralized SDN controller for path selection and label distribution. In some examples, routers 187A, 187B implement session-based routing as Secure Vector Routing (SVR), provided by Juniper Networks, Inc.
Additional information about session-based routing and SVR is described in: U.S. Patent No. 9,729,439, entitled "COMPUTER NETWORK PACKET FLOW CONTROLLER", issued in August 2017; U.S. Patent No. 9,729,682, entitled "NETWORK DEVICE AND METHOD FOR PROCESSING A SESSION USING A PACKET SIGNATURE", issued in August 2017; U.S. Patent No. 9,762,485, entitled "NETWORK PACKET FLOW CONTROLLER WITH EXTENDED SESSION MANAGEMENT", issued on September 12, 2017; U.S. Patent No. 9,871,748, entitled "ROUTER WITH OPTIMIZED STATISTICAL FUNCTIONALITY", issued on January 16, 2018; U.S. Patent No. 9,985,883, entitled "NAME-BASED ROUTING SYSTEM AND METHOD", issued on May 29, 2018; U.S. Patent No. 10,200,264, entitled "LINK STATUS MONITORING BASED ON PACKET LOSS DETECTION", issued on February 5, 2019; U.S. Patent No. 10,277,506, entitled "STATEFUL LOAD BALANCING IN A STATELESS NETWORK", issued on April 30, 2019; U.S. Patent No. 10,432,522, entitled "NETWORK PACKET FLOW CONTROLLER WITH EXTENDED SESSION MANAGEMENT", issued on October 1, 2019; and U.S. Patent No. 11,075,824, entitled "IN-LINE PERFORMANCE MONITORING", issued on July 27, 2021, the entire contents of each of which are incorporated herein by reference.
In some examples, AI-driven NMS 130 may implement intent-based configuration and management of network system 100, including constructing, representing, and executing intent-driven workflows to configure and manage devices associated with wireless network 106, wired LAN 175, and/or SD-WAN 177. For example, declarative requirements express the desired configuration of network components without specifying the exact native device configuration and control flow. By utilizing declarative requirements, it is possible to specify what should be accomplished rather than how it should be accomplished. Declarative requirements may be contrasted with imperative instructions that describe the exact device configuration syntax and control flow needed to implement the configuration. By utilizing declarative requirements rather than imperative instructions, the user and/or user system is relieved of the burden of determining the exact device configuration needed to achieve the result the user/system desires. For example, when using a variety of different types of devices from different vendors, it is often difficult and burdensome to specify and manage precise imperative instructions to configure each device of the network. The types and kinds of network devices may change dynamically as new devices are added and device failures occur. Managing various different types of devices from different vendors, with different configuration protocols, syntaxes, and software versions, to configure a cohesive network of devices is often difficult to achieve. Thus, by requiring the user/system only to specify declarative requirements that state the desired result and are applicable to a variety of different types of devices, management and configuration of network devices becomes more efficient. Further example details and technical descriptions of intent-based network management systems are found in U.S. Patent 10,756,983, entitled "Intent-based Analytics," and U.S. Patent 10,992,543, entitled "Automatically generating an intent-based network model of an existing computer network," each of which is incorporated herein by reference.
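The contrast between declarative requirements and imperative instructions can be illustrated with a small sketch. Everything here is hypothetical: the intent fields, the vendor names, and the generated command strings are invented for illustration and do not correspond to any real device syntax or to the NMS implementation.

```python
# Declarative requirement: states the desired result, not device syntax.
intent = {"vlan_id": 20, "name": "voice", "ports": ["p1", "p2"]}


def render_vendor_a(req):
    """Made-up imperative syntax for a hypothetical vendor A device."""
    lines = [f"create vlan {req['vlan_id']} name {req['name']}"]
    lines += [f"port {p} join vlan {req['vlan_id']}" for p in req["ports"]]
    return lines


def render_vendor_b(req):
    """Made-up imperative syntax for a hypothetical vendor B device."""
    members = ",".join(req["ports"])
    return [f"vlan:{req['name']}:{req['vlan_id']}:members={members}"]


RENDERERS = {"vendor_a": render_vendor_a, "vendor_b": render_vendor_b}


def compile_intent(req, inventory):
    """Translate one declarative requirement into per-device imperative commands."""
    return {device: RENDERERS[vendor](req) for device, vendor in inventory.items()}


configs = compile_intent(intent, {"sw1": "vendor_a", "sw2": "vendor_b"})
```

The user supplies only `intent`; the per-vendor differences stay inside the renderers, so adding a device of a new type means adding a renderer rather than rewriting the requirement.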
In accordance with the techniques described in this disclosure, for a particular application session, the application session troubleshooting engine 135 of the VNA 133 generates a topology based on network data 137 received from a subset of network devices (e.g., client devices 148, AP devices supporting the wireless network 106, switches 146 supporting the wired LAN 175, and routers 187A, 187B supporting the SD-WAN 177) that are involved in the particular application session for the duration of the particular application session. More specifically, the application session troubleshooting engine 135 can construct an application session-specific topology based on the data of the particular application session retrieved from the temporal graph database 138 of the network. The temporal graph database 138 is configured to store, at an application session level granularity, entity and connection information extracted from historical telemetry data collected from the network devices over an extended period of time (e.g., weeks or months). The techniques of this disclosure may enable retrospective troubleshooting of a particular application session even if the current network topology changes after the particular application session ends or the current application session does not experience the same problems as the particular application session. In this way, VNA 133 provides WAN assurance for application sessions between client devices 148 connected to wireless network 106 and wired LAN 175 and cloud-based application service 181, which may be hosted by computing resources within data center 179.
Fig. 2 is a block diagram of an example Access Point (AP) device 200 configured based on one or more techniques of this disclosure. The example access point 200 shown in fig. 2 may be used to implement any of the APs 142 shown and described herein with respect to fig. 1A. The access point 200 may include, for example, a Wi-Fi, Bluetooth, and/or Bluetooth Low Energy (BLE) base station or any other type of wireless access point.
In the example of fig. 2, access point 200 includes a wired interface 230, wireless interfaces 220A-220B, one or more processors 206, memory 212, and input/output 210, coupled together by bus 214, through which various elements may exchange data and information. The wired interface 230 represents a physical network interface including a receiver 232 and a transmitter 234 for transmitting and receiving network communications (e.g., data packets). The wired interface 230 connects the access point 200 directly or indirectly to a wired network device (such as one of the switches 146 in fig. 1A) via a cable (e.g., an ethernet cable).
First wireless interface 220A and second wireless interface 220B represent wireless network interfaces and include receivers 222A and 222B, respectively, each of which includes a receive antenna through which access point 200 may receive wireless signals from a wireless communication device, such as UE 148 of fig. 1A. First wireless interface 220A and second wireless interface 220B also include transmitters 224A and 224B, respectively, each of which includes a transmit antenna through which access point 200 may transmit wireless signals to a wireless communication device, such as UE 148 of fig. 1A. In some examples, the first wireless interface 220A may include a Wi-Fi 802.11 interface (e.g., 2.4 GHz and/or 5 GHz), and the second wireless interface 220B may include a Bluetooth interface and/or a Bluetooth Low Energy (BLE) interface.
The processor 206 is a programmable hardware-based processor configured to execute software instructions, such as instructions for defining software or a computer program, that are stored in a computer-readable storage medium (e.g., memory 212), such as a non-transitory computer-readable medium, including a storage device (e.g., disk drive or optical drive) or memory (e.g., flash memory or RAM) or any other type of volatile or non-volatile memory, that stores instructions to cause the one or more processors 206 to perform the techniques described herein.
Memory 212 includes one or more devices configured to store programming modules and/or data related to the operation of access point 200. For example, memory 212 may include a computer-readable storage medium, such as a non-transitory computer-readable storage medium including a storage device (e.g., a disk drive or optical drive) or memory (e.g., flash memory or RAM) or any other type of volatile or non-volatile memory, that stores instructions to cause one or more processors 206 to perform the techniques described herein.
In this example, memory 212 stores executable software including an Application Programming Interface (API) 240, a communication manager 242, configuration/radio settings 250, a device status log 252, and data 254. The device status log 252 contains a list of events specific to the access point 200. These events may include a log of normal and error events, such as memory state, reboot or restart events, crash events, cloud disconnect and self-recovery events, low link speed or link speed oscillation events, Ethernet port state, Ethernet interface packet errors, upgrade failure events, firmware upgrade events, configuration changes, etc., together with a time and date stamp for each event. The log controller 255 determines the log level of the device based on instructions from the NMS 130. Data 254 may store any data used and/or generated by access point 200, including data collected from UE 148, such as data used to calculate one or more SLE metrics, that is transmitted by access point 200 for cloud-based management of wireless network 106A by NMS 130.
Input/output (I/O) 210 represents physical hardware components, such as buttons, displays, etc., that enable interaction with a user. Although not shown, memory 212 typically stores executable software for controlling a user interface with respect to inputs received through I/O 210. Communication manager 242 includes program code that, when executed by processor 206, allows access point 200 to communicate with UE 148 and/or network 134 over any of interfaces 230 and/or 220A-220C. Configuration settings 250 include any device settings of access point 200, such as radio settings of each wireless interface 220A-220C. These settings may be manually configured or may be monitored and managed remotely by NMS 130 to optimize wireless network performance on a periodic basis (e.g., hourly or daily).
As described herein, AP device 200 may measure network data and report it, from status log 252, to NMS 130. Network data can include event data, telemetry data, and/or other SLE-related data. The network data may include various parameters indicative of wireless network performance and/or status. These parameters may be measured and/or determined by one or more UE devices and/or one or more APs in the wireless network. NMS 130 may determine one or more SLE metrics based on SLE-related data received from APs in the wireless network and store the SLE metrics as network data 137 (fig. 1A). NMS 130 may further update temporal graph database 138 (fig. 1A) of the network to include telemetry data received from APs in the wireless network over time, or at least entity and connection information extracted from the telemetry data.
Fig. 3 is a block diagram of an example Network Management System (NMS) 300 based on one or more techniques of the present disclosure, the NMS 300 configured to provide a granular troubleshooting workflow at an application session level using an application session specific topology from a client device to a cloud-based application server. NMS 300 may be used to implement NMS 130 in fig. 1A-1B, for example. In such an example, NMS 300 is responsible for monitoring and managing one or more wireless networks 106A-106N of sites 102A-102N, respectively.
NMS 300 includes a communication interface 330, one or more processors 306, a user interface 310, memory 312, and database 318. The various elements are coupled together by a bus 314 over which they may exchange data and information. In some examples, NMS 300 receives data from one or more of client devices 148, APs 142, switches 146, and other network nodes within network 134 (e.g., router 187 of fig. 1B) that can be used to calculate one or more SLE metrics and/or update temporal graph database 317. The NMS 300 analyzes this data for cloud-based management of the wireless networks 106A through 106N. The received data is stored as network data 316 in database 318, and the telemetry data contained in the received data, or at least the entity and connection information extracted from the telemetry data, is stored in temporal graph database 317 in database 318. In some examples, NMS 300 may be part of another server as shown in fig. 1A, or part of any other server.
Processor 306 executes software instructions, such as those used to define a software or computer program, stored on a computer-readable storage medium (e.g., memory 312), such as a non-transitory computer-readable medium including a storage device (e.g., a disk drive or optical drive) or memory (e.g., flash memory or RAM) or any other type of volatile or non-volatile memory, that stores instructions to cause the one or more processors 306 to perform the techniques described herein.
The communication interface 330 may comprise, for example, an Ethernet interface. Communication interface 330 couples NMS 300 to a network and/or the Internet (e.g., any of networks 134 and/or any of the local area networks shown in fig. 1A). Communication interface 330 includes receiver 332 and transmitter 334, by which NMS 300 receives data and information from, and transmits data and information to, any of client devices 148, APs 142, switches 146, servers 110, 116, 122, 128, and/or any other network nodes, devices, or systems forming part of network system 100 such as shown in fig. 1A. In some scenarios described herein, network system 100 includes "third party" network devices owned by and/or associated with an entity other than NMS 300, and NMS 300 does not receive, collect, or otherwise access network data from the third party network devices.
The data and information received by NMS 300 may include, for example, telemetry data, SLE-related data, or event data received from one or more of client devices 148, APs 142, switches 146, or other network nodes (e.g., router 187 of fig. 1B), which NMS 300 uses to remotely monitor the performance of wireless networks 106A-106N and of application sessions from client devices to cloud-based application servers. NMS 300 may further transmit data to any network device (e.g., client devices 148, APs 142, switches 146, other network nodes within network 134, management device 111) via communication interface 330 to remotely manage wireless networks 106A through 106N and portions of the wired network.
Memory 312 includes one or more devices configured to store programming modules and/or data related to the operation of NMS 300. For example, memory 312 may include a computer-readable storage medium, such as a non-transitory computer-readable medium including a storage device (e.g., a magnetic disk drive or optical disk drive) or memory (e.g., flash memory or RAM) or any other type of volatile or non-volatile memory, that stores instructions that cause one or more processors 306 to perform the techniques described herein.
In this example, memory 312 includes an API 320, SLE module 322, Virtual Network Assistant (VNA)/AI engine 350, and Radio Resource Manager (RRM) 360. Based on the disclosed technology, the VNA/AI engine 350 includes an application session troubleshooting engine 352 that builds an application session-specific topology for a particular application session based on the data of the particular application session retrieved from the temporal graph database 317. In some examples, the application session troubleshooting engine 352 applies the ML model 380 to the network data 316 and/or the temporal graph database 317 to perform troubleshooting of a particular application session by identifying a root cause of a connection problem at one or more of the subset of network devices involved in the particular application session. NMS 300 may also include any other programming modules, software engines, and/or interfaces configured to remotely monitor and manage wireless networks 106A through 106N and portions of the wired network, including any AP 142/200, switch 146, or other network devices (e.g., router 187 of fig. 1B).
SLE module 322 can set and track thresholds for SLE metrics for each network 106A-106N. SLE module 322 further analyzes SLE-related data collected by APs (e.g., any of APs 142) from the UEs in each wireless network 106A-106N. For example, APs 142A-1 through 142A-N collect SLE-related data from UEs 148A-1 through 148A-N currently connected to wireless network 106A. This data is transmitted to NMS 300, which executes SLE module 322 to determine one or more SLE metrics for each UE 148A-1 through 148A-N currently connected to wireless network 106A. This data, as well as any other network data collected by one or more APs 142A-1 through 142A-N in wireless network 106A, is transmitted to NMS 300 and stored, for example, as network data 316 in database 318.
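As a rough illustration of tracking an SLE metric against a threshold per UE, the sketch below assumes the metric is defined as the percentage of a UE's samples that meet the threshold. That definition, the function names, and the sample values are assumptions for illustration; the disclosure does not specify how SLE metrics are computed.

```python
from collections import defaultdict


def sle_metric(samples, threshold):
    """Percentage of samples that meet or exceed the threshold (assumed definition)."""
    if not samples:
        return None
    met = sum(1 for value in samples if value >= threshold)
    return 100.0 * met / len(samples)


def sle_per_client(records, threshold):
    """Group (client_id, value) records reported by APs and score each client."""
    by_client = defaultdict(list)
    for client_id, value in records:
        by_client[client_id].append(value)
    return {client: sle_metric(values, threshold) for client, values in by_client.items()}


# Illustrative throughput samples (arbitrary units) collected by APs from two UEs.
records = [("ue-1", 12.0), ("ue-1", 4.0), ("ue-2", 9.0), ("ue-2", 11.0)]
per_client = sle_per_client(records, threshold=8.0)
```

In this example `ue-1` meets the threshold in half of its samples and `ue-2` in all of them, so a low score for `ue-1` would be the kind of signal that prompts further troubleshooting.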
RRM engine 360 monitors one or more metrics for each site 102A-102N to learn and optimize the RF environment for each site. For example, RRM engine 360 can monitor coverage and capacity SLE metrics for wireless network 106 at sites 102 to identify potential problems with SLE coverage and/or capacity in wireless network 106 and adjust the radio settings of the access points for each site to address the identified problems. RRM engine 360 may, for instance, determine the channel and transmit power distribution of all APs 142 in each network 106A-106N, and may monitor events, power, channels, bandwidth, and the number of clients connected to each AP. RRM engine 360 may further automatically alter or update the configuration of one or more APs 142 at site 102 in order to improve coverage and capacity SLE metrics, thereby providing an improved wireless experience for the user.
The VNA/AI engine 350 analyzes data received from the network devices and its own data to identify when an undesired or abnormal state is encountered at one of the network devices. For example, the VNA/AI engine 350 can identify the root cause of any undesired or abnormal state, e.g., any bad SLE metric indicative of a connection problem at one or more network devices. In addition, the VNA/AI engine 350 can automatically invoke one or more corrective actions to address the identified root cause of the one or more bad SLE metrics. Examples of corrective actions that VNA/AI engine 350 may automatically invoke may include, but are not limited to, invoking RRM 360 to restart one or more APs, adjusting/modifying the transmit power of a particular radio in a particular AP, adding an SSID configuration for a particular AP, changing channels of an AP or a group of APs, and the like. The corrective actions may further include restarting a switch and/or router, invoking a download of new software to an AP, switch, or router, etc. These corrective actions are merely examples and the present disclosure is not limited in this respect. If no automatic corrective action is available or the automatic corrective action cannot adequately resolve the root cause, the VNA/AI engine 350 can proactively provide a notification that includes suggested corrective actions that IT personnel (e.g., a site or network administrator using the management device 111) should take to resolve the network error.
One problem in cloud-based management of wireless networks is how to troubleshoot and improve the user experience of highly user-sensitive applications (e.g., VoIP applications, streaming video applications, gaming applications, or video conferencing applications) at the streaming session level. The current industry standard for highly user-sensitive applications is to provide an overall quality score (i.e., a Mean Opinion Score (MOS) ranging from 1 to 5) for a voice or video session. The quality scores of sessions are typically presented on their own, without additional analysis or troubleshooting to identify the root cause of a low quality score.
Based on one or more techniques of the present disclosure, the VNA/AI engine 350 is configured to provide a granular troubleshooting workflow at an application session level using an application session specific topology from a client to a cloud. For a particular application session, the application session troubleshooting engine 352 generates a topology of the network devices and connections between the network devices that are involved in the particular application session for the duration of the particular application session. The application session specific topology is built based on telemetry data received from network devices (e.g., client device 148, AP device 142, switch 146, and other network nodes such as router 187 in fig. 1B) during the duration of a particular application session. The application session troubleshooting engine 352 enables visualization of application session-specific topologies, including color coding, icons, or other indicia of connection problems within the topology for the duration of a particular application session.
The temporal graph database 317 is configured to store connection and entity information, with application session level granularity, extracted from historical telemetry data that the NMS 300 collects from client devices 148, AP 142, switch 146, and/or other network nodes within the network 134 over an extended period of time (e.g., weeks or months). The connection information may represent different types of connections, including wireless, wired, and logical links, such as a peer-to-peer path or IPsec tunnel for an SD-WAN device, such as router 187 of SD-WAN 177 in fig. 1B. The entity information may represent different types of network devices, including client devices, AP devices, switches, other network nodes (e.g., routers and gateways), third party network devices, and applications running on the network devices. NMS 300 updates the temporal graph database 317 with application session level connection and entity information, where the graph represents the application session level network topology over a period of time.
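As one non-limiting illustration of the storage model described above, the following minimal Python sketch keeps each connection as a time-stamped edge so that a past topology can be replayed for any window. The class names, field names, and entity identifiers are hypothetical and are not taken from the disclosure; an actual temporal graph database 317 may be organized differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str      # entity ID of one endpoint (hypothetical format)
    dst: str      # entity ID of the other endpoint
    kind: str     # "wireless", "wired", or "logical"
    start: float  # time the connection was first observed (seconds)
    end: float    # time the connection was last observed (seconds)


class TemporalGraph:
    """Stores time-stamped connections so past topologies can be replayed."""

    def __init__(self):
        self.edges = []

    def add_edge(self, edge):
        self.edges.append(edge)

    def edges_during(self, start, end):
        """Return every edge whose lifetime overlaps the window [start, end]."""
        return [e for e in self.edges if e.start <= end and e.end >= start]


graph = TemporalGraph()
graph.add_edge(Edge("client-A", "ap-1", "wireless", 0, 1800))
graph.add_edge(Edge("client-A", "ap-2", "wireless", 1800, 3600))
graph.add_edge(Edge("ap-1", "switch-1", "wired", 0, 86400))
graph.add_edge(Edge("client-B", "ap-3", "wireless", 5000, 6000))

# Connections that existed at any point during a one-hour session
window = graph.edges_during(0, 3600)
```

In this sketch the connection of client-B falls outside the queried window and is therefore not returned, which is the property that allows a session to be examined after the live topology has changed.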
An application session includes a user session with the application, such as a VOIP or video conference call, a streaming video viewing session, or a gaming session. The application session may include a plurality of application flows (e.g., tens to hundreds of application flows), each of which contains network-level flows (e.g., defined by 5-tuples) between network devices during the application session. For example, in a one-hour VOIP call, a client device running an application may connect to multiple different AP devices (e.g., if the client device moves during the session) and generate multiple application flows for the application session. Further, each AP device may connect to one or more of a switch, router, and/or gateway, up to a cloud-based application server, where each new connection may contain another application flow of the application session.
The application session troubleshooting engine 352 correlates the plurality of application flows of the identified application session and uses the application flow data to determine a subset of network devices within the network that are associated with the application session for the duration of the application session. The application session troubleshooting engine 352 then retrieves from the temporal graph database 317 the connection and entity information for all AP devices, switches, routers, and/or gateways, and cloud-based application servers to which the client device running the application was connected for the duration of the application session.
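The correlation step described above may be sketched as follows. This is only an illustrative example under stated assumptions: the `Flow` record, its `session_id` field, and the sample addresses (drawn from documentation address ranges) are hypothetical, and the disclosure does not specify how flows are tagged with a session.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    session_id: str    # hypothetical identifier tying a flow to a session
    five_tuple: tuple  # (src_ip, src_port, dst_ip, dst_port, protocol)
    device_id: str     # network device that reported the flow
    start: float
    end: float


def correlate(flows, session_id):
    """Return the session's time window and the devices that carried it."""
    mine = [f for f in flows if f.session_id == session_id]
    if not mine:
        raise ValueError(f"no flows recorded for session {session_id}")
    window = (min(f.start for f in mine), max(f.end for f in mine))
    devices = sorted({f.device_id for f in mine})
    return window, devices


flows = [
    Flow("call-1", ("192.0.2.5", 50000, "198.51.100.7", 3478, "udp"), "ap-1", 0, 1800),
    Flow("call-1", ("192.0.2.5", 50002, "198.51.100.7", 3478, "udp"), "ap-2", 1800, 3600),
    Flow("call-1", ("192.0.2.5", 50002, "198.51.100.7", 3478, "udp"), "switch-1", 1800, 3600),
    Flow("call-2", ("192.0.2.9", 40000, "198.51.100.7", 3478, "udp"), "ap-3", 100, 200),
]

window, devices = correlate(flows, "call-1")
```

The resulting window and device subset are the two inputs that a subsequent lookup against a temporal store would need.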
The application session troubleshooting engine 352 may construct an application session-specific topology for the application session based on the entity and connection information of the application session retrieved from the temporal graph database 317. In this way, the techniques of this disclosure may enable retrospective troubleshooting of application sessions even if the current network topology changes after a particular application session ends or the current application session does not experience the same problems as the particular application session.
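For illustration only, constructing a session-specific topology from time-stamped connection records might resemble the sketch below. It assumes, as a simplification not stated in the disclosure, that each record is directed from the client side toward the cloud; the tuple layout and device names are hypothetical.

```python
from collections import deque

# (src, dst, start, end) tuples standing in for records from a temporal store
EDGES = [
    ("client-A", "ap-1", 0, 1800),
    ("client-A", "ap-2", 1800, 3600),
    ("ap-1", "switch-1", 0, 86400),
    ("ap-2", "switch-1", 0, 86400),
    ("switch-1", "gateway-1", 0, 86400),
    ("gateway-1", "app-server", 0, 86400),
    ("client-A", "ap-9", 7200, 9000),  # formed after the session ended
]


def session_topology(edges, client, start, end):
    """Walk uplinks from the client, using only edges active during the session."""
    active = [(s, d) for s, d, t0, t1 in edges if t0 < end and t1 > start]
    seen, links, queue = {client}, [], deque([client])
    while queue:
        node = queue.popleft()
        for s, d in active:
            if s == node:
                links.append((s, d))
                if d not in seen:
                    seen.add(d)
                    queue.append(d)
    return seen, links


nodes, links = session_topology(EDGES, "client-A", 0, 3600)
```

Because the walk is restricted to the session window, the later association with "ap-9" does not appear in the result, mirroring the retrospective behavior described above.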
The application session troubleshooting engine 352 may further implement the troubleshooting of the application session by identifying one or more connection problems for a subset of network devices involved in the particular application session for the duration of the particular application session. For example, the application session troubleshooting engine 352 analyzes the network data 316 related to the subset of network devices for the particular application session to identify root causes of connection problems related to one or more of the subset of network devices for the particular application session. More specifically, the application session troubleshooting engine 352 may analyze event data contained in or derived from the network data 316 to determine if a connection problem exists. In some scenarios, the application session troubleshooting engine 352 may apply at least a portion of the network data 316 to the ML model 380 to determine the root cause of the connection problem.
Event data identifying a connection problem and the root cause of the connection problem may be different for each different network device involved in a particular application session. For example, the application session troubleshooting engine 352 may analyze the application activity event data to identify connection problems caused by applications running on the cloud-based application server. The application session troubleshooting engine 352 may analyze pre-connection problems, such as DNS, DHCP, and Address Resolution Protocol (ARP) problems, to identify connection problems caused by the client devices 148/400. The application session troubleshooting engine 352 may analyze the AP health, radio health, pre-connection issues, RF issues, and/or configuration issues to identify connection issues caused by the access points 142/200. The application session troubleshooting engine 352 may analyze switch health, cable issues, missing Virtual Local Area Networks (VLANs), congestion, and/or configuration issues to identify connection issues caused by the switch 146. The application session troubleshooting engine 352 may analyze gateway health, WAN links, and/or configuration issues to identify connectivity issues caused by the gateways 187A, 187B or routers of the SD-WAN 177.
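The per-device analysis described above can be pictured as a lookup from device type to the categories of event data that are examined for that type. The following sketch is one possible arrangement; the category strings and the event record shape are hypothetical labels chosen for this example.

```python
# Event categories per device type, following the paragraph above.
CHECKS = {
    "application_server": ["application_activity"],
    "client": ["dns", "dhcp", "arp"],
    "ap": ["ap_health", "radio_health", "pre_connection", "rf", "configuration"],
    "switch": ["switch_health", "cable", "missing_vlan", "congestion", "configuration"],
    "gateway": ["gateway_health", "wan_link", "configuration"],
}


def find_issues(device_type, events):
    """Return the failed events whose category is relevant to this device type."""
    relevant = set(CHECKS.get(device_type, []))
    return [e for e in events if e["category"] in relevant and e["failed"]]


events = [
    {"category": "dhcp", "failed": True},
    {"category": "rf", "failed": True},
    {"category": "dns", "failed": False},
]

client_issues = find_issues("client", events)
ap_issues = find_issues("ap", events)
```

Here the same event list yields a DHCP problem when evaluated for a client device and an RF problem when evaluated for an AP.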
The application session troubleshooting engine 352 may generate data representing a user interface to provide a user (e.g., a site or network administrator using the management device 111) with a visualization of the application session-specific topology, including color coding, icons, or other indicia of connection problems within the topology during a particular application session. In response to user input selecting an icon representing a network device identified as having a connection problem during a particular application session, the application session troubleshooting engine 352 may further generate a troubleshooting user interface for the network device, or may redirect the user to a client insight or recommended action user interface specific to the network device. The application session troubleshooting engine 352 can utilize an API to expose appropriate data from the temporal graph database 317 to a user interface to build and visualize an application session specific topology for a particular application session.
In addition, the application session troubleshooting engine 352 can identify third-party application servers, third-party service provider servers, and other third-party network devices related to a particular application session to provide a complete topology from the client device to the cloud-based application server. For example, the application session troubleshooting engine 352 may determine which switches, routers, and/or gateways are connected to a third party application server, a third party service provider server, or other third party network device based on uplink data (e.g., LLDP advertisements) contained in network data 316 received from the switches, routers, and/or gateways during a particular application session. The application session troubleshooting engine 352 may then determine entity ID data (e.g., IP address or interface address) of the third-party network device. In some examples, the application session troubleshooting engine 352 may have some integration with an APM provider to retrieve, through an API, insight data of online application services and/or service providers to determine whether the application services and/or service providers are malfunctioning or experiencing problems.
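As a hedged sketch of the data that might back such a visualization, the example below attaches a status and color to each node of a topology and serializes the result for a user interface. The palette, status names, and payload shape are assumptions made for this example and are not defined by the disclosure.

```python
import json

# Hypothetical palette; the disclosure only says "color coding, icons, or other indicia".
COLORS = {"ok": "green", "degraded": "yellow", "failed": "red"}


def annotate(nodes, issues):
    """Attach a status color to each node; nodes with no recorded issue are 'ok'."""
    return [
        {"id": n, "status": issues.get(n, "ok"), "color": COLORS[issues.get(n, "ok")]}
        for n in nodes
    ]


payload = annotate(["client-A", "ap-1", "switch-1"], {"ap-1": "failed"})
as_json = json.dumps(payload)  # data a user interface could render
```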
In some examples, ML model 380 may include a supervised ML model that is trained using training data composed of pre-collected, labeled network data received from network devices (e.g., client devices, APs, switches, and/or other network nodes) to identify root causes of connection problems for one or more network devices of a subset of network devices involved in a particular application session. The supervised ML model may include logistic regression, naive Bayes, a support vector machine (SVM), etc. In other examples, ML model 380 may include an unsupervised ML model. Although not shown in fig. 3, in some examples, database 318 may store training data, and VNA/AI engine 350 or a dedicated training module may be configured to train ML model 380 based on the training data to determine appropriate weights for one or more features of the training data.
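To make the notion of determining weights from labeled data concrete, the following is a minimal, self-contained logistic regression sketch with a single feature. The feature (fraction of retried frames), the labels, and the hyperparameters are hypothetical; this is not the training procedure of ML model 380, which the disclosure does not detail.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, lr=0.5, epochs=1000):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical labeled data: retry rate -> 1 if an RF issue was the root cause
retry_rate = [0.1, 0.2, 0.8, 0.9]
rf_root_cause = [0, 0, 1, 1]
w, b = train(retry_rate, rf_root_cause)


def predict(x):
    """Estimated probability that an RF issue is the root cause."""
    return sigmoid(w * x + b)
```

A call such as `predict(0.95)` then returns a probability above one half for this toy data set, while `predict(0.05)` returns one below it.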
In the event that a connection problem is detected in one or more network devices involved in an application session, the application session troubleshooting engine 352 generates data representative of a user interface to provide a user (e.g., a site or network administrator using the management device 111) with a visualization of the application session-specific topology, including color coding, icons, or other indicia of the connection problem within the topology for the duration of the particular application session. In some examples, the VNA/AI engine 350 may determine the suggested action based on the detected connection problem and/or the root cause determined for the detected connection problem. The VNA/AI engine 350 may output a notification of the connectivity problem and/or the root cause of the connectivity problem through the communication interface 330 for display on the administrator's management device 111 via one or more of the user interface 310, the API 320, a web hook (webhook), or an email.
In some examples, a site or network administrator (e.g., using management device 111) may initiate topology visualization and troubleshooting of a particular application session through session assistant engine 356. Session assistant engine 356 may be configured to process user input (e.g., text strings) and generate a response. In some examples, session assistant engine 356 may include one or more natural language processors configured to process user input. Session assistant engine 356 may be configured to conduct chat conversations that simulate the manner in which humans act as conversation partners, which may help simplify and/or improve administrator satisfaction with monitoring and controlling the network.
Based on one or more techniques of this disclosure, session assistant engine 356 can generate a session assistant configured to receive user input. In particular use cases, an administrator, via management device 111, may input queries for particular network devices and/or particular application sessions into session assistant engine 356. Session assistant engine 356 can provide a platform in which an administrator is presented with an application session specific topology and can interact with that topology.
For example, the session assistant may receive a string indicating an application, duration, and/or device identifier (e.g., "troubleshooting teams call from client device a," where "teams call" indicates an application, "client device a" contains a client device identifier, or "how the DC84AP544 was in the past 7 days," where "DC84AP544" contains an AP device identifier, and "7 days" represents a duration). In some cases, the session assistant may receive a string indicating the application, duration, and/or user identifier (e.g., "troubleshooting user B teams call," where "user B" is a user of the client device and "teams call" represents the application). Session assistant engine 356 may determine a particular network device of the plurality of network devices based on the user input and determine one or more application sessions to which the particular network device relates. For example, if there is an indicated application and/or an indicated duration in the user input to the session assistant, session assistant engine 356 may automatically filter application sessions for a particular network device based on the indicated application and/or the indicated duration. Without including additional session identification information in the user input, session assistant engine 356 may identify all application sessions for a particular network device for a default duration (e.g., today or the past 7 days). In another case where no additional session identification information is included in the user input, session assistant engine 356 may filter out high quality application sessions to identify one or more application sessions of a particular network device that have experienced a connection problem recently or during a default duration.
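The filtering behavior described above (by indicated application, by indicated or default duration, and optionally excluding high quality sessions) may be sketched as shown below. The session record fields, the quality cutoff, and the sample dates (including the year) are hypothetical values chosen for this example.

```python
from datetime import datetime, timedelta

DEFAULT_WINDOW = timedelta(days=7)  # default duration mentioned in the text
GOOD_MOS = 4.0                      # hypothetical cutoff for a "high quality" session


def filter_sessions(sessions, device_id, now, application=None,
                    window=None, only_problem=False):
    """Select sessions for a device, narrowed by application and time window."""
    since = now - (window or DEFAULT_WINDOW)
    picked = []
    for s in sessions:
        if s["device_id"] != device_id or s["start"] < since:
            continue
        if application and s["application"] != application:
            continue
        if only_problem and s["mos"] >= GOOD_MOS:
            continue
        picked.append(s)
    return picked


now = datetime(2022, 10, 7, 18, 0)  # arbitrary reference time
sessions = [
    {"id": 1, "device_id": "client-A", "application": "teams",
     "start": datetime(2022, 10, 7, 12, 1), "mos": 2.1},
    {"id": 2, "device_id": "client-A", "application": "streaming",
     "start": datetime(2022, 10, 5, 9, 0), "mos": 4.5},
    {"id": 3, "device_id": "client-A", "application": "teams",
     "start": datetime(2022, 9, 1, 9, 0), "mos": 3.0},
    {"id": 4, "device_id": "client-B", "application": "teams",
     "start": datetime(2022, 10, 6, 9, 0), "mos": 1.5},
]

teams_calls = filter_sessions(sessions, "client-A", now, application="teams")
all_recent = filter_sessions(sessions, "client-A", now)
problem_only = filter_sessions(sessions, "client-A", now, only_problem=True)
```

With no application or duration supplied, the sketch falls back to every session of the device within the default window, matching the first default case described above; setting `only_problem` corresponds to the second.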
Once a particular application session is identified, the application session troubleshooting engine 352 builds an application session-specific topology for the particular application session based on the data of the particular application session retrieved from the temporal graph database 317. The application session troubleshooting engine 352 generates data representing the application session-specific topology for presentation, within the session assistant, to an administrator using the management device 111. The visualization includes color coding, icons, or other indicia of connection problems within the topology during the particular application session, as determined by the application session troubleshooting engine 352 based on temporal data stored as network data 316 and/or in temporal graph database 317. In this example, an administrator using management device 111 can interact with the application session specific topology presented in the session assistant to select an icon indicating a network device in the topology that is identified as having a connection problem during the particular application session. In response to selection of a network device, the application session troubleshooting engine 352 may further generate a troubleshooting user interface for the network device for presentation within the session assistant. Alternatively, the application session troubleshooting engine 352 may redirect the user to a client insight or recommended action user interface specific to the network device.
The disclosed techniques provide one or more technical advantages and practical applications. For example, the techniques can determine an application session specific topology from a client to a cloud, thereby enabling troubleshooting of a particular application session based on the topology and connection issues associated with the particular application session for the duration of the particular application session. These techniques enable retrospective troubleshooting to determine what caused a low quality session even if the current network topology has changed or the problem has since been resolved. In addition, these techniques enable troubleshooting (including root cause analysis) of connection problems, experienced by any network device in the application session specific topology, that affected the user during the particular application session. This includes third party network devices that may be owned by and/or associated with an entity other than that of NMS 130, such that NMS 130 does not receive, collect, or otherwise access network data for the third party network devices.
Although the techniques of this disclosure are described in this example as being performed by NMS 130, the techniques described in this example may be performed by any other computing device, system, and/or server, and this disclosure is not limited in this respect. For example, one or more computing devices configured to perform the functions of the disclosed techniques may reside in a dedicated server, may be included in any other server in addition to or other than NMS 130, or may be distributed throughout network 100 and may or may not form part of NMS 130.
Fig. 4 shows an example User Equipment (UE) device 400. The example UE device 400 shown in fig. 4 may be used to implement any UE 148 shown and described herein with respect to fig. 1A. UE device 400 may include any type of wireless client device, and the present disclosure is not limited in this respect. For example, UE device 400 may include a mobile device such as a smart phone, tablet or notebook computer, a personal digital assistant (PDA), a wireless terminal, a smart watch, a smart ring, or any other type of mobile or wearable device. Based on the techniques described in this disclosure, UE 400 may also include a wired client-side device, e.g., an internet of things device, such as a printer, security sensor or device, environmental sensor, or any other device connected to a wired network and configured to communicate over one or more wireless networks.
UE device 400 includes wired interface 430, wireless interfaces 420A-420C, one or more processors 406, memory 412, and user interface 410. The various elements are coupled together by a bus 414, which may exchange data and information over the bus 414. The wired interface 430 represents a physical network interface and includes a receiver 432 and a transmitter 434. If desired, the wired interface 430 may be used to directly or indirectly couple the UE 400 to a wired network device (such as one of the switches 146 of FIG. 1A) within a wired network via a cable (such as one of the Ethernet cables 144 of FIG. 1A).
The first wireless interface 420A, the second wireless interface 420B, and the third wireless interface 420C include receivers 422A, 422B, and 422C, respectively, each including a receive antenna through which the UE 400 may receive wireless signals from a wireless communication device (such as the AP 142 of fig. 1A, the AP 200 of fig. 2, other UEs 148, or other devices configured for wireless communication). The first wireless interface 420A, the second wireless interface 420B, and the third wireless interface 420C also include transmitters 424A, 424B, and 424C, respectively, each including a transmit antenna through which the UE 400 may transmit wireless signals to wireless communication devices (such as the AP 142 of fig. 1A, the AP 200 of fig. 2, other UEs 148, and/or other devices configured for wireless communication). In some examples, the first wireless interface 420A may include a Wi-Fi 802.11 interface (e.g., 2.4 GHz and/or 5 GHz), and the second wireless interface 420B may include a Bluetooth interface and/or a Bluetooth Low Energy interface. The third wireless interface 420C may include, for example, a cellular interface through which the UE device 400 may connect to a cellular network.
The processor 406 executes software instructions, such as software instructions for defining software or a computer program, that are stored on a computer-readable storage medium (e.g., memory 412), such as a non-transitory computer-readable medium including a storage device (e.g., a disk drive, or optical drive) or memory (e.g., flash memory or RAM) or any other type of volatile or non-volatile memory, that stores the instructions to cause the one or more processors 406 to perform the techniques described herein.
Memory 412 includes one or more devices configured to store programming modules and/or data related to the operation of UE 400. For example, memory 412 may include a computer-readable storage medium, such as a non-transitory computer-readable medium including a storage device (e.g., a disk drive or optical drive) or memory (e.g., flash memory or RAM) or any other type of volatile or non-volatile memory that stores instructions to cause one or more processors 406 to perform the techniques described herein.
In this example, memory 412 includes an operating system 440, application programs 442, communication modules 444, configuration settings 450, and data 454. The communication module 444 includes program code that, when executed by the processor 406, enables the UE 400 to communicate using any of the wired interface 430, the wireless interfaces 420A-420B, and/or the cellular interface 420C. Configuration settings 450 include any device settings of UE 400 for wireless interfaces 420A-420B and/or cellular interface 420C.
The data 454 may include, for example, a status/error log including a list of events specific to the UE 400. The events may include a log of normal events and error events at the log level based on instructions from the NMS 130. Data 454 may include any data used and/or generated by UE 400, such as data used to calculate one or more SLE metrics or identify relevant behavior data, which is collected by UE 400 and transmitted directly to NMS 130 or to any AP 142 in wireless network 106 for further transmission to NMS 130.
As described herein, UE 400 may measure network data from data 454 and report it to NMS 130. Network data can include event data, telemetry data, and/or other SLE related data. The network data may include various parameters that indicate the performance and/or status of the wireless network. NMS 130 may determine one or more SLE metrics based on SLE related data received from UEs or client devices in the wireless network and store the SLE metrics as network data 137 (fig. 1A). NMS 130 may further update a temporal graph database 138 (fig. 1A) of the network to include telemetry data received from UEs or client devices in the wireless network over time, or at least entity and connection information extracted from the telemetry data.
NMS agent 456 is a software agent of NMS 130 installed on UE 400. In some examples, NMS agent 456 may be implemented as a software application running on UE 400. NMS agent 456 gathers information from UE 400, including detailed client device attributes and insight into the roaming behavior of UE 400. This information provides insight into the client roaming algorithm, as roaming is a client device decision. In some examples, NMS agent 456 may display client device attributes on UE 400. NMS agent 456 transmits client device attributes to NMS 130 via the AP device to which UE 400 is connected. NMS agent 456 may be integrated into a custom application or provided as part of a location application. NMS agent 456 may be configured to identify the device connection type (e.g., cellular or Wi-Fi) and the corresponding signal strength. For example, NMS agent 456 identifies an access point connection and its corresponding signal strength. NMS agent 456 may store information specifying the APs identified by UE 400 and their corresponding signal strengths. The NMS agent 456 or another element of the UE 400 also gathers information about which APs the UE 400 is connected to, which also indicates which APs the UE 400 is not connected to. The NMS agent 456 of the UE 400 transmits this information to the NMS 130 through the AP to which it is connected. In this manner, the UE 400 transmits not only information about the AP to which the UE 400 is connected, but also information about the other APs that the UE 400 identified but did not connect to, along with their signal strengths. The AP in turn forwards this information to the NMS 130, including the information about the APs other than itself that were identified by the UE 400. This additional level of granularity enables NMS 130, and ultimately the network administrator, to better determine the Wi-Fi experience directly from the perspective of the client device.
In some examples, NMS agent 456 further enriches the client device data utilized in service levels. For example, NMS agent 456 may go beyond basic fingerprinting to provide additional details of attributes such as device type, manufacturer, and operating system version. In the detailed client properties, NMS 130 may display the radio hardware and firmware information of UE 400 received from NMS agent 456. The more details NMS agent 456 can extract, the better the VNA/AI engine becomes at advanced device classification. The VNA/AI engine of NMS 130 continually learns and becomes more accurate in its ability to distinguish device-specific problems from broad device problems, for example, by specifically identifying that a particular operating system version is affecting certain clients.
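By way of example only, a client-side report that separates the connected AP from the other APs the client can hear might be assembled as in the sketch below. The report fields, the use of BSSIDs as keys, and the sample signal strengths are hypothetical and do not reflect an actual format used by NMS agent 456.

```python
def build_report(client_id, scan, connected_bssid):
    """Split a Wi-Fi scan into the connected AP and the other APs the client hears."""
    connected = None
    neighbors = []
    for bssid, rssi in scan.items():
        entry = {"bssid": bssid, "rssi_dbm": rssi}
        if bssid == connected_bssid:
            connected = entry
        else:
            neighbors.append(entry)
    # Strongest neighbor first, so a better candidate AP is easy to spot
    neighbors.sort(key=lambda e: e["rssi_dbm"], reverse=True)
    return {"client": client_id, "connected": connected, "neighbors": neighbors}


# Hypothetical scan result: BSSID -> received signal strength in dBm
scan = {
    "02:00:00:00:00:01": -70,
    "02:00:00:00:00:02": -55,
    "02:00:00:00:00:03": -80,
}
report = build_report("client-A", scan, "02:00:00:00:00:01")
```

In this example the client remains associated with an AP at -70 dBm although a neighboring AP is heard at -55 dBm, which is the kind of roaming insight that is visible only from the client's perspective.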
In some examples, before NMS agent 456 can report the location of the device, client information, and network connection data to the NMS, NMS agent 456 may cause user interface 410 to display a prompt prompting the end user of UE 400 to enable location permissions. NMS agent 456 will then begin reporting connection data and location data to the NMS. In this way, the end user of the client device may control whether NMS agent 456 is enabled to report client device information to the NMS.
Fig. 5 is a block diagram illustrating an example network node 500 configured in accordance with the techniques described in this disclosure. In one or more examples, network node 500 implements a device or server (e.g., switch 146, AAA server 110, DHCP server 116, DNS server 122, web server 128, etc.) attached to network 134 of fig. 1A, or another network device (e.g., router 187) supporting one or more of wireless network 106, wired LAN 175, SD-WAN 177, or data center 179 of fig. 1B.
In this example, network node 500 includes a wired interface 502 (such as an Ethernet interface), one or more processors 506, input/output 508 (such as a display, buttons, keyboard, keypad, touch screen, mouse, etc.), and memory 512, coupled together by bus 514, over which the various elements may exchange data and information. The wired interface 502 couples the network node 500 to a network, such as an enterprise network. Although only one interface is shown by way of example, a network node may (and typically does) have multiple communication interfaces and/or multiple communication interface ports. The wired interface 502 includes a receiver 520 and a transmitter 522.
Memory 512 stores executable software applications 532, operating system 540, and data 530. The data 530 may include a system log and/or an error log that stores event data, including behavior data, for the network node 500. In examples where network node 500 includes a "third party" network device, the entity that owns and/or has access to network node 500 is different from the entity that owns the APs or wired client-side devices. Thus, in examples where network node 500 is a third party network device, NMS 130 does not receive, collect, or otherwise access network data from network node 500.
In examples where network node 500 comprises a server, network node 500 may receive data and information, including, for example, operation-related information such as registration requests, AAA services, DHCP requests, Simple Notification Service (SNS) queries, and web page requests, through receiver 520, and send data and information, including, for example, configuration information, authentication information, web page data, etc., through transmitter 522.
In examples where network node 500 includes a wired network device, network node 500 may connect to one or more APs or other wired client-side devices (e.g., internet of things devices) within the wired network edge through wired interface 502. For example, network node 500 may include multiple wired interfaces 502 and/or wired interface 502 may include multiple physical ports to connect to multiple APs or other wired client-side devices within a site via respective Ethernet cables. In some examples, each AP or other wired client-side device connected to network node 500 may access a wired network through wired interface 502 of network node 500. In some examples, one or more APs or other wired client-side devices connected to network node 500 may draw power from network node 500 through the respective Ethernet cable and a Power over Ethernet (PoE) port of wired interface 502.
In examples where network node 500 includes a session-based router (which employs a stateful, session-based routing scheme), network node 500 may be configured to independently perform path selection and traffic engineering. The use of session-based routing may enable network node 500 to avoid the use of a centralized controller (such as an SDN controller) to perform path selection and traffic engineering, and to avoid the use of tunnels. In some examples, network node 500 may implement session-based routing as Secure Vector Routing (SVR), provided by Juniper Networks, Inc. Where network node 500 includes a session-based router (e.g., router 187A of fig. 1B) that operates as a network gateway for a site of an enterprise network, network node 500 may establish a plurality of peer paths (e.g., logical path 189 of fig. 1B) over an underlying physical WAN (e.g., SD-WAN 177 of fig. 1B) with one or more other session-based routers (e.g., router 187B of fig. 1B) operating as network gateways for other sites of the enterprise network. The network node 500, operating as a session-based router, may collect peer path level data and report the peer path data to the NMS 130.
In examples where network node 500 includes a packet-based router, network node 500 may employ a packet-based or flow-based routing scheme to forward packets according to defined network paths (e.g., network paths established by a centralized controller performing path selection and traffic engineering). Where network node 500 includes a packet-based router (e.g., router 187A of fig. 1B) that operates as a network gateway for a site of an enterprise network, network node 500 may establish a plurality of tunnels (e.g., logical path 189 of fig. 1B) over an underlying physical WAN (e.g., SD-WAN 177 of fig. 1B), where one or more other packet-based routers (e.g., router 187B of fig. 1B) operate as network gateways for other sites of the enterprise network. The network node 500, operating as a packet-based router, may collect data at the tunnel level, which may be retrieved by the NMS 130 through an API or open configuration protocol, or which may be reported to the NMS 130 by the NMS agent 544 or other module operating on the network node 500.
The data collected and reported by the network node 500 may include periodically reported data and event driven data. The network node 500 is configured to collect logical path statistics through Bidirectional Forwarding Detection (BFD) probing, as well as data extracted from messages and/or counters at the level of logical paths (e.g., peer paths or tunnels). In some examples, the network node 500 is configured to collect statistics and/or sample other data based on a first periodic interval (e.g., every 3 seconds, every 5 seconds, etc.). The network node 500 may store the collected and sampled data as path data (e.g., in a buffer). In some examples, NMS agent 544 may periodically create a package of statistics based on a second periodic interval (e.g., every 3 minutes). The collected and sampled data that is reported periodically in the package of statistics may be referred to herein as "oc-stats".
In some examples, the package of statistics may also include detailed information of clients connected to the network node 500 and the related client sessions. NMS agent 544 may then report the package of statistics to NMS 130 in the cloud. In other examples, NMS 130 may request, retrieve, or otherwise receive the package of statistics from network node 500 through an API, open configuration protocol, or other communication protocol. The package of statistics created by NMS agent 544 or another module of network node 500 may include a header identifying network node 500 and the statistics and data samples for each logical path from network node 500. In other examples, NMS agent 544 reports event data to NMS 130 in the cloud in response to the occurrence of a particular event at network node 500. Event driven data may be referred to herein as "oc-events".
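The two-interval scheme described above (sampling on a short interval, packaging on a longer one) may be sketched as follows. The interval values come from the examples in the text; the sampled quantity (latency), the package layout, and the node identifier are hypothetical and do not represent the actual "oc-stats" format.

```python
from statistics import mean

SAMPLE_INTERVAL_S = 3    # first periodic interval (example value from the text)
REPORT_INTERVAL_S = 180  # second periodic interval (3 minutes)


def build_package(node_id, samples):
    """Summarize buffered per-path samples into one statistics package."""
    paths = {}
    for path_id, latency_ms in samples:
        paths.setdefault(path_id, []).append(latency_ms)
    return {
        "header": {"node": node_id},
        "paths": {
            p: {"count": len(v), "avg_latency_ms": mean(v), "max_latency_ms": max(v)}
            for p, v in paths.items()
        },
    }


# One reporting period's worth of buffered samples for a single logical path
samples_per_report = REPORT_INTERVAL_S // SAMPLE_INTERVAL_S
samples = [("path-1", 10.0 + (i % 3)) for i in range(samples_per_report)]
package = build_package("router-187A", samples)
```

With a 3-second sampling interval and a 3-minute reporting interval, each package in this sketch summarizes 60 samples per logical path.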
Fig. 6A-6C illustrate an example user interface of NMS 130/300 for visualizing an application session specific topology and related troubleshooting workflows for network gateway devices involved in an application session.
Fig. 6A illustrates an example session assistant user interface 600 including a query or user input 610 from an administrator initiating topology visualization and troubleshooting of a particular application session through the management device 111, and responses or outputs 612, 614 generated by the session assistant engine 136, 356. In the example of fig. 6A, user input 610 to session assistant user interface 600 includes a string indicating an application and a device identifier (i.e., "troubleshoot teams call from client device A," where "teams call" indicates the application and "client device A" contains the client device identifier). The session assistant engine 136, 356 may automatically filter application sessions for the particular network device based on the indicated application (Microsoft Teams in this example) and a default duration (the last 7 days in this example). Response 612 in session assistant user interface 600 includes a string stating "Troubleshooting Teams. Here is what I found between September 30 and October 7." In addition, the session assistant user interface 600 presents output 614 as a list of all application sessions (Teams calls in this example) for the particular network device over the default duration. For example, application session 620 is a Teams call for the particular network device (client device A in this example) from 12:01 pm to 1:03 pm on October 7.
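As a non-limiting illustration, filtering application sessions by application, device identifier, and a default look-back duration might be sketched as follows. The `AppSession` record and the `filter_sessions` function are hypothetical names introduced for this sketch; the disclosure does not specify how the session assistant engine stores or filters sessions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AppSession:
    """Hypothetical record of one application session."""
    app: str
    device_id: str
    start: datetime
    end: datetime


def filter_sessions(sessions, app, device_id, now, window=timedelta(days=7)):
    """Return sessions for one app and device that overlap the look-back window.

    The 7-day default mirrors the default duration in the example above.
    Results are ordered newest first.
    """
    cutoff = now - window
    matches = [
        s for s in sessions
        if s.app == app
        and s.device_id == device_id
        and s.end >= cutoff      # session ended inside the window
        and s.start <= now       # session is not in the future
    ]
    return sorted(matches, key=lambda s: s.start, reverse=True)
```

A query such as the one in fig. 6A would then correspond to a call like `filter_sessions(all_sessions, "teams", "client-device-a", now)`, where the identifier strings are placeholders.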
Fig. 6B shows a further example of the session assistant user interface 600, including a user input 640 (in this example, "troubleshoot application Teams") based on selection of application session 620 contained in the output 614 of fig. 6A, and a network health output 642 for application session 620 (in this example, "Teams call from client device A from 12:01-1:03 pm on October 7"). In the example shown in fig. 6B, as part of the network health output 642, the session assistant user interface 600 presents an application session specific topology 644 generated by the session assistant engine for the selected application session 620.
Application session specific topology 644 includes the client device ("client device A") running the application, an AP device ("basement"), a third party switch ("unnamed switch"), a spoke router ("office"), a hub router ("headend"), and a cloud-based application server ("Teams server"), and further provides an address for each network device in the application session specific topology. In addition, as shown in fig. 6B, the application session specific topology 644 includes indicia of the performance or connection health of each network device. For example, a green circle included on one or more network devices (e.g., the client device ("client device A"), the third party switch ("unnamed switch"), the hub router ("headend"), and the application server ("Teams server")) may indicate that these network devices have no known problems during the application session. Orange or red triangles with exclamation marks included on one or more network devices (e.g., the AP device ("basement") and the spoke router ("office") 646) may indicate that these network devices have moderate or severe connectivity issues, respectively, during the application session.
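As a non-limiting illustration, the mapping from a per-device health level to the color, shape, and symbol of its indicia might be sketched as follows. The `Health` enumeration, the `INDICIA` table, and the `annotate_topology` function are hypothetical names introduced for this sketch; only the green circle and the orange or red triangle with an exclamation mark are taken from the example of fig. 6B.

```python
from enum import Enum


class Health(Enum):
    OK = "ok"
    MODERATE = "moderate"
    SEVERE = "severe"


# Indicia per health level, following the fig. 6B example.
INDICIA = {
    Health.OK: {"color": "green", "shape": "circle", "symbol": None},
    Health.MODERATE: {"color": "orange", "shape": "triangle", "symbol": "!"},
    Health.SEVERE: {"color": "red", "shape": "triangle", "symbol": "!"},
}


def annotate_topology(devices, issues):
    """Attach indicia to each device in path order.

    devices: ordered list of device names from client to application server.
    issues:  dict mapping a device name to its Health; devices that are
             absent from the dict are treated as having no known problems.
    """
    return [
        {"device": name, **INDICIA[issues.get(name, Health.OK)]}
        for name in devices
    ]
```

Treating devices without a recorded issue as healthy is a design choice of this sketch; an implementation could instead distinguish "no data" from "no known problems".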
Fig. 6C illustrates another example of the session assistant user interface 600, in which the network health output 642 for application session 620 (in this example, "Teams call from client device A from 12:01-1:03 pm on October 7") includes a troubleshooting user interface 648 based on selection of the spoke router ("office") 646 in the application session specific topology 644 of fig. 6B. The troubleshooting user interface 648 for the spoke router 646 related to the application session 620 includes the root causes of the connection problems at the spoke router 646 as determined by the application session troubleshooting engine 135/352. In the example shown in fig. 6C, the troubleshooting user interface 648 indicates: a first root cause of "high CPU control plane" due to high control plane CPU utilization on the gateway (spoke router ("office") 646), and a second root cause of "high network latency" due to high latency between the gateway (spoke router ("office") 646) and the cloud (application server ("Teams server")).
Fig. 7A-7B illustrate example user interfaces of NMS 130/300 for visualizing application-specific topologies and related troubleshooting workflows for AP devices involved in an application session.
Fig. 7A illustrates an example session assistant user interface 700 that includes a query or user input 710 from an administrator initiating topology visualization and troubleshooting of a particular application session through the management device 111, and responses or outputs 712, 714 generated by the session assistant engine 136, 356. In the example of fig. 7A, user input 710 to session assistant user interface 700 includes a string indicating an application and a user identifier (i.e., "troubleshoot user B teams call," where "user B" is a user of the client device and "teams call" indicates the application). The session assistant engine 136, 356 may automatically filter application sessions of one or more network devices associated with the user, based on the indicated application (Microsoft Teams in this example), to identify one or more application sessions of network devices associated with the user that have recently experienced a connection problem. Response 712 within session assistant user interface 700 includes a string stating "User B-iphone had a poor MS-Teams call experience, mainly due to exceptionally high latency on the gateway spoke router."
In addition, the session assistant user interface 700 presents an output 714 for the identified application session (in this example, "MS-Teams call from client user B-iphone"). In the example shown in fig. 7A, as part of output 714, session assistant user interface 700 presents application session specific topology 716 generated by the session assistant engine 136, 356 for the identified application session. Application session specific topology 716 includes a client device ("iPhone") running the application, two AP devices ("wireless") 718, a switch ("wired"), a gateway ("WAN"), and a cloud-based application server ("Teams"). Further, as shown in fig. 7A, the application session specific topology 716 includes indicia of the performance or connection health of each network device. For example, a green circle with a check mark included on one or more network devices (e.g., the client device ("iPhone"), the switch ("wired"), the gateway ("WAN"), and the application server ("Teams")) may indicate that these network devices have no known problems during the application session. An orange triangle with an exclamation mark included on one or more network devices (e.g., the AP devices ("wireless") 718) may indicate that these network devices have a moderate connection problem during the application session.
Fig. 7B illustrates a further example of the session assistant user interface 700, in which the output 714 for the identified application session (in this example, "MS-Teams call from client user B-iphone") includes a troubleshooting user interface 720 based on selection of the AP devices ("wireless") 718 in the application session specific topology 716 of fig. 7A. The troubleshooting user interface 720 for the first AP device ("office-united states") 718A involved in the identified application session includes the root causes of the connection problems at the first AP device 718A as determined by the application session troubleshooting engine 135/352. In the example shown in fig. 7B, the troubleshooting user interface 720 for the first AP device 718A indicates: a first root cause of "slow-to-join" due to the invalidation of the FTE when the client (first AP device 718A) is offline, and a second root cause of "poor coverage" due to the asymmetry of the uplink coverage. The troubleshooting user interface 720 for the second AP device ("AP-45 kitchen") 718B involved in the identified application session, if expanded, would likewise include the root causes of the connection problems at the second AP device 718B as determined by the application session troubleshooting engine 135/352.
Fig. 8 shows an example user interface of the NMS 300 for visualizing an application session specific topology comprising a service provider server.
Fig. 8 illustrates an example session assistant user interface 750, including a query or user input 760 from an administrator initiating topology visualization and troubleshooting of a particular application session through the management device 111, and output 762 generated by the session assistant engine 136, 356. In the example of fig. 8, user input 760 to session assistant user interface 750 includes a string specifying an application, a client device identifier, and a date (i.e., "troubleshoot application ms-teams for client device C on April 14," where "ms-teams" specifies the application, "client device C" specifies the client device identifier, and "April 14" specifies the date). The session assistant engine 136, 356 may automatically filter application sessions of one or more network devices associated with the user, based on the indicated application (Microsoft Teams in this example), to identify one or more application sessions of the network devices associated with connection problems experienced by the user on the specified date.
The session assistant user interface 750 presents an output 762 for the identified application session. In the example shown in fig. 8, as part of output 762, the session assistant user interface 750 presents an application session specific topology 764 generated by the session assistant engine 136, 356 for the identified application session. The application session specific topology 764 includes a client device ("client device C") running the application, a wireless AP device ("wireless"), a wired switch ("wired"), two gateway devices ("WAN"), a service provider server ("Comcast Cable") 766, and a cloud-based application server ("MS-Teams"). Further, as shown in fig. 8, the application session specific topology 764 includes indicia of the performance or connection health of each network device. For example, a green circle with a check mark on one or more network devices (e.g., the client device ("client device C"), the service provider server ("Comcast Cable") 766, and the application server ("MS-Teams")) may indicate that these network devices have no known problems during the application session. Gray or red triangles with exclamation marks included on one or more network devices (e.g., the wired switch ("wired"), the AP device ("wireless"), and the gateway devices ("WAN")) may indicate moderate or severe connection problems, respectively, at these network devices during the application session.
To determine and graphically represent a connection problem for a third party network device (e.g., service provider server 766), the application session troubleshooting engine 135/352 may query a third-party APM provider for insight data on the service provider server 766. In some examples, the application session troubleshooting engine 135/352 may query the third-party APM provider on demand for insight data on the service provider server 766 in response to user input 760 requesting troubleshooting of the identified application session. In other examples, the application session troubleshooting engine 135/352 may proactively query the third-party APM provider for insight data on the service provider server 766 in order to monitor the service provider server 766. Based on the retrieved insight data, the session assistant engine 136, 356 can generate indicia of performance or connection health for the network devices related to the identified application session (including third party network devices such as the service provider server 766).
Fig. 9 shows an example user interface of a Network Management System (NMS) 130/300 for visualizing a troubleshooting workflow for a set of application sessions for a particular AP device.
Fig. 9 illustrates an example session assistant user interface 800 including a query or user input 810 from an administrator initiating troubleshooting of application sessions associated with a particular network device through the management device 111, and responses or outputs 812, 814 generated by the session assistant engine 136, 356. In the example of fig. 9, the user input 810 to the session assistant user interface 800 includes a string indicating a device identifier and a duration (i.e., "how was DC84AP544 over the last 7 days," where "DC84AP544" contains an AP device identifier and "7 days" indicates the duration). The session assistant engine 136, 356 may automatically filter application sessions for the particular network device based on the indicated duration (in this example, the last 7 days) to identify one or more application sessions of the network device that experienced connectivity problems within the last 7 days.
The response 812 in the session assistant user interface 800 includes a string stating "Checking DC84AP544. Here is what I found between November 10 and November 17." The session assistant user interface 800 also includes a troubleshooting user interface 814 for the AP device ("DC84AP544") that lists the root causes of the connectivity problems at the AP device ("DC84AP544") as determined by the application session troubleshooting engine 135/352. In the example shown in fig. 9, the troubleshooting user interface 814 indicates: a first root cause of "slow association," because clients of the AP associate slowly due to transmission failures; a second root cause of "AP disconnected," because the AP disconnected from the cloud due to the absence of an ethernet link (configuration failure), although the AP is currently online; and a third root cause of "poor coverage" due to weak signal strength.
FIG. 10 is a flow diagram illustrating example operations for providing granular troubleshooting workflows at an application session level using an application session specific topology from a client device to a cloud-based application server, based on one or more techniques of the present disclosure. The example operation of fig. 10 is described herein with respect to NMS 300 of fig. 3. In other examples, the operations of fig. 10 may be performed by other computing devices (e.g., NMS 130 of fig. 1A-1B).
NMS 300 receives a query identifying an application session of an application running on a client device, wherein the client device comprises one of a plurality of network devices configured to provide client-to-cloud connectivity in a network (902). NMS 300 may correlate application flow data from a plurality of application flows of the identified application session and determine, based on the application flow data of the application session, a subset of network devices from the plurality of network devices involved in the application session for the duration of the application session. NMS 300 then retrieves entity information and connection information for the application session from temporal graph database 317 based on the determined subset of network devices (907). The entity information represents the subset of network devices involved in the application session for the duration of the application session and is stored as nodes of the temporal graph database 317. The connection information represents the connections between the subset of network devices for the duration of the application session and is stored as edges of the temporal graph database 317.
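As a non-limiting illustration, determining the subset of network devices involved in an application session from its flow data might be sketched as follows. The flow record layout (a `session_id` plus an ordered `path` of device identifiers) is an assumption made for this sketch; the disclosure does not specify the format of the application flow data.

```python
def devices_in_session(flow_records, session_id):
    """Collect the devices traversed by any flow of the given session.

    flow_records: iterable of dicts with hypothetical keys
                  "session_id" and "path" (ordered device identifiers).
    Returns device identifiers in first-seen order, without duplicates.
    """
    seen = {}  # dict used as an insertion-ordered set
    for record in flow_records:
        if record["session_id"] != session_id:
            continue
        for device in record["path"]:
            seen.setdefault(device, None)
    return list(seen)
```

Because several flows of one session may take different paths (for example, when a client roams between AP devices), the union over all flows can contain more devices than any single flow.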
The temporal graph database 317 represents a history of at least a portion of the network at an application session level granularity over a period of time. In some examples, NMS 300 may extract entity information and connection information from telemetry data within network data 316 received from the subset of network devices for the duration of an application session and update temporal graph database 317 with the entity information and connection information for the application session.
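As a non-limiting illustration, a temporal graph that stores devices as nodes and time-bounded connections as edges, and that can be queried for the topology in effect during a session, might be sketched as follows. The in-memory `TemporalGraph` class and its method names are hypothetical and stand in for whatever graph database an implementation uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalEdge:
    src: str
    dst: str
    valid_from: float  # time at which the connection was first observed
    valid_to: float    # time at which it ended; float("inf") if still active


class TemporalGraph:
    """Minimal in-memory stand-in for a temporal graph database."""

    def __init__(self):
        self.nodes = {}   # device id -> attribute dict (entity information)
        self.edges = []   # list of TemporalEdge (connection information)

    def add_node(self, device_id, **attrs):
        self.nodes[device_id] = attrs

    def add_edge(self, src, dst, valid_from, valid_to=float("inf")):
        self.edges.append(TemporalEdge(src, dst, valid_from, valid_to))

    def session_topology(self, devices, start, end):
        """Return nodes and edges among `devices` that overlap [start, end]."""
        wanted = set(devices)
        edges = [
            e for e in self.edges
            if e.src in wanted and e.dst in wanted
            and e.valid_from <= end and e.valid_to >= start  # interval overlap
        ]
        nodes = {d: self.nodes[d] for d in devices if d in self.nodes}
        return nodes, edges
```

Keeping a validity interval on each edge is what allows a historical view: a client that moved from one AP device to another during a session yields two client-to-AP edges in the result, while a query for an earlier window yields only the first.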
NMS 300 generates an application session specific topology for the application session based on the entity information and the connection information for the application session (912). The application session specific topology may include a historical view of a subset of network devices and connections between the subset of network devices for the duration of the application session. NMS 300 identifies at least one connectivity problem within the subset of network devices during the application session based at least on network data received from the subset of network devices during the application session (917). The NMS 300 may perform root cause analysis to determine a root cause of at least one connectivity problem within a subset of network devices during an application session. NMS 300 may analyze network data 316 received from the subset of network devices during the application session to identify one or more third party application servers, third party service provider servers, or third party network devices involved in the application session for the duration of the application session. NMS 300 may then retrieve the network data from one or more third party application servers, third party service provider servers, or third party network devices.
NMS 300 generates data representing user interface 310 for presentation on an administrator device, user interface 310 including a visualization of the application session specific topology over the application session duration, the visualization including indicia of the at least one connection problem (922). In some examples, to generate user interface 310, NMS 300 generates an icon representing at least one network device in the application session specific topology with an indication of at least one connectivity problem for the duration of the application session, wherein the indication of the at least one connectivity problem comprises at least one of a color, shape, or symbol. In response to receiving a user input selecting an icon representing a network device having an indication of at least one connectivity problem, NMS 300 may further generate data representing a troubleshooting user interface for presentation on the administrator device, the troubleshooting user interface including at least one indication of a root cause of the at least one connectivity problem for the network device. In some further examples, NMS 300 may generate data representing a session assistant user interface, including a platform configured to receive queries identifying application sessions, present an application session-specific topology, and receive user input interacting with the application session-specific topology.
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. The various features described as modules, units, or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices or other hardware devices. In some cases, various features of the electronic circuit may be implemented as one or more integrated circuit devices, such as an integrated circuit chip or chip set.
If implemented in hardware, this disclosure may relate to an apparatus such as a processor or integrated circuit device, such as an integrated circuit chip or chipset. Alternatively or additionally, if implemented in software or firmware, the techniques may be realized at least in part by a computer-readable data storage medium containing instructions that, when executed, cause a processor to perform one or more of the methods described above. For example, a computer-readable data storage medium may store such instructions for execution by a processor.
The computer readable medium may form part of a computer program product, which may include packaging material. The computer readable medium may include computer data storage media such as Random Access Memory (RAM), Read Only Memory (ROM), Non-Volatile Random Access Memory (NVRAM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, magnetic or optical data storage media, and the like. In some examples, an article of manufacture may comprise one or more computer-readable storage media.
In some examples, the computer-readable storage medium may include a non-transitory medium. The term "non-transitory" may mean that the storage medium is not embodied as a carrier wave or propagated signal. In some examples, a non-transitory storage medium may store data that can change over time (e.g., in RAM or cache).
The code or instructions may be software and/or firmware executed by processing circuitry comprising one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Furthermore, in certain aspects, the functionality described in this disclosure may be provided in software modules or hardware modules.

Claims (20)

1. A network management system, comprising:
a memory storing network data received from a plurality of network devices, wherein the network devices are configured to provide client-to-cloud connectivity in a network; and
one or more processors coupled with the memory, and configured to:
receive a query identifying an application session of an application running on a client device, wherein the client device comprises one of the plurality of network devices;
retrieve entity information and connection information for the application session from a temporal graph database, wherein the entity information represents a subset of network devices of the plurality of network devices involved in the application session for the duration of the application session and the entity information is stored as nodes of the temporal graph database, wherein the connection information represents connections between the subset of network devices for the duration of the application session and the connection information is stored as edges of the temporal graph database, and wherein the temporal graph database represents a history of at least a portion of the network at application session level granularity over a period of time;
generate an application session specific topology for the application session based on the entity information and the connection information for the application session;
identify at least one connection problem within the subset of network devices during the application session based at least on network data received from the subset of network devices during the application session; and
generate data representative of a user interface for presentation on an administrator device, the user interface comprising a visualization of the application session specific topology for the duration of the application session, the visualization comprising an indication of the at least one connection problem.
2. The network management system of claim 1, wherein the one or more processors are configured to perform root cause analysis to determine a root cause of the at least one connectivity problem within the subset of network devices during the application session.
3. The network management system of claim 1, wherein the one or more processors are configured to analyze the network data received from the subset of network devices during the application session to identify one or more third party application servers, third party service provider servers, or third party network devices involved in the application session for the duration of the application session.
4. The network management system of claim 3, wherein the one or more processors are configured to retrieve insight data for the one or more third party application servers, third party service provider servers, or third party network devices from a third party monitoring provider.
5. The network management system of claim 1, wherein the application session specific topology comprises a historical view of the subset of network devices and of the connections between the subset of network devices for the duration of the application session.
6. The network management system of claim 1, wherein to generate data representative of the user interface, the one or more processors are configured to: generate an icon representing at least one network device in the application session specific topology having an indication of the at least one connection problem for the duration of the application session, wherein the indication of the at least one connection problem comprises at least one of a color, a shape, or a symbol.
7. The network management system of claim 6, wherein the one or more processors are configured to: in response to receiving user input selecting an icon representing a network device having an indication of the at least one connectivity problem, generate data representing a troubleshooting user interface for presentation on an administrator device, the troubleshooting user interface including at least one indication of a root cause of the at least one connectivity problem at the network device.
8. The network management system of claim 1, wherein the one or more processors are configured to generate data representative of a session assistant user interface comprising a platform configured to receive the query identifying the application session, present the application session-specific topology, and receive user input interacting with the application session-specific topology.
9. The network management system of any of claims 1 to 8, wherein the one or more processors are configured to:
correlate application flow data from a plurality of application flows of the application session; and
determine the subset of the plurality of network devices involved in the application session for the duration of the application session based on the application flow data of the application session,
wherein the one or more processors are configured to retrieve the entity information and connection information for the application session from the temporal graph database based on the determined subset of network devices.
10. The network management system of any of claims 1 to 8, wherein the one or more processors are configured to:
extract the entity information and the connection information from telemetry data within network data received from the subset of network devices for the duration of the application session; and
update the temporal graph database with the connection information and entity information for the application session.
11. A network management method, comprising:
receiving, by a network management system, a query identifying an application session of an application running on a client device, wherein the client device comprises one of a plurality of network devices configured to provide client-to-cloud connectivity in a network;
retrieving, by the network management system, entity information and connection information for the application session from a temporal graph database, wherein the entity information represents a subset of network devices of the plurality of network devices involved in the application session for the duration of the application session and the entity information is stored as nodes of the temporal graph database, wherein the connection information represents connections between the subset of network devices for the duration of the application session and the connection information is stored as edges of the temporal graph database, and wherein the temporal graph database represents a history of at least a portion of the network at application session level granularity over a period of time;
generating, by the network management system, an application session specific topology for the application session based on the entity information and the connection information for the application session;
identifying, by the network management system, at least one connectivity problem within the subset of network devices during the application session based at least on network data received from the subset of network devices during the application session; and
generating, by the network management system, data representative of a user interface for presentation on an administrator device, the user interface including a visualization of the application session specific topology for the duration of the application session, the visualization including an indication of the at least one connectivity problem.
12. The network management method of claim 11, further comprising: performing root cause analysis to determine a root cause of the at least one connection problem within the subset of network devices during the application session.
13. The network management method of claim 11, further comprising: analyzing the network data received from the subset of network devices during the application session to identify one or more third party application servers, third party service provider servers, or third party network devices involved in the application session for the duration of the application session.
14. The network management method of claim 13, further comprising: retrieving insight data for the one or more third party application servers, third party service provider servers, or third party network devices from a third party monitoring provider.
15. The network management method of claim 11, wherein the application session specific topology includes a historical view of the subset of network devices and of the connections between the subset of network devices for the duration of the application session.
16. The network management method of claim 11, wherein generating data representative of the user interface comprises: generating an icon representing at least one network device in the application session specific topology having an indication of the at least one connection problem for the duration of the application session, wherein the indication of the at least one connection problem comprises at least one of a color, a shape, or a symbol.
17. The network management method of claim 16, further comprising: in response to receiving user input selecting an icon representing a network device having an indication of the at least one connectivity problem, generating data representing a troubleshooting user interface for presentation on an administrator device, the troubleshooting user interface including at least one indication of a root cause of the at least one connectivity problem at the network device.
18. The network management method according to any one of claims 11 to 17, further comprising:
correlating application flow data from a plurality of application flows of the application session; and
determining the subset of the plurality of network devices involved in the application session for the duration of the application session based on the application flow data of the application session,
wherein retrieving the entity information and connection information for the application session comprises: retrieving the entity information and connection information for the application session from the temporal graph database based on the determined subset of network devices.
19. The network management method according to any one of claims 11 to 17, further comprising:
extracting the entity information and the connection information from telemetry data within network data received from the subset of network devices for the duration of the application session; and
updating the temporal graph database with the connection information and entity information for the application session.
20. A computer-readable storage medium encoded with instructions for causing one or more processors of a network management system to perform the network management method of any of claims 11-19.
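By way of illustration only, the steps recited in claims 18 and 19 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the in-memory `TemporalGraph` class stands in for a temporal graph database, and the record fields (`src`, `dst`, `start`, `end`, `session`, `path`) and function names are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    src: str      # device at one end of the connection
    dst: str      # device at the other end
    start: float  # time the connection was first observed
    end: float    # time the connection was last observed


@dataclass
class TemporalGraph:
    """Toy in-memory stand-in for a temporal graph database (hypothetical)."""
    entities: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def update(self, records):
        # Claim 19 (sketch): store entity and connection information
        # extracted from telemetry records.
        for r in records:
            self.entities.setdefault(r["src"], {})
            self.entities.setdefault(r["dst"], {})
            self.edges.append(Edge(r["src"], r["dst"], r["start"], r["end"]))


def devices_in_session(flows, session_id):
    # Claim 18 (sketch): the subset of devices seen on any flow of the session.
    return {hop for f in flows if f["session"] == session_id for hop in f["path"]}


def session_topology(graph, devices, t_start, t_end):
    # Keep connections between session devices whose observed interval
    # overlaps the session interval [t_start, t_end].
    return sorted(
        (e.src, e.dst)
        for e in graph.edges
        if e.src in devices and e.dst in devices
        and e.start <= t_end and e.end >= t_start
    )


graph = TemporalGraph()
graph.update([
    {"src": "ap1", "dst": "sw1", "start": 0, "end": 100},
    {"src": "sw1", "dst": "gw1", "start": 0, "end": 100},
    {"src": "ap2", "dst": "sw1", "start": 0, "end": 100},
    {"src": "sw1", "dst": "gw2", "start": 200, "end": 300},
])
flows = [
    {"session": "s1", "path": ["ap1", "sw1", "gw1"]},
    {"session": "s1", "path": ["ap1", "sw1", "gw2"]},
    {"session": "s2", "path": ["ap2", "sw1", "gw1"]},
]
devices = devices_in_session(flows, "s1")
topology = session_topology(graph, devices, t_start=10, t_end=50)
print(topology)
```

In this sketch the connection to `gw2` is excluded because it was only observed after the session ended, and the connection from `ap2` is excluded because `ap2` does not appear in any flow of session `s1`.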
CN202211526051.9A 2022-01-14 2022-12-01 Application session specific network topology generation for application session failover Pending CN116455758A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US63/299,733 2022-01-14
US17/935,704 2022-09-27
US17/935,704 US11968075B2 (en) 2022-01-14 2022-09-27 Application session-specific network topology generation for troubleshooting the application session

Publications (1)

Publication Number Publication Date
CN116455758A CN116455758A (en) 2023-07-18

Family

ID=87126244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211526051.9A Pending CN116455758A (en) 2022-01-14 2022-12-01 Application session specific network topology generation for application session failover

Country Status (1)

Country Link
CN (1) CN116455758A (en)

Similar Documents

Publication Publication Date Title
US20230231785A1 (en) Application service level expectation health and performance
CN115733727A (en) Network management system and method for enterprise network and storage medium
US12003363B2 (en) Automatically troubleshooting and remediating network issues via connected neighbors
US20230126313A1 (en) Collecting client data for wireless client devices
EP4080850A1 (en) Onboarding virtualized network devices to cloud-based network assurance system
US11968075B2 (en) Application session-specific network topology generation for troubleshooting the application session
CN115955690A (en) Wireless signal strength based detection of poor network link performance
CN116455758A (en) Application session specific network topology generation for application session failover
US20240137289A1 (en) Conversational assistant for troubleshooting a site
US20230231776A1 (en) Conversational assistant dialog design
CN117917877A (en) Dialogue assistant for site troubleshooting
US20240187302A1 (en) Network anomaly detection and mitigation
WO2023137374A1 (en) Conversational assistant dialog design
US11991046B2 (en) Determining an organizational level network topology
US12021722B2 (en) Detecting network events having adverse user impact
US20230308374A1 (en) Detecting network events having adverse user impact
US20230125903A1 (en) Location metrics for monitoring or control of wireless networks
US11973640B1 (en) Physical layer issue detection based on client-side behavior assessments
US20230403305A1 (en) Network access control intent-based policy configuration
US20240154970A1 (en) Applying security policies based on endpoint and user attributes
CN116800579A (en) Detecting network events having adverse user impact
CN116760557A (en) Closed loop network provisioning based on network access control fingerprinting
CN116455759A (en) Determining organization level network topology
CN117240490A (en) Network access control system, network access control method, and storage medium
CN117331598A (en) Software image score for recommended software images

Legal Events

Date Code Title Description
PB01 Publication